Hello and welcome to another episode in the Godot basics tutorial series. In this episode we'll be taking a brief and quick look at the basics of strings and characters. Now, strings are made up of multiple characters. On top of that, computers need character encoding to interpret raw zeros and ones into readable characters. When it comes to character encoding, there are two different types.
The first type is ASCII, which defines characters that are mapped to the numbers 0 through 127. In this case every character takes up one byte in memory. The second character encoding type is called Unicode. As a matter of fact, Unicode is the world's standard for text and emoji. It's difficult to pinpoint memory size when using Unicode because it's dependent on what's being used and what's being stored.
Unicode text is usually stored using a variable-width character encoding. This means that character sizes change depending on the language supported. As a matter of fact, Unicode can support almost every known language in the world. Different Unicode encodings include UTF-8, UTF-16, UTF-32, and the list goes on. When it comes to the English language, most of the encoding is done in UTF-8. Speaking in regards to UTF-8, you can think of Unicode as being separated into two categories.
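To make the variable-width idea concrete, here is a small sketch. It is written in Python rather than GDScript, purely as an illustration, and the sample characters and the helper name encoded_sizes are my own choices, not something from the episode. It shows how many bytes the same character takes under UTF-8, UTF-16, and UTF-32.

```python
# Sample characters, written as escapes so the file itself stays plain ASCII:
# "A", "e" with an acute accent, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_sizes(ch):
    """Return the number of bytes ch occupies in three Unicode encodings."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants skip the byte-order mark, so only the
        # character itself is counted.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in samples:
    # Print the code point instead of the character to avoid console
    # encoding problems.
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

In UTF-8 the capital A needs one byte while the emoji needs four, which is why it is hard to give a single memory size for Unicode text.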
The first category is control codes, also referred to as non-printable codes. The second category is Latin script codes, also referred to as printable codes.
Let's go ahead and take a look at control codes.
Again, as a refresher, one byte is equal to 8 bits, and control codes are reserved for non-printable characters. This would include things like your backspace, shift, escape key, and the list goes on. The control codes range from decimal values 0 through 31 and 127 through 159. The second category is Latin script codes, and Latin script, which is readable, is separated into basically four categories.
The first category is symbols and punctuation. These will be symbols such as your exclamation mark, your pound sign, the dollar symbol, your percentage symbol. Basically all the symbols you would find on your English keyboard, and this list also includes punctuation such as your comma and period. Your symbols and punctuation are spread out between the decimal values 32 through 47, 58 through 64, 91 through 96, and 123 through 126.
The second category is numbers 0 through 9, and these characters are assigned to the decimal values 48 through 57. The third category is your uppercase Latin characters A through Z, and these characters are assigned to the decimal values 65 through 90. And of course you have your lowercase Latin characters, and your lowercase characters are assigned to the decimal values 97 through 122.
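The ranges just listed can be sketched as a small lookup. This is a Python illustration rather than GDScript, and the function name classify and its return labels are hypothetical names of my own, not part of any Godot API.

```python
def classify(code):
    """Name the category of a decimal code, using the ranges from this episode."""
    if 0 <= code <= 31 or 127 <= code <= 159:
        return "control"
    if 48 <= code <= 57:
        return "number"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    if 32 <= code <= 126:
        # Whatever is left in 32..126 after digits and letters:
        # 32-47, 58-64, 91-96 and 123-126.
        return "symbol or punctuation"
    return "outside the ranges covered here"


for ch in ["A", "z", "7", "!", " "]:
    # ord() gives the decimal value of a single character.
    print(repr(ch), ord(ch), classify(ord(ch)))

print(8, classify(8))    # 8 is the backspace control code
print(27, classify(27))  # 27 is the escape control code
```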
Let's go ahead and take a look at what a character looks like in memory. So here we have our character in binary format: 01000001. Its decimal value is 65, and this binary represents the character capital A. Now if we were to flip the third bit from the left from zero to one, we're going to get 01100001, which has the decimal value of 97, and that value represents the character lowercase a.
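Here is that bit flip as a short sketch, again in Python rather than GDScript, with variable names that are my own. Setting the bit worth 32 turns the code for capital A into the code for lowercase a.

```python
upper_a = 0b01000001   # decimal 65, the character "A"
case_bit = 0b00100000  # the third bit from the left, worth 32

# Setting that one bit gives the lowercase letter.
lower_a = upper_a | case_bit

print(format(upper_a, "08b"), upper_a, chr(upper_a))  # 01000001 65 A
print(format(lower_a, "08b"), lower_a, chr(lower_a))  # 01100001 97 a
```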
And so from here you can see what this looks like in binary format and what character encoding does when translating from binary into human-readable text. Let's go ahead and take a look at a string example. As you can see here, we have the string "Hello World!". Now if we were to break this down into its Unicode decimal value format, we're going to get the following: H, or rather capital H, is represented by the decimal value 72. As a matter of fact, if we keep going down, you're going to see that each letter, whether capital or lowercase, has a different decimal value attached to it.
The L's are represented by the decimal value 108, as you can see here. The exclamation point is represented by the decimal value 33. And notice here that even the space has a decimal value representation of 32, and of course it also has its binary equivalent. Now if we were to take this string value "Hello World!" and represent how much memory it takes, whether we're using ASCII encoding or Unicode, in this case Unicode UTF-8, we would have 12 bytes being used in memory to store this string. 12 bytes because each character takes up one byte and we have 12 characters, and that includes the space.
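The same breakdown can be reproduced with a few lines of code. This sketch uses Python rather than GDScript, and the variable names are my own. It prints each character's decimal and binary value and then counts the bytes under ASCII and UTF-8.

```python
text = "Hello World!"

# Decimal and 8-bit binary value of every character, including the space.
for ch in text:
    print(repr(ch), ord(ch), format(ord(ch), "08b"))

# Every character here is in the ASCII range, so both encodings
# use one byte per character.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

print(len(text), len(ascii_bytes), len(utf8_bytes))  # 12 12 12
```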
So let's count that: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve. Now, why exactly should you understand Unicode, strings, and characters? Well, for one thing, Unicode and characters will help you understand a little bit about memory in the context of text files and, in a sense, storing, for example, strings and variables. You should also understand Unicode characters because it will help you understand the File class and the Godot API, and we will talk a little bit more about that in the next episode.
I'm also going to put some articles on ASCII and Unicode in the description down below, so please feel free to take a look at those.
Well thank you so much for joining me. Thank you for clicking the Like button and the subscribe button.
If you have any questions or comments please feel free to leave them in the comments section down below and I look forward to seeing you in the next episode.
Have an amazing day.