In JavaScript, any textual data is a string. There is no separate “character” type of the kind that exists in a number of other languages.
The internal format of strings, regardless of page encoding, is Unicode.
Creating strings
Strings are created using double or single quotes:
var text = "my string";
var anotherText = 'another string';
In JavaScript, there is no difference between double quotes and single quotes.
Special characters
Strings may contain special characters. The most frequently used of them is the line break, written as \n.
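As a minimal sketch of the line break in use (the variable name text and its contents are illustrative, and console.log stands in for any output function):

```javascript
// "\n" is a single character that starts a new line when the string is printed.
var text = "Hello\nWorld";
console.log(text);

// The escape sequence occupies one position in the string, not two.
console.log(text.length); // 11
```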
There are also rarer special characters. Here is the list:
Symbol | Description |
---|---|
\b | Backspace |
\f | Form feed |
\n | New line |
\r | Carriage return |
\t | Tab |
\uNNNN | Unicode character with the hexadecimal code NNNN. For example, \u00A9 is the Unicode representation of the copyright symbol © |
Escaping special characters
If the string is in single quotes, inner single quotes must be escaped, that is, preceded by a backslash, like this:
var str = 'I\'m a JavaScript programmer';
In double quotes, inner double quotes are escaped in the same way, for example "I'm a \"JavaScript\" programmer".
Escaping exists solely so that JavaScript reads the string correctly. In memory, the string contains the character itself, without the '\'. You can see this by printing the string from the example above.
The backslash character '\' is a service character, so a literal backslash is always escaped, that is, written as \\.
You can escape any character. If it is not special, the backslash is simply dropped and nothing else happens.
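The escaping rules above can be sketched as follows. The variable names and the sample strings are illustrative assumptions, and console.log is used only to show the stored values:

```javascript
// Escaping only affects how the literal is written; the stored string has no extra backslashes.
var inSingle = 'I\'m a JavaScript programmer';
var inDouble = "I'm a \"JavaScript\" programmer";
console.log(inSingle); // I'm a JavaScript programmer
console.log(inDouble); // I'm a "JavaScript" programmer

// A literal backslash is written as two backslashes but stored as one character.
var path = 'C:\\temp';
console.log(path.length); // 7

// Escaping an ordinary character changes nothing: the backslash is dropped.
var plain = 'h\ello';
console.log(plain === 'hello'); // true
```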
Methods and properties
Here we look at the methods and properties of strings, some of which we met earlier in the chapter Methods and Properties.
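The length property
One of the most frequent operations on a string is getting its length, which is stored in the length property. For example, "My\n".length is 3, because \n counts as a single character.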
Character access
To get a character, call charAt(position). The first character is at position 0, so str.charAt(0) returns the first character of str.
There is no separate “character” type in JavaScript, so charAt returns a string consisting of the selected character.
In modern browsers (not IE7 and below), you can also use square brackets to access a character, as in str[0].
The difference between the two is that when there is no character at the given position, charAt returns an empty string, while square brackets return undefined.
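A short sketch comparing the two ways of reading a character. The sample string "jQuery" and the variable names are assumptions made for illustration:

```javascript
var str = "jQuery";

// Both forms return a one-character string for a valid position.
var first = str.charAt(0);      // "j"
var firstByIndex = str[0];      // "j"

// For a position outside the string the results differ.
var missingCharAt = str.charAt(50); // "" (empty string)
var missingIndex = str[50];         // undefined

console.log(first, firstByIndex, missingCharAt === "", missingIndex === undefined);
```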
Note that str.length is a property of the string, while str.charAt(pos) is a method, that is, a function.
A method call is always written with parentheses; a property is accessed without them.
Changing strings
Strings in JavaScript cannot be changed. You can read a character, but you cannot replace it. Once a string is created, it stays the same forever.
To get around this, a new string is created and assigned to the variable in place of the old one.
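A minimal sketch of this workaround, with illustrative variable names and sample text. The original string is left untouched, and the variable is simply pointed at a newly built string:

```javascript
var greeting = "hello";
var original = greeting; // keep a reference to the old value for comparison

// Build a new string from a new first character plus the rest of the old one,
// then assign it to the same variable.
greeting = "j" + greeting.slice(1);

console.log(greeting); // jello
console.log(original); // hello
```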
Changing case
The toLowerCase() and toUpperCase() methods convert a string to lower or upper case.
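For instance (the sample word is an assumption chosen for illustration; both methods return a new string and leave the original as it was):

```javascript
var word = "Interface";

var upper = word.toUpperCase(); // "INTERFACE"
var lower = word.toLowerCase(); // "interface"

console.log(upper, lower, word); // INTERFACE interface Interface
```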
The example below takes the first character and converts it to lower case:
alert( "Interface".charAt(0).toLowerCase() ); // "i"
We cannot simply replace the first character, because JavaScript strings are immutable.
The only way is to build a new string from the existing one, with its first character converted to upper case and the rest appended unchanged.
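One possible sketch of such a solution. The function name ucFirst is hypothetical, and the guard for an empty string is an added assumption rather than part of the text above:

```javascript
// Returns a copy of str with its first character in upper case.
function ucFirst(str) {
  // Nothing to capitalize in an empty string.
  if (!str) return str;
  return str.charAt(0).toUpperCase() + str.slice(1);
}

console.log(ucFirst("john")); // John
```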
P.S. Other solutions are possible using the str.slice and str.replace methods.
Substring search
To search for a substring, there is the method indexOf(substring[, start_position]).
It returns the position at which the substring is found, or -1 if nothing is found.
The optional second argument lets you start the search from a given position. For example, if "id" first appears at position 1, then to find its next occurrence, start the search from position 2.
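A small sketch of both forms of the call. The sample string "Widget with id" is an assumption chosen so that "id" first appears at position 1:

```javascript
var str = "Widget with id";

var firstPos = str.indexOf("id");      // 1, inside "Widget"
var nextPos = str.indexOf("id", 2);    // 12, the search starts from position 2
var notFound = str.indexOf("lalala");  // -1, there is no such substring

console.log(firstPos, nextPos, notFound); // 1 12 -1
```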
There is also a similar method, lastIndexOf, which searches not from the beginning but from the end of the string.
For a compact way to test the result of indexOf, the bitwise NOT operator '~' is sometimes used.
The point is that ~n is equivalent to the expression -(n+1). For example, ~2 is -3, ~0 is -1, and ~(-1) is 0.
As you can see, ~n is zero only when n == -1.
That is, the check if ( ~str.indexOf(...) ) means that the result of indexOf is different from -1, i.e. a match was found.
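A minimal sketch of this check. The sample string and the variable names are illustrative assumptions:

```javascript
// ~n is -(n+1), so it is zero (falsy) only for n == -1.
console.log(~2, ~1, ~0, ~-1); // -3 -2 -1 0

var str = "Widget";
var found = false;

// indexOf returns 3 here, ~3 is -4, which is truthy, so the branch runs.
if (~str.indexOf("get")) {
  found = true;
}

// For a missing substring indexOf returns -1, and ~-1 is 0, which is falsy.
var missing = Boolean(~str.indexOf("xyz"));

console.log(found, missing); // true false
```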
In general, using language features in a non-obvious way is not recommended, because it hurts the readability of the code.
In this case, however, it is acceptable. Just remember: '~' reads as “not minus one”, and "if ~str.indexOf" reads as "if found".
The indexOf method is case-sensitive. That is, in the string 'xXx' it will not find 'XXX'.
For the check, we convert both the string str and the search terms to lower case:
function checkSpam(str) {
  str = str.toLowerCase();
  return str.indexOf('viagra') >= 0 || str.indexOf( ) >= 0 || str.indexOf('xxx') >= 0;
}
Complete solution: tutorial/intro/checkSpam.html.
Finding all occurrences
To find all occurrences of a substring, run indexOf in a loop. As soon as a position is found, the next search starts from the position right after it.
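A sketch of such a loop, under a few assumptions: the function name findAll is hypothetical, the positions are collected into an array instead of being printed, and an empty search string is rejected up front because indexOf always "finds" it and the loop would never end.

```javascript
// Returns the positions of all occurrences of target in str.
function findAll(str, target) {
  var positions = [];
  if (target === "") return positions; // guard against an endless loop

  var pos = 0;
  while (true) {
    var foundPos = str.indexOf(target, pos);
    if (foundPos == -1) break;   // nothing more to find

    positions.push(foundPos);
    pos = foundPos + 1;          // continue from the next position
  }
  return positions;
}

console.log(findAll("abcabcabc", "abc")); // [ 0, 3, 6 ]
```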
Such a loop starts the search from position 0. After finding the substring at position foundPos, it continues the next search from position pos = foundPos + 1, and so on until nothing more is found.
The same algorithm can also be written more briefly.
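One possible shorter form, shown here as a sketch: the assignment is moved into the loop condition. The function name findAllShort is hypothetical, and the guard for an empty search string is an added assumption.

```javascript
// Same algorithm, with the search and the end-of-loop test combined.
function findAllShort(str, target) {
  var positions = [];
  if (target === "") return positions; // guard against an endless loop

  var pos = -1;
  // Each pass searches from the position after the previous match.
  while ((pos = str.indexOf(target, pos + 1)) != -1) {
    positions.push(pos);
  }
  return positions;
}

console.log(findAllShort("abcabcabc", "abc")); // [ 0, 3, 6 ]
```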
Taking a substring: substr, substring, slice
In JavaScript there are as many as three (!) methods for taking a substring, with a few differences between them.
- substring(start [, end])
  Returns the substring from position start up to, but not including, position end. If the end argument is omitted, the substring runs to the end of the string.
- substr(start [, length])
  The first argument has the same meaning as in substring, while the second is not the end position but the number of characters. If the second argument is omitted, “to the end of the string” is implied.
- slice(start [, end])
  Returns the part of the string from position start up to, but not including, position end. The parameters have the same meaning as in substring.
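The three methods side by side, as a brief sketch with an assumed sample string:

```javascript
var str = "stringify";

// substring(start, end): from start up to, but not including, end.
var a = str.substring(0, 1); // "s"
var b = str.substring(2);    // "ringify" (to the end of the string)

// substr(start, length): the second argument is a character count.
var c = str.substr(2, 4);    // "ring"

// slice(start, end): same meaning of arguments as substring here.
var d = str.slice(2, 6);     // "ring"

console.log(a, b, c, d); // s ringify ring ring
```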
Negative arguments
The difference between substring and slice is in how they handle negative and out-of-range arguments:
- substring(start, end)
  Negative arguments are treated as zero. Values that are too large are truncated to the length of the string.
  In addition, if start > end, the arguments are swapped, i.e. the part of the string between start and end is returned.
- slice
  Negative values are counted from the end of the string.
  This is much more convenient than the odd logic of substring.
A negative first argument is also supported by substr in all browsers except IE8 and below.
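These rules can be checked with a short sketch. The sample string "testme" and the variable names are illustrative assumptions:

```javascript
var str = "testme";

// substring: negative values become 0, too large values become str.length.
var s1 = str.substring(-2, 6);  // same as substring(0, 6): "testme"
var s2 = str.substring(4, 100); // same as substring(4, 6): "me"

// substring: if start > end, the arguments are swapped.
var s3 = str.substring(4, -1);  // becomes (4, 0), then (0, 4): "test"

// slice: negative values are counted from the end of the string.
var s4 = str.slice(-2);         // "me"
var s5 = str.slice(1, -1);      // "estm"

// substr: a negative start is also counted from the end.
var s6 = str.substr(-2);        // "me"

console.log(s1, s2, s3, s4, s5, s6);
```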
Conclusions.
The most convenient method is slice(start, end).
Alternatively, you can use substr(start, length), keeping in mind that IE8 and below do not support a negative start.
Since the final length of the string must be maxlength, the string has to be cut a little shorter to leave room for the three dots.
An even better option is to use, instead of three dots, the special “ellipsis” character … (&hellip; in HTML), so that only one character needs to be cut.
This code could also be written even more briefly.
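A sketch of both variants, under stated assumptions: the function names truncate and truncateShort are hypothetical, three ASCII dots are used as the ending, and maxlength is assumed to be at least 3 (a smaller value would make the slice end negative).

```javascript
// Cuts str down to maxlength characters, ending with "..." if it was too long.
function truncate(str, maxlength) {
  if (str.length > maxlength) {
    // Leave room for the three dots.
    return str.slice(0, maxlength - 3) + "...";
  }
  return str;
}

// The same logic written with the conditional operator.
function truncateShort(str, maxlength) {
  return str.length > maxlength ? str.slice(0, maxlength - 3) + "..." : str;
}

console.log(truncate("What I would like to tell on this topic is:", 20)); // What I would like...
console.log(truncateShort("Hi everyone!", 20)); // Hi everyone!
```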
Unicode Encoding
If you are familiar with string comparison in other languages, here is a little riddle. Actually, not one but two.
As we know, characters are compared in alphabetical order: 'А' < 'Б' < 'В' < ... < 'Я'.
But there are a few oddities.
- Why is the lowercase letter 'а' greater than the uppercase letter 'Я'?
- The letter 'ё' sits in the alphabet between е and ж: абвгде ё жз.. But why then is 'ё' greater than 'я'?
To sort this out, let's turn to the internal representation of strings in JavaScript.
All strings are internally encoded in Unicode.
It doesn't matter what encoding the page is written in, whether it is windows-1251 or utf-8. Inside the JavaScript interpreter, all strings are reduced to a single “Unicode” form. Each character has its own code.
There is a method for getting a character by its code:
- String.fromCharCode(code)
  Returns the character with the code code.
...and a method for getting the numeric code of a character:
- str.charCodeAt(pos)
  Returns the code of the character at position pos. Positions are counted from zero.
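Both methods in a short sketch. The variable names and the sample word are illustrative; the code 1072 for the Cyrillic letter 'а' is the value used later in this text:

```javascript
// From a code to a character.
var letter = String.fromCharCode(1072); // Cyrillic lowercase "а"

// From a character to its code; positions are counted from zero.
var word = "абв";
var firstCode = word.charCodeAt(0);  // 1072
var secondCode = word.charCodeAt(1); // 1073

console.log(letter, firstCode, secondCode);
```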
Now back to the examples above. Why do the comparisons 'ё' > 'я' and 'а' > 'Я' give such strange results?
The point is that characters are compared not alphabetically but by code. The character with the greater code is the greater one. Unicode contains many different characters, and only a small part of them are Cyrillic letters (for details, see Cyrillic in Unicode).
Let's output the range of Unicode characters with codes from 1034 to 1113.
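A minimal sketch of how such a range could be produced (the variable names are illustrative, and console.log stands in for any output function):

```javascript
// Collect the characters with codes from 1034 to 1113 inclusive.
var str = "";
for (var code = 1034; code <= 1113; code++) {
  str += String.fromCharCode(code);
}

console.log(str);
```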
Result:
ЊЋЌЍЎЏАБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдежзийклмнопрстуфхцчшщъыьэюяѐёђѓєѕіїјљ
We can see two important things from this range:
- Lowercase letters come after uppercase ones, so they are always greater. In particular, 'а' (code 1072) > 'Я' (code 1071). The same happens in the English alphabet, where 'a' > 'Z'.
- A number of letters, such as ё, lie outside the main alphabet. In particular, the lowercase letter ё has a code greater than that of я, so 'ё' (code 1105) > 'я' (code 1103).
By the way, the uppercase letter Ё comes before А in Unicode, so 'Ё' (code 1025) < 'А' (code 1040). Surprisingly, there is a letter that is less than А.
By the way, if we know a character's Unicode code, we can add it to HTML using a “numeric character reference”.
To do this, first write &#, then the code, and finish with a semicolon ';'. For example, the character 'а' as a numeric character reference is &#1072;.
If the code is given in hexadecimal notation, start with &#x.
Unicode has many amusing and useful characters, for example the scissors symbol ✂ (&#x2702;) and the fractions ½ (&#xBD;) and ¾ (&#xBE;). They can conveniently be used in a design instead of images.
String comparison
Strings are compared lexicographically, in “telephone directory” order.
The comparison of strings s1 and s2 follows this algorithm:
- The first characters are compared: a = s1.charAt(0) and b = s2.charAt(0). If they are equal, go to the next step; otherwise return true or false, depending on the result of comparing them.
- The second characters are compared, then the third, and so on. If one string runs out of characters, it is the smaller one. If both run out at the same time, the strings are equal.
The language specification defines this algorithm in more detail, but the idea matches exactly the order in which names are listed in a telephone directory.
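The steps above can be sketched as a function that mimics s1 < s2. This is an illustration of the idea only, not the specification's wording; the function name isLess is hypothetical, and characters are compared by the codes that charCodeAt returns.

```javascript
// Returns true if s1 comes before s2 in lexicographic order.
function isLess(s1, s2) {
  var minLength = Math.min(s1.length, s2.length);

  for (var i = 0; i < minLength; i++) {
    var a = s1.charCodeAt(i);
    var b = s2.charCodeAt(i);
    // The first differing position decides the result.
    if (a != b) return a < b;
  }

  // All compared characters are equal: the string that ended first is smaller.
  return s1.length < s2.length;
}

console.log(isLess("abc", "abd")); // true
console.log(isLess("ab", "abc"));  // true
console.log(isLess("abc", "abc")); // false
```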
It happens that numbers arrive in a script as strings, for example as the result of prompt. In this case, comparing them gives an incorrect result: "2" > "14" is true, because the strings are compared character by character and "2" is greater than "1".
If at least one of the operands is not a string, both are compared as numbers, so 2 > "14" is false.
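The difference can be seen in a short sketch. The variable names are illustrative, and the explicit Number() conversion is shown as one way to get a numeric comparison:

```javascript
// Both operands are strings: compared character by character, "2" > "1".
var asStrings = "2" > "14";                 // true

// One operand is a number: the string is converted, so this is 2 > 14.
var mixed = 2 > "14";                       // false

// Converting both values explicitly gives the numeric result as well.
var asNumbers = Number("2") > Number("14"); // false

console.log(asStrings, mixed, asNumbers); // true false false
```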
Summary
- Strings in JavaScript are internally encoded in Unicode. When writing a string, you can use special characters such as \n and insert Unicode characters by code.
- We covered the length property and the methods charAt, toLowerCase/toUpperCase, and substring/substr/slice (slice is preferred).
- Strings are compared character by character. Therefore, numbers received as strings may compare incorrectly; convert them to the number type first.
- When comparing strings, keep in mind that characters are compared by their codes. Therefore an uppercase letter is less than a lowercase one, and the letter ё lies outside the main alphabet altogether.
Creation
Arguments
string - Optional. Any group of Unicode characters.
Description, examples
String objects, as a rule, are created implicitly using string literals.
var str = "string literal"
In string literals, you can use escape sequences to represent special characters that cannot be used in strings directly, such as a newline character or Unicode characters. When the script is compiled, each escape sequence in a string literal is converted to the character it represents.
You can also specify a Unicode character explicitly by its code, for example \u00A9.
String values written with quotes (so-called “primitive” strings) differ slightly from String objects created with the new operator. For example, the data type (typeof) of an object created with new is 'object', not 'string'. Additional properties and methods can be assigned directly to such an object. Otherwise, the interpreter automatically converts primitive strings to objects where needed.
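A brief sketch of the difference. The variable names and the note property are hypothetical, chosen only for illustration:

```javascript
var primitive = "text";
var wrapped = new String("text");

console.log(typeof primitive); // string
console.log(typeof wrapped);   // object

// An object created with new can carry additional properties.
wrapped.note = "extra";
console.log(wrapped.note); // extra

// A primitive is converted to an object automatically when a method
// or property is used on it, so both support the same operations.
console.log(primitive.length, wrapped.length); // 4 4

// valueOf returns the primitive string stored inside the object.
console.log(wrapped.valueOf() === primitive); // true
```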
Character access
Characters are accessed with the String#charAt method.
There is also another way, absent from ECMA-262 before its 5th edition: indexing a string like an array, as in str[0].
Unlike in C, PHP and some other languages, a string cannot be changed once it is created: its characters can be read but not modified.
To change a string variable, assign a new, modified string to it:
str = str.charAt(4) + str.charAt(5) + str.charAt(6)
String comparison
For string comparison, the usual < and > operators are used.
Methods
- split
- charCodeAt
- String.fromCharCode
- charAt
- concat
- lastIndexOf
- search
- match
- toLowerCase
- toUpperCase
- toLocaleLowerCase
- toLocaleUpperCase
- toString
- valueOf
- substring
- slice
- indexOf
- substr
- replace