Lecture
йMySQL DBMS has various tools for searching, among which is the LIKE operator, which performs the simplest search operations, the RLIKE operator, which provides the ability to search by regular expressions.
Regular expressions are special patterns for searching for substrings in text. With their help, you can solve such tasks with one line: “check whether the line contains numbers”, “find all email addresses in the text”, “replace several consecutive question marks with one”.
select name, family from personal where name like 'A%В_'
//Этим скриптом мы выберем все записи поле name у которых
//начинается с буквы А и предпоследняя буква этого поля В.
The statement can be used with negation: NOT LIKE .
In this case, the sample will get records that do not meet the specified conditions.
RLIKE operator
The RLIKE operator (as well as a synonym for REGEXP ) searches in accordance with regular expressions, which allows you to specify more flexible search conditions, however, this operator is slower than the LIKE operator.
There are several regular expression dialects. In MySQL, the implementation of the database is oriented towards compliance with the POSIX standard.
A regular expression is a pattern applied to a given text from left to right. For example, a regular expression containing plain text, such as 'monitor', matches a string containing such a substring, for example: 'monitoring', 'monitors', 'big monitors', etc.
Example:
select NAME from market where NAME RLIKE 'кекс'
Результат работы:
КЕКС К ЧАЮ В АССОРТ. 225ГР.
СУХАЯ СМЕСЬ ДЛЯ КЕКСА АТЛАНТА
МИНИ КЕКСЫ 6ШТ
1) select NAME from market_cards where NAME RLIKE '^кекс'
2) select NAME from market_cards where NAME RLIKE 'кекс$'
3) select NAME from market_cards where NAME RLIKE '^кекс$'
Результат работы:
Первый запрос вернйт нам записи типа:
КЕКСЫ МИНИ КОВИС 200ГР.
КЕКС К ЧАЮ ХАРРИС 225ГР.
КЕКСЫ 7 ДНЕЙ МАГДАЛЕН.
select NAME from market where NAME RLIKE '[[:<:]]медведь[[:>:]]'
Результат:
ИГРУШКА МЕДВЕДЬ МИТЕНЬКА РЮКЗАК
ИГРУШКА МЕДВЕДЬ МИТЬКА
МЕДВЕДЬ КОРИЧНЕВЫЙ
ИГРУШКА МЕДВЕДЬ МИТЬКА
Another special character:
'|' - It is similar in meaning (or), for example, 'mom | mother' - all lines including both 'mom' and 'mother' will be selected.
In the regular expression it is possible to use other special characters and character classes:
[abc] - records with any of the characters a, b, c will be selected.
[0-9] - any of the numbers. (similar in meaning [0123456789]).
[az] - any of the letters from a to z.
[а-я0-9] - any letter of the Russian alphabet or a number.
[^ 0-9] - means any value other than a digit. (in such cases, '^' is a kind of denial).
To define special sequences within strings:
'\ t' is a tab character.
'\ f' is the end of the file.
'\ n' is a newline character.
'\ r' is a carriage return character.
'\\' is a backslash \ character.
In addition, there are classes of POSIX regular expression characters:
[: alnum:] - alphanumeric characters.
[: alpha:] - alphabet characters.
[: blank:] - space and tab characters.
[: cntrl:] - control characters.
[: digit:] - decimal digits (0-9).
[: graph:] - graphic (visible) characters.
[: lower:] - lower case alphabet characters.
[: print:] - graphic or invisible characters.
[: punct:] - punctuation marks.
[: space:] - space, tab, newline, or carriage return characters.
[: upper:] - uppercase alphabet characters.
[: xdigit:] - hexadecimal digits.
(Alphabetic characters - can be both Russian and English.)
Expressions in square brackets correspond to only one character and are often used with quantifiers that immediately follow the character and change the number of its occurrences in the string.
? - the character either enters the string once, or does not enter it at all.
* - any number of occurrences of a character in a string, including zero.
+ - one or more occurrences of a character in a string.
The operator RLIKE can be used with the negation NOT RLIKE - in this case, the result of his work will be a selection of rows that do not match the specified parameters.
SELECT * FROM Table WHERE title REGEXP "dog|cat|mouse";
this design will find all or
not very good decision through concatenation and:
SELECT * FROM Table WHERE title REGEXP "dog" AND title REGEXP "cat" AND title REGEXP "mouse"
The regular expression will look like this:
SELECT * FROM Table WHERE title REGEXP "(dog.*cat.*mouse)|(dog.*mouse.*cat)|(mouse.*dog.*cat)
|(mouse.*cat.*dog)|(cat.*dog.*mouse)|(cat.*mouse.*dog)"
|
|
[.characters.]
the expression in brackets (written using [and]) corresponds to an ordered sequence of characters. Characters are either a single character or the symbolic name of a newline type. The following table lists the valid symbol names.
The following table lists valid symbol names and their corresponding characters. For characters given as numeric values, the values are represented in octal.
Name | Character | Name | Character |
---|---|---|---|
NUL |
0 |
SOH |
001 |
STX |
002 |
ETX |
003 |
EOT |
004 |
ENQ |
005 |
ACK |
006 |
BEL |
007 |
alert |
007 |
BS |
010 |
backspace |
'\b' |
HT |
011 |
tab |
'\t' |
LF |
012 |
newline |
'\n' |
VT |
013 |
vertical-tab |
'\v' |
FF |
014 |
form-feed |
'\f' |
CR |
015 |
carriage-return |
'\r' |
SO |
016 |
SI |
017 |
DLE |
020 |
DC1 |
021 |
DC2 |
022 |
DC3 |
023 |
DC4 |
024 |
NAK |
025 |
SYN |
026 |
ETB |
027 |
CAN |
030 |
EM |
031 |
SUB |
032 |
ESC |
033 |
IS4 |
034 |
FS |
034 |
IS3 |
035 |
GS |
035 |
IS2 |
036 |
RS |
036 |
IS1 |
037 |
US |
037 |
space |
' ' |
exclamation-mark |
'!' |
quotation-mark |
'"' |
number-sign |
'#' |
dollar-sign |
'$' |
percent-sign |
'%' |
ampersand |
'&' |
apostrophe |
'\'' |
left-parenthesis |
'(' |
right-parenthesis |
')' |
asterisk |
'*' |
plus-sign |
'+' |
comma |
',' |
hyphen |
'-' |
hyphen-minus |
'-' |
period |
'.' |
full-stop |
'.' |
slash |
'/' |
solidus |
'/' |
zero |
'0' |
one |
'1' |
two |
'2' |
three |
'3' |
four |
'4' |
five |
'5' |
six |
'6' |
seven |
'7' |
eight |
'8' |
nine |
'9' |
colon |
':' |
semicolon |
';' |
less-than-sign |
'<' |
equals-sign |
'=' |
greater-than-sign |
'>' |
question-mark |
'?' |
commercial-at |
'@' |
left-square-bracket |
'[' |
backslash |
'\\' |
reverse-solidus |
'\\' |
right-square-bracket |
']' |
circumflex |
'^' |
circumflex-accent |
'^' |
underscore |
'_' |
low-line |
'_' |
grave-accent |
'`' |
left-brace |
'{' |
left-curly-bracket |
'{' |
vertical-line |
'|' |
right-brace |
'}' |
right-curly-bracket |
'}' |
tilde |
'~' |
DEL |
177 |
SELECT '~' REGEXP '[[.~.]]';
-> 1SELECT '~' REGEXP '[[.tilde.]]';
-> 1
напрмер чтобы найти программист "C++" можно использовать
select * from `job` where LCASE (title RLIKE" [[: <:]] c [. +.] [. +.
[=character_class=]
In the expression in brackets (written using [and]), [= character_class =] represents an equivalence class. It matches all characters with the same sort value, including itself. For example, if o and (+) are members of an equivalence class, all [[= o =]], [[= (+) =]] and [o (+)] are synonymous. An equivalence class cannot be used as the end point of a range.
[:character_class:]
In a bracket expression (written using [and]), [: character_class:] represents a class of characters that matches all characters belonging to this class. The following table lists the standard class names. These names denote character classes defined on the ctype (3) manual page. A specific language may contain other class names. A character class cannot be used as the end point of a range.
Character Class Name | Meaning |
---|---|
alnum |
Alphanumeric characters |
alpha |
Alphabetic characters |
blank |
Whitespace characters |
cntrl |
Control characters |
digit |
Digit characters |
graph |
Graphic characters |
lower |
Lowercase alphabetic characters |
print |
Graphic or space characters |
punct |
Punctuation characters |
space |
Space, tab, newline, and carriage return |
upper |
Uppercase alphabetic characters |
xdigit |
Hexadecimal digit characters |
Character Class Name | Meaning |
---|
SELECT 'intellect' REGEXP '[[:alnum:]]+';
-> 1SELECT '!!' REGEXP '[[:alnum:]]+';
-> 0
[[:<:]]
, [[:>:]]
These markers represent word boundaries. They correspond to the beginning and end of words, respectively. A word is a sequence of symbols of a word that is not preceded or accompanied by symbols of words. A word character is an alphanumeric character in the alnum class or an underscore character (_).
SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';
-> 1SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';
-> 0
если делать поиск по словам в mysql 8.0 то ошибка почему?
FROM feedback
WHERE title RLIKE ' [[:<:]] iолрл'
Error Code: 3685. Illegal argument to a regular expression.
ICU для [[:<:]]начальная_часть_слова[[:>:]] нет(тоесть нет поддерки уникода). Для ICU вы можете использовать \ b или \\b для соответствия границ слов; удвоить обратную косую черту, потому что MySQL интерпретирует его как escape-символ в строках.В MySQL реализована поддержка регулярных выражений с использованием International Components for Unicode (ICU), которая обеспечивает полную поддержку Unicode и является многобайтовой безопасностью. (До MySQL 8.0.4 MySQL использовала реализацию регулярных выражений Генри Спенсера, которая работает побайтово и не является многобайтовой безопасностью.
используйте так REGEXP_LIKE(title, "\\biолрл" )
Comments
To leave a comment
MySql (Maria DB)
Terms: MySql (Maria DB)