You get a bonus - 1 coin for daily activity. Now you have 1 coin

Regular expressions in mysql RLike REGEXP, operation or, and, escaping special characters

Lecture



йMySQL DBMS has various tools for searching, among which is the LIKE operator, which performs the simplest search operations, the RLIKE operator, which provides the ability to search by regular expressions.

regular expression review

Regular expressions are special patterns for searching for substrings in text. With their help, you can solve such tasks with one line: “check whether the line contains numbers”, “find all email addresses in the text”, “replace several consecutive question marks with one”.

Regular expressions in mysql RLike REGEXP, operation or, and, escaping special characters

Regular expressions in mysql RLike REGEXP, operation or, and, escaping special characters

Regular expressions in mysql RLike REGEXP, operation or, and, escaping special characters

LIKE operator


The LIKE operator is designed to compare strings using simplest regular expressions. The operator is often used in the WHERE clause and returns 1 or 0.

String comparisons are case insensitive unless the BINARY keyword is used, meaning that the string should be treated as a binary sequence.

Together with the operator LIKE you can use special characters:
" % " - Corresponds to any number of characters and their absence, too.
" _ " - Matches one character.
The search for the characters " % " and " _ " is carried out using the constructions " \% " and " \ _ ".

Example of use:
select name, family from personal where name like 'A%В_' 
 //Этим скриптом мы выберем все записи поле name у которых 
 //начинается с буквы А и предпоследняя буква этого поля В. 



The statement can be used with negation: NOT LIKE .
In this case, the sample will get records that do not meet the specified conditions.

RLIKE operator


The RLIKE operator (as well as a synonym for REGEXP ) searches in accordance with regular expressions, which allows you to specify more flexible search conditions, however, this operator is slower than the LIKE operator.

There are several regular expression dialects. In MySQL, the implementation of the database is oriented towards compliance with the POSIX standard.

A regular expression is a pattern applied to a given text from left to right. For example, a regular expression containing plain text, such as 'monitor', matches a string containing such a substring, for example: 'monitoring', 'monitors', 'big monitors', etc.

Example:

 select NAME from market where NAME RLIKE 'кекс' 
 
 Результат работы: 
 КЕКС К ЧАЮ В АССОРТ. 225ГР. 
 СУХАЯ СМЕСЬ ДЛЯ КЕКСА АТЛАНТА 
 МИНИ КЕКСЫ 6ШТ 

All records will be selected from the market_cards table where the NAME field contains the word 'cupcake', no matter where in the line this word (or more precisely, the combination of characters) is located.

To bind the search expression to the beginning of the line, or to the end, use the characters:
'^' - binds to the beginning of the line.
'$' - binds to the end of the line.

Example:
 1) select NAME from market_cards where NAME RLIKE '^кекс' 
 2) select NAME from market_cards where NAME RLIKE 'кекс$' 
 3) select NAME from market_cards where NAME RLIKE '^кекс$' 
 
 Результат работы: 
 Первый запрос вернйт нам записи типа: 
 КЕКСЫ МИНИ КОВИС 200ГР. 
 КЕКС К ЧАЮ ХАРРИС 225ГР. 
 КЕКСЫ 7 ДНЕЙ МАГДАЛЕН. 

The second and third requests will not return anything, because they will not find entries ending in the word 'cupcake'

More often, it is necessary to bind to the end or beginning of a line, and to the end or beginning of a word.
This task is implemented by the sequence:
[[: <:]] - The beginning of the word.
[[:>:]] - End of the word.

Example:
 select NAME from market where NAME RLIKE '[[:<:]]медведь[[:>:]]' 
 
 Результат: 
 ИГРУШКА МЕДВЕДЬ МИТЕНЬКА РЮКЗАК 
 ИГРУШКА МЕДВЕДЬ МИТЬКА 
 МЕДВЕДЬ КОРИЧНЕВЫЙ 
 ИГРУШКА МЕДВЕДЬ МИТЬКА 



Another special character:
'|' - It is similar in meaning (or), for example, 'mom | mother' - all lines including both 'mom' and 'mother' will be selected.

In the regular expression it is possible to use other special characters and character classes:
[abc] - records with any of the characters a, b, c will be selected.
[0-9] - any of the numbers. (similar in meaning [0123456789]).
[az] - any of the letters from a to z.
[а-я0-9] - any letter of the Russian alphabet or a number.
[^ 0-9] - means any value other than a digit. (in such cases, '^' is a kind of denial).

To define special sequences within strings:
'\ t' is a tab character.
'\ f' is the end of the file.
'\ n' is a newline character.
'\ r' is a carriage return character.
'\\' is a backslash \ character.

In addition, there are classes of POSIX regular expression characters:
[: alnum:] - alphanumeric characters.
[: alpha:] - alphabet characters.
[: blank:] - space and tab characters.
[: cntrl:] - control characters.
[: digit:] - decimal digits (0-9).
[: graph:] - graphic (visible) characters.
[: lower:] - lower case alphabet characters.
[: print:] - graphic or invisible characters.
[: punct:] - punctuation marks.
[: space:] - space, tab, newline, or carriage return characters.
[: upper:] - uppercase alphabet characters.
[: xdigit:] - hexadecimal digits.

(Alphabetic characters - can be both Russian and English.)

Expressions in square brackets correspond to only one character and are often used with quantifiers that immediately follow the character and change the number of its occurrences in the string.
? - the character either enters the string once, or does not enter it at all.
* - any number of occurrences of a character in a string, including zero.
+ - one or more occurrences of a character in a string.

The operator RLIKE can be used with the negation NOT RLIKE - in this case, the result of his work will be a selection of rows that do not match the specified parameters.

 SELECT * FROM Table WHERE title REGEXP "dog|cat|mouse"; 
 

this design will find all or

not very good decision through concatenation and:

 SELECT * FROM Table WHERE title REGEXP "dog" AND title REGEXP "cat" AND title REGEXP "mouse" 

The regular expression will look like this:

 SELECT * FROM Table WHERE title REGEXP "(dog.*cat.*mouse)|(dog.*mouse.*cat)|(mouse.*dog.*cat) 
 |(mouse.*cat.*dog)|(cat.*dog.*mouse)|(cat.*mouse.*dog)" 

Escaping and special characters and special classes in MySQL

  • [.characters.]

    the expression in brackets (written using [and]) corresponds to an ordered sequence of characters. Characters are either a single character or the symbolic name of a newline type. The following table lists the valid symbol names.

    The following table lists valid symbol names and their corresponding characters. For characters given as numeric values, the values ​​are represented in octal.

    Name Character Name Character
    NUL 0 SOH 001
    STX 002 ETX 003
    EOT 004 ENQ 005
    ACK 006 BEL 007
    alert 007 BS 010
    backspace '\b' HT 011
    tab '\t' LF 012
    newline '\n' VT 013
    vertical-tab '\v' FF 014
    form-feed '\f' CR 015
    carriage-return '\r' SO 016
    SI 017 DLE 020
    DC1 021 DC2 022
    DC3 023 DC4 024
    NAK 025 SYN 026
    ETB 027 CAN 030
    EM 031 SUB 032
    ESC 033 IS4 034
    FS 034 IS3 035
    GS 035 IS2 036
    RS 036 IS1 037
    US 037 space ' '
    exclamation-mark '!' quotation-mark '"'
    number-sign '#' dollar-sign '$'
    percent-sign '%' ampersand '&'
    apostrophe '\'' left-parenthesis '('
    right-parenthesis ')' asterisk '*'
    plus-sign '+' comma ','
    hyphen '-' hyphen-minus '-'
    period '.' full-stop '.'
    slash '/' solidus '/'
    zero '0' one '1'
    two '2' three '3'
    four '4' five '5'
    six '6' seven '7'
    eight '8' nine '9'
    colon ':' semicolon ';'
    less-than-sign '<' equals-sign '='
    greater-than-sign '>' question-mark '?'
    commercial-at '@' left-square-bracket '['
    backslash '\\' reverse-solidus '\\'
    right-square-bracket ']' circumflex '^'
    circumflex-accent '^' underscore '_'
    low-line '_' grave-accent '`'
    left-brace '{' left-curly-bracket '{'
    vertical-line '|' right-brace '}'
    right-curly-bracket '}' tilde '~'
    DEL 177
     SELECT '~' REGEXP '[[.~.]]';  -> 1
      SELECT '~' REGEXP '[[.tilde.]]';  -> 1 
  • напрмер чтобы найти программист "C++" можно использовать

  • select * from `job` where LCASE (title RLIKE" [[: <:]] c [. +.] [. +.

  • [=character_class=]

    In the expression in brackets (written using [and]), [= character_class =] represents an equivalence class. It matches all characters with the same sort value, including itself. For example, if o and (+) are members of an equivalence class, all [[= o =]], [[= (+) =]] and [o (+)] are synonymous. An equivalence class cannot be used as the end point of a range.

  • [:character_class:]

    In a bracket expression (written using [and]), [: character_class:] represents a class of characters that matches all characters belonging to this class. The following table lists the standard class names. These names denote character classes defined on the ctype (3) manual page. A specific language may contain other class names. A character class cannot be used as the end point of a range.

    Character Class Name Meaning
    alnum Alphanumeric characters
    alpha Alphabetic characters
    blank Whitespace characters
    cntrl Control characters
    digit Digit characters
    graph Graphic characters
    lower Lowercase alphabetic characters
    print Graphic or space characters
    punct Punctuation characters
    space Space, tab, newline, and carriage return
    upper Uppercase alphabetic characters
    xdigit Hexadecimal digit characters
    Character Class Name Meaning
      SELECT 'intellect' REGEXP '[[:alnum:]]+';  -> 1
      SELECT '!!' REGEXP '[[:alnum:]]+';  -> 0
    
  • [[:<:]] , [[:>:]]

    These markers represent word boundaries. They correspond to the beginning and end of words, respectively. A word is a sequence of symbols of a word that is not preceded or accompanied by symbols of words. A word character is an alphanumeric character in the alnum class or an underscore character (_).

     SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]';  -> 1
      SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]';  -> 0 

See also

avatar
2.8.2020 8:41

если делать поиск по словам в mysql 8.0 то ошибка почему?
FROM feedback
WHERE title RLIKE ' [[:<:]] iолрл'

Error Code: 3685. Illegal argument to a regular expression.

avatar
2.8.2020 8:46

ICU для [[:<:]]начальная_часть_слова[[:>:]] нет(тоесть нет поддерки уникода). Для ICU вы можете использовать \ b или \\b для соответствия границ слов; удвоить обратную косую черту, потому что MySQL интерпретирует его как escape-символ в строках.В MySQL реализована поддержка регулярных выражений с использованием International Components for Unicode (ICU), которая обеспечивает полную поддержку Unicode и является многобайтовой безопасностью. (До MySQL 8.0.4 MySQL использовала реализацию регулярных выражений Генри Спенсера, которая работает побайтово и не является многобайтовой безопасностью.

avatar
2.8.2020 8:51

используйте так REGEXP_LIKE(title, "\\biолрл" )


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Databases - MySql (Maria DB)

Terms: Databases - MySql (Maria DB)