Lecture
The extended Backus-Naur form ( Extended Backus-Nauru form ( RBNF )) (English Extended Backus – Naur Form ( EBNF )) is a formal system for defining syntax, in which some syntactic categories are sequentially defined through others. Used to describe context-free formal grammars. Proposed by Niklaus Wirth. It is an extended processing of Backus-Naur forms, differs from BNF in more “capacious” constructions, which, with the same expressive ability, simplify and reduce the description in volume.
However, many different variants of RBNF are used. The International Organization for Standardization adopted the RBNF standard: ISO / IEC 14977 (this article does not describe the syntax of the RBNF according to this standard, but it is done in the English version of this article).
As in the BNF, the description of a grammar in the RBNF is a set of rules that define the relationship between terminal symbols (terminals) and non-terminal symbols (non-terminals).
The rule in the RBNF is:
идентификатор = выражение.
where identifier is the name of a non-terminal symbol, and the expression is a combination of terminal and non-terminal symbols and special characters corresponding to the RBNF rules. The dot at the end is a special character indicating the end of the rule.
The semantics of the RBNF rule, a non-terminal symbol, given by an identifier to the left of the equal sign, is a combination of terminal and non-terminal symbols defined by the expression .
A complete grammar description is a set of rules that sequentially defines all non-terminal grammar symbols so that each non-terminal symbol can be reduced to a combination of terminal symbols by sequential (recursive) application of the rules. There are no special instructions in the definition of the RBNF regarding the order in which rules are written, although such prescriptions may be introduced when using RBNF using software that automatically generates parsing programs based on the grammar.
The set of possible designs of RBNF is very small. This is concatenation, choice, conditional entry and repetition.
A = BC.
indicates that nonterminal A consists of two characters - B and C. Concatenation elements are also called syntactic factors, or simply factors. In this example, B and C are syntactic factors. If concatenated characters can be denoted by more than one character (as is usually the case when describing programming languages), they are separated by one or more whitespace characters (spaces, line breaks, tabs). For example, Присваивание = Переменная ":=" Выражение.
- here the nonterminal Assignment is defined as the concatenation of the terms Variable , “: =” and Expression .A = B|C|D.
means that the nonterminal A can consist either of B, or of C, or of D. The elements of choice are also called syntactic terms, or simply terms. In this example, B, C, D are syntactic terms.A = [B].
indicates that the nonterminal A is either empty or consists of the symbol B.A = {B}.
denotes that A is either empty, or is the concatenation of any number of characters B (that is, A is either an empty element, or B, or BB, or BBB, and so on). If it is required that A represents either B or an arbitrary number B, but could not be empty, the record A = B{B}.
Digit = '0' | '1' | '2' | ... | '9'
Digit = '0' | '1' | '2' | ... | '9'
Digit = '0' | '1' | '2' | ... | '9'
. You can also define the concept of "symbol".Or all of the above in brief:
In some papers there are modified versions of the syntax of the RBNF.
::=
" instead of " =
" (by analogy with the BNF).Условный оператор = "IF", Логическое выражение, "THEN", Группа операторов, {"ELSIF", Логическое выражение, "THEN", Группа операторов}, ["ELSE", Группа операторов], "ENDIF"
Условный оператор = "IF", Логическое выражение, "THEN", Группа операторов, {"ELSIF", Логическое выражение, "THEN", Группа операторов}, ["ELSE", Группа операторов], "ENDIF"
- the rule that sets the grammar of the conditional operator of the Modula-2 language, where the “Conditional Operator” and “Group of Operators” are non-terminal characters with compound names.
The general form of the RBNF grammar can be described as a RBNF as follows:
Синтаксис = { СинтОператор }. СинтОператор = идентификатор "=" СинтВыражение ".". СинтВыражение = СинТерм {"|" СинТерм}. СинТерм = СинтФактор { СинтФактор }. СинтФактор = идентификатор | цепочка | "(" СинтВыражение ")" | "[" СинтВыражение "]" | "{" СинтВыражение "}".
Синтаксис = { СинтОператор }. СинтОператор = идентификатор "=" СинтВыражение ".". СинтВыражение = СинТерм {"|" СинТерм}. СинТерм = СинтФактор { СинтФактор }. СинтФактор = идентификатор | цепочка | "(" СинтВыражение ")" | "[" СинтВыражение "]" | "{" СинтВыражение "}".
In this description it is assumed that the identifier and the chain are predefined terms. If desired, it is not difficult to write down their definition in the RBNF, all that is required is to specify a certain alphabet and, if necessary, additional restrictions on the type of identifier.
The following grammars define a decimal notation of a general form (with a leading character, a possible fractional part and order) and a typical programming language identifier (a sequence of letters, numbers, and underscores, starting with a letter).
Число = ["+"|"-"]НатЧисло["."[НатЧисло]][("e"|"E")["+"|"-"]НатЧисло]. НатЧисло = Цифра{Цифра}. Цифра = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9".
Число = ["+"|"-"]НатЧисло["."[НатЧисло]][("e"|"E")["+"|"-"]НатЧисло]. НатЧисло = Цифра{Цифра}. Цифра = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9".
Идент = Буква{Буква|Цифра|"_"}.
Идент = Буква{Буква|Цифра|"_"}.
Definition of non-terminal The letter is not given here because of the obviousness and cumbersomeness - it is a choice from the accepted alphabet.
The similarities and differences between BNF and RBNF are apparent from the description. The difference is, by and large, in two main points:
::=
” is replaced by “ =
” and the use of angle brackets for the selection of non terminals is abolished. As a result, the ability to call nonterminals with verbose identifiers has disappeared, but the recording has become shorter. In the modification of the syntax of RBNF, in which the concatenation is indicated by a comma, verbose identifiers can be used.There can be different opinions about the success or failure of the first change, but, in any case, it does not affect the expressive possibilities of the form. But the second innovation is very significant. It also does not add fundamentally new expressive possibilities (everything that is written in the RBNF can be adequately recorded in the usual BNF), but it significantly reduces and simplifies recording.
The main advantage of RBNF over BNF is the ability to describe simple repeated structures of indefinite length (lists, strings, sequences, and so on) without recursive rules. The absence in the BNF of the construction of repetition leads to the fact that any repetition has to be determined by the introduction of additional intermediate non-terminal symbols and recursive rules, due to which the definition becomes excessively large in size and obscure. The description of repetitions in the RBNF is both shorter and more convenient for human perception.
As an example, consider the rules defining a nonterminal “list”, which is a set from zero to any number of identifiers separated by commas (it is assumed that the characters “Right Bracket”, “Left Bracket”, “Comma” and “Ident” are already defined).
The definition in the RBNF includes only one rule:
Список = ЛеваяСкобка [Идент {Запятая Идент}] ПраваяСкобка.
Список = ЛеваяСкобка [Идент {Запятая Идент}] ПраваяСкобка.
The definition in the BNF looks like this:
<Список> ::= <ЛеваяСкобка> <ПраваяСкобка> | <ЛеваяСкобка> <ИдентСпис> <ПраваяСкобка> <ИдентСпис> ::= <Идент> | <Идент> <Запятая> <ИдентСпис>
<Список> ::= <ЛеваяСкобка> <ПраваяСкобка> | <ЛеваяСкобка> <ИдентСпис> <ПраваяСкобка> <ИдентСпис> ::= <Идент> | <Идент> <Запятая> <ИдентСпис>
Already from this example, the differences in the forms are visible:
Naturally, the price paid for the advantages of RBNF over BNF is the greater complexity of automatic interpretation of RBNF descriptions. Generators of parsing programs using formal grammar descriptions using BNF are simpler than those used by RBNF.
RBNF are equivalent to a subclass of syntactic diagrams that are widely used to describe grammars. Any RBNF grammar can be adequately represented by a syntax diagram, but, in general, syntax diagrams allow you to create descriptions that cannot be represented in the RBNF (or, in any case, cannot be translated into the RBNF directly, without transforming the graphic description before).
RBNF, like its predecessor, BNF, is extremely widely used as a means of describing artificial languages, first of all, programming languages and related designation systems. In particular, the inventor of the RBNF, Niklaus Wirth, used this formalism in his books to describe all the programming languages considered there.
The higher complexity of the RBNF compared with the BNF leads to the fact that the generators of the programs of grammatical analysis, working on the basis of the RBNF, are significantly less than those based on the BNF. However, they exist and apply. RBNF is the basis for the Spirit C ++ Parser Framework, Coco / R, The SLK Parser Generator, as well as a number of others. For use in such systems, the RBNF syntax expands in the same direction as the BNF syntax when using yacc or bison parser-generators — the processing code is inserted directly into the grammar description, and the interaction with the lexical analyzer is organized in one way or another. Additional restrictions may be imposed on the structure of the rules - not all the rules that can be described on the RBNF can be effectively converted into code.
The unconditional advantages of RBNF include simplicity (the RBNF language itself contains only 10 special characters - three types of brackets, a vertical line, an equal sign and quotes, perhaps another comma; its syntax is determined by five rules), sufficient power and visibility, which makes it is convenient for performing descriptions, intended not only for automatic interpretation, but also for people to read. The proximity of RBNF constructions to syntax diagrams allows drawing the latter directly according to the RBNF description.
The disadvantages of the RBNF, as well as of the BPF, include the fact that they describe the grammatical structure of a formal language without taking into account contextual dependencies, which means that if such dependencies are present, the RBNF description turns out to be incomplete, and some syntax rules of the described language have to be stated in plain text form. This leads to the fact that the text that exactly corresponds to the RBNF grammar may, nevertheless, be syntactically incorrect. For example, in the RBNF grammar it is impossible to naturally reflect the fact that some operation requires operands of the same type. Similar checks have to be performed by hand-written grammar analyzer code. On the other hand, grammar description systems that include the definition of contextual dependencies, for example, the van Weingaarden grammar, turn out to be much more complicated, and their use for automatic generation of parsers is difficult.
International standard
Unofficial translation of the Standard
|
Comments
To leave a comment
Algorithmization and programming. Structural programming. C language
Terms: Algorithmization and programming. Structural programming. C language