You get a bonus - 1 coin for daily activity. Now you have 1 coin

A simple program for parsing expressions on the example of the C language

Lecture



In the remainder of this chapter, two parsers are provided. The first one parses and evaluates only constant expressions, i.e. expressions with no variables. The second analyzer is able to work with 26 variables, from A to Z.

Below is the full version of a simple recursive downstream parser that evaluates expressions in which operands are represented in floating-point format when computed.

  / * This module contains a simple parser,
    which does not recognize variables.
 * /

 #include <stdlib.h>
 #include <ctype.h>
 #include <stdio.h>
 #include <string.h>

 #define DELIMITER 1
 #define VARIABLE 2
 #define NUMBER 3

 extern char * prog;  / * contains the expression being analyzed * /
 char token [80];
 char tok_type;

 void eval_exp (double * answer), eval_exp2 (double * answer);
 void eval_exp3 (double * answer), eval_exp4 (double * answer);
 void eval_exp5 (double * answer), eval_exp6 (double * answer);
 void atom (double * answer);
 void get_token (void), putback (void);
 void serror (int error);
 int isdelim (char c);

 / * Analyzer entry point.  * /
 void eval_exp (double * answer)
 {
   get_token ();
   if (! * token) {
     serror (2);
     return;
   }
   eval_exp2 (answer);

   if (* token) serror (0);  / * last lexeme must be zero * /
 }

 / * Addition or subtraction of two terms.  * /
 void eval_exp2 (double * answer)
 {
   register char op;
   double temp;

   eval_exp3 (answer);
   while ((op = * token) == '+' || op == '-') {
     get_token ();
     eval_exp3 (& temp);
     switch (op) {
       case '-':
         * answer = * answer - temp;
         break;
       case '+':
         * answer = * answer + temp;
         break;
     }
   }
 }

 / * Multiplication or division of two factors.  * /
 void eval_exp3 (double * answer)
 {
   register char op;
   double temp;

   eval_exp4 (answer);
   while ((op = * token) == '*' || op == '/' || op == '%') {
     get_token ();
     eval_exp4 (& temp);
     switch (op) {
       case '*':
         * answer = * answer * temp;
         break;
       case '/':
         if (temp == 0.0) {
           serror (3);  / * division by zero * /
           * answer = 0.0;
         } else * answer = * answer / temp;
         break;
       case '%':
         * answer = (int) * answer% (int) temp;
         break;
     }
   }
 }

 / * Exponentiation * /
 void eval_exp4 (double * answer)
 {
   double temp, ex;
   register int t;

   eval_exp5 (answer);

   if (* token == '^') {
     get_token ();
     eval_exp4 (& temp);
     ex = * answer;
     if (temp == 0.0) {
       * answer = 1.0;
       return;
     }
     for (t = temp-1; t> 0; --t) * answer = (* answer) * (double) ex;
   }
 }

 / * Multiplication of the unary operators + and -.  * /
 void eval_exp5 (double * answer)
 {
   register char op;

   op = 0;
   if ((tok_type == DELIMITER) && * token == '+' || * token == '-') {
     op = * token;
     get_token ();
   }
   eval_exp6 (answer);
   if (op == '-') * answer = - (* answer);
 }

 / * Evaluation of the expression in brackets.  * /
 void eval_exp6 (double * answer)
 {
   if ((* token == '(')) {
     get_token ();
     eval_exp2 (answer);
     if (* token! = ')')
       serror (1);
     get_token ();
   }
   else
     atom (answer);
 }

 / * Get the value in brackets.  * /
 void atom (double * answer)
 {
   if (tok_type == NUMBER) {
     * answer = atof (token);
     get_token ();
     return;
   }
   serror (0);  / * otherwise a syntax error in the expression * /
 }

 / * Expression of the token in the input stream.  * /
 void putback (void)
 {
   char * t;

   t = token;
   for (; * t; t ++) prog--;
 }

 / * Displays an error message.  * /
 void serror (int error)
 {
   static char * e [] = {
       "Syntax error",
       "Unbalanced brackets",
       "No Expression",
       "Division by zero"
   };
   printf ("% s \ n", e [error]);
 }

 / * Return another lexeme.  * /
 void get_token (void)
 {
   register char * temp;

   tok_type = 0;
   temp = token;
   * temp = '\ 0';

   if (! * prog) return;  / * end of expression * /
   while (isspace (* prog)) ++ prog;  / * skip spaces,
                   tab characters and blank lines * /

   if (strchr ("+ - * /% ^ = ()", * prog)) {
     tok_type = DELIMITER;
     / * go to the next character * /
     * temp ++ = * prog ++;
   }
   else if (isalpha (* prog)) {
     while (! isdelim (* prog)) * temp ++ = * prog ++;
     tok_type = VARIABLE;
   }
   else if (isdigit (* prog)) {
     while (! isdelim (* prog)) * temp ++ = * prog ++;
     tok_type = NUMBER;
   }

   * temp = '\ 0';
 }

 / * Return value is TRUE if c is a delimiter.  * /
 int isdelim (char c)
 {

   if (strchr ("+ - / *% ^ = ()", c) || ​​c == 9 || c == '\ r' || c == 0)
     return 1;
   return 0;
 }

In the form shown here, the analyzer supports the following operators: +, -, *, /,%. In addition, he is able to erect to an integer power (^) and calculate the unary minus. And the analyzer is able to correctly recognize the brackets. Note that it consists of six levels, as well as the atom function, which returns the value of a number. As already discussed earlier, in the global variable token , the next token is returned from the string containing the expression, and in tok_type is the type of the token. The prog pointer variable points to the string containing the expression.

The following simple main () function demonstrates the use of this parser:

  / * Demonstration program for the analyzer.  * /
 #include <stdlib.h>
 #include <ctype.h>
 #include <stdio.h>
 #include <string.h>

 char * prog;
 void eval_exp (double * answer);

 int main (void)
 {
   double answer;
   char * p;

   p = (char *) malloc (100);
   if (! p) {
     printf ("Error allocating memory. \ n");
     exit (1);
   }

   / * Handle expressions before entering a blank line.  * /
   do {
     prog = p;
     printf ("Type expression:");
     gets (prog);
     if (! * prog) break;
     eval_exp (& answer);
     printf ("Result:% .2f \ n", answer);
   } while (* p);

   return 0;
 }

To understand how the analyzer actually calculates an expression, let's work through the following example. (Suppose prog points to the beginning of an expression.)

  10 - 3 * 2 

When the eval_exp () function is called - the analyzer input point - a token is selected from the input string. If it is an empty string, the function prints the message "No Expression" and ends. However, in this case, the lexeme is the number 10. Since this is not an empty string, the function eval_exp2 () is called. As a result, the eval_exp2 () function calls eval_exp3 () , and eval_exp3 () calls eval_exp4 () , and that in turn calls eval_exp5 () . Then, the eval_exp5 () function checks if the lexeme is a unary plus or minus. In this case, this is not the case; therefore, the eval_exp6 () function is called . At this point, eval_exp6 () can recursively call either eval_exp2 () (in the case of an expression enclosed in parentheses) or atom () to determine the value of a number. Since the lexeme is not an opening bracket, the atom () function is executed and the variable * answer is assigned the value 10. Then, the next lexeme is sampled and returned from the function call chain. The token becomes the - operator, and control is returned to the eval_exp2 () function.

What happens next is very important. Since the current token is the symbol -, it is stored in the variable op . Then the analyzer selects the next lexeme and the descent along the chain begins again. As before, the atom () function is called. The resulting value of 3 is returned in the * answer variable, and the token * is read. This causes the chain to return to eval_exp3 () , where the last lexeme 2 is read. At this point, the first arithmetic operation occurs - 3 by 2. The result is returned to the eval_exp2 () function, where the subtraction occurs. As a result of the subtraction, the answer is 4. Despite the fact that this process may initially seem complicated, working out other examples on your own will help you understand the work of the analyzer.

This analyzer would be suitable for a desktop calculator, as demonstrated by the previous program, or for a small database. However, before using it to parse a programming language or in a complex calculator, it is necessary to add tools for working with variables into it. This is the subject of the next section.


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Algorithms

Terms: Algorithms