15.2. Table expression

Lecture

The SQL / 89 standard recommends considering the calculation of a table expression as the sequential application of the FROM, WHERE, GROUP BY and HAVING clauses to the tables specified in the FROM list. The FROM clause has the following syntax:

  <from clause> :: =
    FROM <table reference>
  ({, <table reference>} ...]
 <table reference> :: =
    <table name> [<correlation name>]

15.2.1. FROM section

The result of the execution of the FROM clause is the extended Cartesian product of the tables defined by the list of tables in the FROM clause. The extended Cartesian product (extended, because multisets are allowed as operands and result) is defined in the standard as follows:

"The extended product R is the multiset of all rows r such that r is the concatenation of the rows from all identified tables in the order in which they are identified. The power R is the product of the powers of the identified tables. The ordinal number of the column in R is n + s, where n is the sequence number of the originating column in the named table T, and s the sum of the powers of all the tables identified to T in the FROM section. "

As you can see from the syntax, next to the table name you can specify another name "correlation name". In fact, this is some kind of synonym for the table name that can be used in other sections of a table expression to refer to the rows of this particular table entry.

If the table expression contains only the FROM section (this is the only required section of the table expression), then the result of the table expression matches the result of the FROM section.

15.2.2. Section WHERE

If there is a WHERE clause in the table expression, it is calculated next. The WHERE clause has the following syntax:

  <where clause> :: = WHERE <search condition>
 <search condition> :: =
    <boolean term>
  (<search condition> OR <boolean term>
 <Boolean term> :: =
    <boolean factor>
  (<boolean term> AND <boolean factor>
 <boolean factor> :: = [NOT] <boolean primary>
 <boolean primary> :: = <predicate> |  (<search condition>)

The WHERE clause is calculated according to the following rules: Let R be the result of the FROM clause. Then the search condition applies to all R rows, and the result of the WHERE clause is a table consisting of those R rows for which the result of the search condition is true. If the sampling condition includes subqueries, then each subquery is computed for each tuple of the table R (the standard uses the term effectively) in the sense that the result should be as if each subquery were really re-evaluated for each tuple R).

Note that since SQL / 89 allows for the presence of null values in the database, the search condition is calculated not in Boolean, but in three-valued logic with values true, false and unknown (unknown). For any predicate, it is known in which situations it can generate the value unknown. Boolean operations AND, OR and NOT work in three-valued logic as follows:

  true AND unknown = unknown
 unknown AND true = unknown
 unknown AND unknown = unknown
 true OR unknown = true
 unknown OR true = true
 unknown OR unknown = unknown
 NOT unknown = unknown

The following predicates can be found among predicates in the SQL / 89 search condition: comparison predicate, between predicate, predicate in, predicate like, predicate null, predicate with quantifier and predicate exists. Immediately, we note that in all SQL implementations, the efficiency of query execution is significantly affected by the presence of simple comparison predicates in the search condition (predicates that define the comparison of a table column with a constant). The presence of such predicates allows the DBMS to use indices when executing a query, i.e. avoid full table view. Although, in principle, the SQL language allows users not to worry about a specific set of predicates in the sample condition (as long as they are syntactically and semantically correct), when using SQL-oriented DBMS for real, such technical details should be kept in mind.

Comparison predicate

The syntax of the comparison predicate is determined by the following rules:

  <comparison predicate> :: =
    <value expression> <comp op>
   {<value expression> |  <subquery>}
 <comp op> :: =
    = |  <> |  <|  > | |  <= |  > =

Through "<>" denotes the operation "inequality". Arithmetic expressions of the left and right sides of the comparison predicate are constructed according to the general rules for constructing arithmetic expressions and can generally include the column names of tables from the FROM section and constants. The data types of arithmetic expressions must be comparable (for example, if the type of the column a of table A is the type of character strings, then the predicate "a = 5" is invalid).

If the right operand of the comparison operation is specified by a subquery, then an additional limitation is that the power of the result of the subquery must be no more than one. If at least one of the operands of the comparison operation has an undefined value, or if the right operand is a subquery with an empty result, then the value of the comparison predicate is unknown.

Note that the value of an arithmetic expression is not defined if at least one undefined value is involved in its calculation. Another important note from the SQL / 89 standard: in the context of GROUP BY, DISTINCT and ORDER BY, an undefined value acts as a special kind of a particular value, i.e. it is possible, for example, the formation of a group of rows whose value of the specified column is undefined. To ensure the portability of application programs, it is necessary to carefully evaluate the specifics of working with undefined values in a specific DBMS.

Between predicate

The predicate between has the following syntax:

  <between predicate> :: =
    <value expression>
    [NOT] BETWEEN <value expression> AND <value expression>

The result of "x BETWEEN y AND z" is the same as the result of "x> = y AND x <= z". The result of "x NOT BETWEEN y AND z" is the same as the result of "NOT (x BETWEEN y AND z)".

In predicate

The in predicate is defined by the following syntax rules:

  <in predicate> :: =
    <value expression> [NOT] IN
   {<subquery> |  (<in value list>)}
 <in value list> :: =
    <value specification>
  {, <value specification>} ...

The types of the left operand and the values from the list of the right operand (recall that the resulting subquery table must contain exactly one column) must be comparable.

The predicate value is true if and only if the value of the left operand matches at least one value of the list of the right operand. If the list of the right operand is empty (so maybe if the right operand is specified by a subquery), or the value of the "implied" comparison predicate x = y (where x is the value of the left operand arithmetic expression) is false for each element y of the right operand list, then the predicate value in is false. Otherwise, the value of the in predicate is unknown. By definition, the predicate value "x NOT IN S" is equal to the predicate value "NOT (x IN S)".

Like predicate

The like predicate has the following syntax:

  <like predicate> :: =
    <column specification> [NOT] LIKE <pattern>
    [ESCAPE <escape character>]
 <pattern> :: = <value specification>
    <escape character> :: = <value specification>

The data types of the left operand and sample column must be character string types. In the ESCAPE section, a single character must be specified.

The predicate value is true if pattern is a substring of the specified column. In this case, if the ESCAPE section is absent, then when matching the template with a string, a special interpretation of two characters of the template is made: the underscore character ("_") means any single character; the percent symbol ("%") denotes a sequence of arbitrary characters of arbitrary length (maybe zero).

If the ESCAPE section is present and specifies some single character x, then the pairs of characters "x_" and "x%" represent single characters "_" and "%", respectively.

The value of the like predicate is unknown if the value of the column or template is not defined.

The predicate value "x NOT LIKE y ESCAPE z" matches the value "NOT x LIKE y
ESCAPE z ".

Null predicate

The null predicate is described by a syntax rule:

  <null predicate> :: =
    <column specification> IS [NOT] NULL

This predicate is always true or false. In this case, the value "x IS NULL" is true if and only if the value x is not defined. The predicate value "x NOT IS NULL" is equal to the value "NOT x IS NULL".

Predicate with quantifier

The quantifier predicate has the following syntax:

  <quantified predicate> :: =
    <value expression> <comp op> <quantifier> <subquery>
 <quantifier> :: =
    <all> |  <some>
 <all> :: = ALL
 <some> :: = SOME |  ANY

Denote by x the result of the calculation of the arithmetic expression of the left side of the predicate, and by S the result of the calculation of the subquery.

The predicate "x <comp op> ALL S" is true if S is empty or the predicate value "x <comp op> s" is true for each s included in S. The predicate "x <comp op> ALL S" is false if the predicate value "x <comp op> s" is false for at least one s included in S. In other cases, the predicate value "x <comp op> ALL S" is equal to unknown.

The predicate "x <comp op> SOME S" is false if S is empty or the predicate value "x <comp op s" is false for each s included in S. The predicate "x <comp op SOME S" has a value true if the predicate value "x <comp op> s" is true for at least one s included in S. In other cases, the predicate value "x <comp op SOME S" is equal to unknown.

Exists predicate

The exists predicate has the following syntax:

  <exists predicate> :: =
    EXISTS <subquery>

The value of this predicate is always true or false, and this value is true if and only if the result of the subquery calculation is not empty.

15.2.3. GROUP BY section

If there is a GROUP BY clause in the table expression, then it is executed next. The syntax for the GROUP BY clause is as follows:

  <group by clause> :: =
    GROUP BY <column specification>
    [{, <column specification>} ...]

If we denote by R the table that is the result of the previous section (FROM or WHERE), then the result of the GROUP BY section is the partition of R into multiple groups of rows consisting of the minimum number of groups such that for each column from the list of columns of the GROUP BY section in all rows of each groups with more than one row, the values of this column are equal. To indicate the result of the GROUP BY clause, the term “grouped table” is used in the standard.

15.2.4. HAVING section

Finally, the HAVING clause is used last when calculating a table expression (if present). The syntax for this section is as follows:

  <having clause> :: =
    HAVING <search condition>

The HAVING section can meaningfully appear in a table expression only if it contains a GROUP BY section. The search condition of this section sets a condition for a group of rows of a grouped table. Formally, the HAVING section may be present in a table expression that does not contain GROUP BY. In this case, it is assumed that the result of the calculation of the previous sections is a grouped table consisting of one group without selected grouping columns.

The search condition of the HAVING clause is constructed according to the same syntax as the search condition of the WHERE clause, and may include the same predicates. However, there are special syntactic restrictions on the use of the table column specifications in the FROM section of this table expression in the search condition. These restrictions follow from the fact that the search condition of the HAVING section sets the condition on the whole group, and not on the individual lines.

Therefore, in the arithmetic expressions of the predicates included in the selection condition of the HAVING section, you can directly use only the specifications of the columns specified as grouping columns in the GROUP BY section. The remaining columns can be specified only within the specifications of the aggregate functions COUNT, SUM, AVG, MIN and MAX, which in this case compute an aggregate value for the entire group of rows. The situation is similar with the subqueries that are included in the predicates of the HAVING clause selection condition: if the characteristic of the current group is used in the subquery, then it can be specified only by reference to the grouping columns.

The result of executing the HAVING clause is a grouped table containing only those groups of rows for which the result of evaluating the search condition is true. In particular, if the HAVING section is present in a table expression that does not contain GROUP BY, then the result of its execution will be either an empty table or the result of the previous sections of the table expression, considered as one group without grouping columns.

Comments

To leave a comment

If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.

To reply

Comment

To confirm that you are not a bot, answer:

Name

Email(not published)

Vote