You get a bonus - 1 coin for daily activity. Now you have 1 coin

18.2. Syntax query optimization

Lecture



18.2. Syntax query optimization

In the classical approach to the organization of query optimizers, at the stage of logical optimization, equivalent transformations of the internal representation of the query are made, which "improve" the initial internal representation in accordance with fixed optimizer strategies. The nature of the "improvements" is associated with the specifics of the overall organization of the optimizer, in particular, with the peculiarities of the search procedure for possible procedural query plans, performed in the third phase of query processing.

Therefore, it is difficult to give a complete description and classification of logical optimization methods. We will confine ourselves to several examples, and also consider one particular, but important class of logical transformations related to complex queries expressed in the SQL language.

18.2.1. Simple logical query transformations

The obvious class of logical transformations of the query is constituted by the transformations of the predicates included in the selection condition to the canonical representation. This refers to predicates containing a comparison of simple values. Such a predicate has the form "arithmetic expression op arithmetic expression", where "op" is a comparison operation, and the arithmetic expressions of the left and right sides in the general case contain the names of relationship fields and constants (in the SQL language, the variables of the ambient program whose values ​​become known only during the actual execution of the request).

Canonical representations can be different for different types of predicates. If the predicate includes only one field name, then its canonical representation may, for example, have the form "field name op constant arithmetic expression" (this form of the predicate — a simple predicate of selection — is very useful for the next stage of optimization). If the initial representation of the predicate is (a + 5) (A op 36 (the letters of the enclosing variables are indicated in small letters, and the names of the relationship fields are in large letters), then the canonical representation of such a predicate may be A op 36 / (a ​​+ 5)).

If the predicate includes exactly two field names of different relations (or two different occurrences of the same relationship), then its canonical representation can have the form "field name op arithmetic expression", where the arithmetic expression in the right-hand side includes only constants and the second name of the field (this also the form useful for performing the next optimization step is a join predicate; the case of an equicompound is especially important, when op is equality). If in the initial representation the predicate has the form 5 (Aa (B op b, then its canonical representation is A op (b + a (B) / 5.

Finally, for a more general form of the predicates under consideration, it makes sense to reduce the predicate to a canonical representation of the form "arithmetic expression op constant arithmetic expression", where the expressions of the right and left sides are also reduced to the canonical representation; for example, brackets are fully expanded in expressions and lexicographical ordering is performed. In the future, you can search for general arithmetic expressions in different query predicates. This is justified because, when the query is executed, the calculation of arithmetic expressions will be performed while selecting each next tuple, i.e. potentially large number of times.

When casting predicates to the canonical representation, it is possible to calculate constant expressions and get rid of logical negations.

The following class of logical transformations is associated with the reduction to the canonical form of a logical expression that specifies the condition of the query query. As a rule, either disjunctive or conjunctive normal forms are used. The choice of canonical form depends on the overall organization of the optimizer.

When bringing a logical condition to a canonical representation, one can search for common predicates (they can exist initially, can appear after reducing predicates to canonical form or in the process of normalizing logical conditions) and simplify the logical expression by, for example, identifying a conjunction of mutually contradictory predicates. A fragment of the logical expression 1/4 (A> 5) AND (A <5) 1/4 can be replaced by 1 / 4FALSE1 / 4. More clever simplifications are possible. For example, a fragment of a logical expression 1/4 (A> B) AND (B = 5) 1/4 can be replaced by 1/4 (A> 5) 1/4 Such simplifications may be significant for further processing of the request: in a query with a logical the condition of the first kind was supposed to be the union of relations; after conversion, the query no longer requires a connection.

18.2.2 Query Transformations with Reordering Relational Operations

In traditional optimizers, logical transformations associated with a change in the order of performing relational operations are common. An example of the corresponding transformation rule in terms of relational algebra is the following (A and B are the names of relations):

  (A JOIN B) WHERE restriction-on-A AND restriction-on-B 

equivalent to the expression

  A WHERE restriction-on-A) JOIN (B WHERE restriction-on-B). 

Here JOIN denotes the relational operator of the natural connection of relations; A WHERE restriction - the operator of restriction of relation A in accordance with the predicate restriction.

Although few relational systems have query languages ​​based purely on relational algebra, transformation rules for algebraic expressions may be useful in other cases. Quite often, relational algebra is used as the basis for the internal representation of a query. Naturally, after this, algebraic transformations can be carried out.

In particular, there are approaches associated with the conversion to the algebraic form of queries in the SQL language. Two main motivations for converting SQL queries to algebraic form can be identified. The first, in our opinion, less important reason may be the desire to use relational algebra as a unified internal interface of a relational DBMS. This approach is common when using specialized database machines, on the basis of which various database access interfaces are implemented. The database engine interface should be unified (for example, be algebraic), and all other interfaces, including the interface based on SQL, are reduced to algebraic.

The second reason, especially important in the context of optimization problems, is that relational algebra is simpler than the SQL language. Converting a query to an algebraic form simplifies the next steps for the optimizer to select optimal plans. Generally speaking, a developed SQL query-oriented query optimizer should reveal all possible execution plans for any query, but the search space for these plans is generally very large; Each specific optimizer uses its own heuristics to reduce the search space. Some, perhaps, the most optimal plans will never be considered. Reasonable conversion of a SQL query to an algebraic representation reduces the search space for query execution plans, ensuring that optimal plans are not lost.

18.2.3 Bringing queries with subqueries nested to queries with connections

The main difference between the SQL language and the language of relational algebra is the possibility of using predicates containing nested subqueries in the logical condition of the sample. The depth of nesting is not limited by language, that is, generally speaking, it can be arbitrary. Predicates with nested subqueries with common syntax can have very different semantics. The only common for all possible semantics of nested subqueries is that the query execution algorithm is to calculate the nested subquery each time the predicate value is calculated. Therefore, it is natural to strive for such a transformation of a query containing predicates with nested subqueries, which will make the semantics of the subquery more explicit, thereby allowing the optimizer to choose the method of query execution that most closely matches the semantics of the subquery.

Below, R i denotes the i-th relation of the database; C k - the k-th field (column) of the relationship.

Predicates valid in SQL queries can be divided into the following four groups:

  1. Simple predicates. These are predicates of the form Ri.Ck op X, where X is a constant or a list of constants, and op is a scalar comparison operator (=,! =,>,> =, <, <=) Or an operator for checking the occurrence in a set (IS IN, IS NOT IN).
  2. Predicates with nested subqueries. These are predicates of the form Ri.Ck op Q, where Q is a query block, and op can be the same as for simple predicates. The predicate may also have the form Q op Ri.Ck. In this case, the set membership operator is replaced with CONTAINS or DOES NOT CONTAIN. These two forms are symmetrical. It is enough to consider only one.
  3. Connection predicates These are predicates of the form Ri.Ck op Rj.Cn, where Ri! = Rj and op is a scalar comparison operator.
  4. Division predicates These are predicates of the form Qi op Qj, where Qi and Qj are blocks of queries, and op can be a scalar comparison operator or an operator for checking membership in a set.

The above classification is a simplification of the actual situation in SQL. Predicates of a compound of general form, including arithmetic expressions with fields of more than two relations, are not considered.

The canonical representation of a query on n relations is a query that contains an n-1 join predicate and does not contain predicates with nested subqueries. In fact, the canonical form is an algebraic representation of a query.

Below are two examples of canonical forms of queries with predicates of different types. The corresponding technique also exists for other types of predicates.

  SELECT Ri.Ck FROM Ri WHERE Ri.Ch IS IN
   SELECT Rj.Cm FROM Rj WHERE Ri.Cn = Rj.Cp 

is equivalent to

  SELECT Ri.Ck FROM Ri, Rj WHERE
   Ri.Ch = Rj.Cm AND Ri.Cn = Rj.Cp
 SELECT Ri.Ck FROM Ri WHERE Ri.Ch =
   SELECT AVG (Rj.Cm) FROM Rj WHERE Rj.Cn = Ri.Cp 

is equivalent to

  SELECT Ri.Ck FROM Ri, Rt WHERE
   Ri.Ch = Rt.Cm AND Ri.Cp = Rt.Cn
 - Rt (Cp, Cn) = SELECT Rj.Cp, AVG (Rj.Cn) FROM Rj
     GROUP BY Rj.Cp 

The reasonableness of such transformations is justified by the fact that the optimizer is able to choose a greater number of ways to execute queries. Often, post-conversion methods of execution are more efficient than the plans used in the traditional System R optimizer.

When using such an approach in the query optimizer, it is not necessary to make formal transformations of queries. The optimizer should mostly use the semantics of the processed request, and how it will be recognized is a technical issue.

Note that in the approach we briefly described there are some subtle semantic incorrectness. The corrected methods are known, but they are too complex technically to be considered in our lectures.


Comments


To leave a comment
If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.
To reply

Databases IBM System R - relational DBMS

Terms: Databases IBM System R - relational DBMS