21.4. Rule-based query optimization

Lecture

In Lecture 18, we briefly reviewed the query optimization problems that have to be solved in database language compilers. Perhaps the main conclusion that should be made on the basis of the materials in this lecture is that the query optimizer is the most cumbersome, complex and critical component of the DBMS. All database management system developers agree that you cannot save on query optimization. The more options for performing a query are analyzed and the more accurate estimates of the cost of the query execution plan are applied, the more likely it is that the query will be executed effectively.

The main trouble with query optimizers is that there is no accepted technology for their programming. Usually the optimizer is an amorphous set of relatively independent procedures that are rigidly connected with other components of the compiler. For this reason, it is very difficult to change optimization strategies or qualitatively expand them (this has to be done, since optimization in general and query optimization, in particular, is in principle an empirical discipline, and good empirical algorithms appear only with time).

How can this problem be solved? There are trade-off solutions that do not extend beyond the traditional production technology of compilers. Basically, they are all associated with the use of various tools that automate the construction of compilers. Among them, we note the technology used by Richard Stallman in his gcc family of compilers, as well as the Cocktail toolkit developed at the German University of Karlsruhe. The main production advantage of gcc is the use of a single language as a means of internal presentation of the program. The high-level lis-like language RTL is used at all gcc compilation phases, which makes it possible to use the same transformative procedures at different stages of program optimization (up to the stage of machine-dependent optimizations).

The Cocktail package provides a set of universal, customizable procedures for converting graphs of the internal representation of the program. In a sense, Cocktail can be considered as a specialized language for writing compilers (compilers of any languages, not just procedural programming languages or declarative database languages). As stated, Cocktail allows to increase the productivity of compiler developers by 2-3 times.

However, the most revolutionary approach among those known to the author was applied in the IBM Starburst experimental post-relational system. In a sense, this approach is a development of the Stallman idea, applied in the implementation of the widely popular Emacs editor. Recall that the basis of this editor is the interpreter of the extended dialect of Common Lisp. This interpreter itself is written in C, and the main part of the editor is written in Lisp. This allows, among other things, adding new features to the editor without leaving its environment: you simply write a new text on Lisp and declare the corresponding function connected to the editor.

The Starburst system is based on a production system. This system is essentially a virtual machine in which all DBMS components are executed, starting from the database language compiler (an extended version of the SQL language) and ending with the subsystem of direct query execution. The DBMS itself is a set of production rules, each of which is called by the production system when a corresponding event occurs and performs some action that, in turn, can lead to the occurrence of an event activating another rule. The rules are presented in a special language. A set of predefined low-level rules is supported that provide an interface with an external memory management subsystem (of course, for reasons of efficiency, this subsystem is not written in a production language).

Obviously, such an organization of the system provides maximum flexibility. For example, to introduce some new execution strategy (for example, to expand the used set of methods to perform equi-join) in the query optimizer, it is enough to additionally write one or more new rules related to the event of the request to perform a connection. Thus, Starburst can be used (and is really used in IBM research laboratories) as a powerful and flexible tool for researching query optimization methods. Of course, it is doubtful that the technology underlying Starburst will allow this system to compete with traditional commercial DBMS such as DB2, Oracle, Informix, etc.

Comments

To leave a comment

If you have any suggestion, idea, thanks or comment, feel free to write. We really value feedback and are glad to hear your opinion.

To reply

Comment

To confirm that you are not a bot, answer:

Name

Email(not published)

Vote

21.4. Rule-based query optimization

Comments

To leave a comment

Databases IBM System R - relational DBMS

Terms: Databases IBM System R - relational DBMS