Lecture
The need for an object-oriented DBMS to support not only an OODB programming language (or family of languages) but also a well-developed query language is now recognized by almost all developers. The system should provide an easy-to-learn interface that is directly accessible to the end user in interactive mode.
The most common approach to organizing interactive interfaces to object-oriented database systems is based on browsers. In this case the end-user interface is usually graphical: the screen displays the schema (or a subschema) of the OODB, and the user accesses objects in a navigational style. Some researchers believe that in this case it is reasonable to ignore the principle of encapsulation and show the user the internals of objects. Most existing OODB systems provide such an interface, but a navigational query language is in a sense a step backward even compared with the query languages of relational systems. An active search is under way for approaches to designing declarative query languages for OODBs.
Beeri notes the existence of three approaches. The first comprises languages that are object-oriented extensions of relational query languages. The most common are languages with a syntax close to that of SQL, which is explained by the general recognition and extremely wide use of that language. In particular, in the Third-Generation Database System Manifesto, M. Stonebraker and his colleagues on the Committee for Advanced DBMS Function state the need to maintain an SQL-like interface in all next-generation DBMSs. We have already seen the impact of this point of view on the development of SQL itself.
The second approach is based on constructing a complete logical object-oriented calculus. There are theoretical works on such a calculus, but we do not know of a complete, practically implemented query language built on it. Works based on algebraic category theory can apparently be attributed to the same line of strictly theoretically grounded query languages.
Finally, the third approach applies the deductive paradigm. It mainly reflects the desire of developers to bring together the fields of deductive and object-oriented databases.
Regardless of the approach used to develop a query language, developers face one conceptual problem whose solution does not fit into the traditional object-oriented framework. It is clear that the basis for formulating a query should be a class, which represents a set of objects of the same type in the OODB. But what can the result of a query be? The set of basic concepts of the object-oriented approach does not contain a concept suitable for this case. The usual way around the difficulty is to extend the basic set of concepts with the notion of a set of objects and to assume that the result of a query is some subset of the instances of the class. This is a rather restrictive approach, since it automatically excludes from the query language any facility similar to the relational join operator. Let us briefly consider the features of several specific declarative query languages for OODBs.
In the query language of the object-oriented DBMS ORION, the principle of object encapsulation is fully supported. In the implemented version of the language, a query can be based on only one class (although an approach was proposed for defining a query over several classes by extending the semantics of the relational join operator). The syntax of the language is SQL-oriented. The set of admissible selection predicates is very rich. In particular, for an attribute whose domain is a superclass, the user can specify the name of the subclass of interest.
The query language of the Iris system is strongly influenced by the relational paradigm. Even the name of the language, OSQL, reflects its close connection with the relational language SQL. In fact, OSQL is a relational language designed to work with unnormalized relations. Naturally, with this approach OSQL breaks the encapsulation of objects.
In our opinion, RELOOP, the declarative query language of the O2 system, is of particular interest. In general terms, it is a declarative query language with an SQL-oriented syntax, based on an algebra of objects and values developed specifically for the O2 model. (Incidentally, this is not the only work on constructing an algebra for object-oriented data models.) A particularly impressive quality of RELOOP is how naturally it is constructed within the general context of the O2 model. A query is always posed against a set value or a list value. If we recall that a persistent class in O2 corresponds to a set value of the same name, it follows that a query can be defined on any stored class. The result of a query can be an object, a set value, or a list value. The elements of a resulting set value can be objects (a simple selection) or, for example, tuple values whose elements are objects of different classes. Together, these features of the language make it possible to formulate queries over several classes (a specific kind of join that produces not new objects but tuples of existing objects) and to use nested subqueries.
As usual, the main goal of query optimization in an OODB system is to produce an optimal plan for executing the query using the access primitives of the OODB.
Query optimization has been thoroughly researched and developed in the context of relational databases. There are methods of syntactic and semantic optimization at the level of the non-procedural representation of a query, algorithms for performing elementary relational operations, and methods for estimating the cost of query plans.
Of course, objects can have a significantly more complex structure than tuples of flat relations, but this distinction is not the most important one. The main difficulty of optimizing queries to an OODB is that the selection conditions are formulated in terms of the "external" attributes of objects (methods), whereas real optimization (that is, producing an optimal plan) requires conditions defined on "internal" attributes (state variables).
In fact, a similar situation arises in a relational DBMS when optimizing a query over a view. In this case, too, the conditions are formulated in terms of external attributes (the attributes of the view), and in order to optimize the query these conditions must be converted to conditions defined on the attributes of the stored relations. A well-known method of such "pre-optimization" is view substitution, which often (although not always when SQL is used) provides the required conversion. An alternative way to execute a query on a view (sometimes the only possible one) is to materialize the view.
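The idea of view substitution can be illustrated with a minimal sketch. It assumes the simplest possible case, in which every view attribute is just a renamed stored attribute; the names `VIEW_DEF`, `substitute_view`, `emp_name` and `salary` are hypothetical and serve only as an illustration.

```python
# Hypothetical view definition: each view attribute maps to one stored attribute.
VIEW_DEF = {"name": "emp_name", "pay": "salary"}


def substitute_view(condition, view_def):
    """Rewrite a conjunction of (view_attribute, operator, constant) triples
    into the same conjunction over stored attributes."""
    return [(view_def[attr], op, const) for attr, op, const in condition]


# A condition posed against the view ...
view_condition = [("pay", ">", 1000), ("name", "=", "Smith")]
# ... becomes a condition on the stored relation, which an optimizer can match
# against the available access methods (for example, an index on salary).
stored_condition = substitute_view(view_condition, VIEW_DEF)
```

Real views may define attributes through expressions, joins, or aggregates, which is one reason why substitution does not always succeed and materialization is sometimes needed.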
In OODB systems, the situation is significantly complicated by two circumstances. First, methods are usually written in a procedural programming language and may have parameters. That is, in the general case the body of a method is not just an arithmetic expression, as in the definition of a view attribute, but a parameterized program that includes branches and calls to functions and to methods of other objects. The second difficulty is related to late binding, which is common in OOP: the exact implementation of a method, and even the structure of the object, may not be known when the query is compiled.
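The late-binding difficulty can be seen in a small sketch. The classes `Employee` and `Manager`, the method `income`, and the helper `select` are hypothetical and are not taken from any particular OODB system.

```python
class Employee:
    def __init__(self, salary):
        self._salary = salary          # state variable ("internal" attribute)

    def income(self):                  # method ("external" attribute)
        return self._salary


class Manager(Employee):
    def __init__(self, salary, bonus):
        super().__init__(salary)
        self._bonus = bonus

    def income(self):                  # overrides the superclass method
        return self._salary + self._bonus


def select(objects, predicate):
    """Naive selection: evaluate the predicate on every object."""
    return [obj for obj in objects if predicate(obj)]


staff = [Employee(90), Manager(90, 20)]
# The condition "income() > 100" cannot be rewritten as "_salary > 100" for the
# whole superclass extent: which body of income() runs is decided per object.
well_paid = select(staff, lambda obj: obj.income() > 100)
```

Here only the `Manager` instance satisfies the condition, although both objects hold the same `_salary`, so a plan based on the salary variable alone would give a wrong answer.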
One approach to simplifying the problem is to make some internal attributes of objects (those most important for optimization) visible. In this context it would be sufficient to make them visible only to the query compiler, which in effect means prohibiting the redefinition of such variables in subclasses. From the user's point of view, such attributes would look like parameterless methods returning a value of the corresponding type. In our view, it would be better to retain strict encapsulation of objects (so as to protect applications from critical dependence on the implementation) and instead to allow the OODB schema to be designed carefully with the needs of query optimization in mind.
A general approach to pre-optimizing the selection conditions for one (super)class of objects may be the following. We assume that the conditions are formulated in first-order predicate logic without quantifiers, and that predicates may use the methods of the corresponding class, constants, and comparison operations.
Step A: Convert the logical formula of the condition to conjunctive normal form (CNF). We do not dwell on how a particular CNF is selected, but naturally a “good” CNF should be chosen (for example, one containing the maximum number of atomic conjuncts).
Step B: For each conjunct that includes methods whose bodies are known at compile time, replace the method calls with their bodies, substituting the parameters. (For simplicity, we assume that the parameters do not contain calls to functions or to methods of other objects.)
Step C: For each such conjunct, perform all possible simplifications, i.e. evaluate statically everything that can be evaluated at compile time. Although in general this task is very complex, with a reasonable OODB design the classes will necessarily include methods with an extremely simple implementation, on which it is very natural to set conditions. Such conditions can be simplified very effectively.
Step D: If there are now conjuncts that are simple comparison predicates over state variables and constants, use these conjuncts to produce an optimal query execution plan. If no such conjuncts could be obtained, the only way to “filter” the (super)class of objects is to scan it sequentially, fully evaluating the (possibly simplified) logical expression for each object.
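Steps B to D can be sketched as follows. This is only an illustration under strong assumptions: the condition is already in CNF with atomic conjuncts (Step A is taken as done), methods have no parameters, and the only kind of method body is "state variable plus a constant". All names (`Var`, `Call`, `Add`, `Cmp`, `METHOD_BODIES`, `preoptimize`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:            # state variable ("internal" attribute)
    name: str


@dataclass(frozen=True)
class Call:           # parameterless method call ("external" attribute)
    method: str


@dataclass(frozen=True)
class Add:            # expression + constant, a stand-in for a simple method body
    expr: "Expr"
    const: float


Expr = Union[Var, Call, Add]


@dataclass(frozen=True)
class Cmp:            # atomic conjunct: expression <op> constant
    expr: Expr
    op: str
    const: float


# Hypothetical method bodies known at compile time (not redefined in subclasses).
METHOD_BODIES = {
    "get_age": Var("age"),
    "age_next_year": Add(Var("age"), 1),
}


def inline(expr):
    """Step B: replace calls to known methods with their bodies."""
    if isinstance(expr, Call) and expr.method in METHOD_BODIES:
        return inline(METHOD_BODIES[expr.method])
    if isinstance(expr, Add):
        return Add(inline(expr.expr), expr.const)
    return expr


def simplify(conjunct):
    """Step C: fold constants, turning (e + k) op c into e op (c - k)."""
    expr, const = conjunct.expr, conjunct.const
    while isinstance(expr, Add):
        const -= expr.const
        expr = expr.expr
    return Cmp(expr, conjunct.op, const)


def preoptimize(conjuncts):
    """Step D: split conjuncts into those usable for plan selection
    (comparison of a state variable with a constant) and the residual ones
    that must be evaluated object by object."""
    indexable, residual = [], []
    for c in conjuncts:
        s = simplify(Cmp(inline(c.expr), c.op, c.const))
        (indexable if isinstance(s.expr, Var) else residual).append(s)
    return indexable, residual


# age_next_year() > 30 AND rating() >= 4, where the body of rating() is unknown
condition = [Cmp(Call("age_next_year"), ">", 30), Cmp(Call("rating"), ">=", 4)]
indexable, residual = preoptimize(condition)
```

In this example the first conjunct reduces to a comparison of the state variable `age` with the constant 29 and could drive the choice of an access method, while the conjunct on `rating()` stays in the residual list and has to be checked for every candidate object.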
It is clear that the possibilities for optimization depend on the features of the programming language used to write the methods, on the features of the particular query language, and on how carefully the OODB schema is designed. In particular, it is desirable that the programming language encourage the most disciplined style of programming object methods. The query language must reasonably restrict what users can do (in particular, with respect to the parameters of the methods involved in query conditions). Finally, the classes of the OODB schema should contain simple methods that are not redefined in subclasses and that are based on the state variables used to organize the access methods.
Note that these restrictions do not make the application program dependent on the peculiarities of the OODB implementation, since objects remain completely encapsulated. The use of simple methods in query conditions should be motivated not by implementation requirements but by the semantics of the objects.