Lecture
The language for interacting with the SQL database appeared in the mid-70s and was developed as part of the experimental RDBMS System R. The initial name of the SEQUEL language (Structered English Query Language) only partially reflects the essence of this language. Of course, the language was focused mainly on user-friendly, user-friendly formulation of queries to a relational database, but in fact it was already a complete database language containing, in addition to operators of query formulation and database manipulation, means for defining and manipulating a database schema; definition integrity constraints and triggers; DB views; the ability to define physical layer structures that support efficient query execution; authorization of access to relationships and their fields; save points and rollbacks. In the language, there were no means of synchronizing access to database objects from side-by-side transactions: from the very beginning it was assumed that the required synchronization is implicitly performed by the DBMS.
Consider these properties of the language in a little more detail.
As it is known, the two fundamental languages of queries to relational databases are the languages of relational algebra and relational calculus. For all their rigor and theoretical validity, these languages are rarely used in modern relational database management systems as tools of the user interface. Requests in these languages are difficult to formulate and understand. SQL is a combination of relational calculus of tuples and relational algebra, and there is still no general agreement on which of the classical languages it is closer to. At the same time, the possibilities of SQL are broader than those of these basic relational languages, in particular, in the general case, it is impossible to translate a query formulated in SQL into an expression of relational algebra, and its expansion is required.
The essential properties of SQL sublanguage are the ability to simply formulate queries with connections of several relationships and use nested subqueries in predicates of the sample. Generally speaking, the simultaneous presence of both means is redundant, but this gives the user, when formulating a request, the possibility of choosing a variant that is more understandable to him.
In predicates with nested subqueries in SQL System R, you can use the set-comparison comparison operators, which makes it possible to formulate quantified queries (these capabilities are usually the most difficult to understand by users and therefore, in the future, clearly quantified predicates appeared in SQL).
The essential feature of SQL is the ability to specify in the query the needs of grouping the relationship-result by the specified fields with the support of the sampling conditions for the whole group. Such sampling conditions may contain aggregate functions calculated on a group. This SQL feature primarily distinguishes this language from the languages of relational algebra and relational calculus, which do not contain similar means.
Another difference in SQL is the optional removal of duplicate tuples in the final or intermediate relationship results. Strictly speaking, the result of a select statement in SQL is not a relation, but a multiset of tuples. In cases where the semantics of the query requires a relation, the destruction of duplicates is done implicitly.
The most general form of a query in the SQL language is a set-theoretic algebraic expression composed of elementary queries. All basic set-theoretic operations (UNION, INTERSECT, and MINUS) were allowed in SQL System R.
Working with null values in SQL System R was not fully thought out, although it was implicitly assumed to use three-digit logic when calculating logical expressions.
The UPDATE and DELETE data manipulation statements are built on the same principles as the SELECT data select statement. The set of tuples of the specified relation to be modified or deleted is determined by the logical expression included in the corresponding operator, which may include complex predicates, including those with nested subqueries.
In the insertion operator of the tuple (s) in the specified relation, the entered tuple can be specified both in the literal form and with the help of the internal sub-operator of the sample.
The SQL System R DB schema definition statements included the creation and destruction of permanent and temporary stored relations (CREATE TABLE and DROP TABLE) and the creation and destruction of represented relations (CREATE VIEW and DROP VIEW). In the language and in the implementation of System R, it was not forbidden to use schema definition statements within a transaction containing sample and data manipulation statements. It was allowed, for example, to use the operators of sampling and manipulation of data, in which relations that do not exist in the database at the time of compilation of the operator are indicated. Of course, this possibility significantly complicated the implementation and was required essentially very rarely.
The ALTER TABLE database manipulation statement allowed to add specified fields to existing relationships. In the language description, it was determined that the execution of this operator should not lead to the invalidity of previously compiled statements on the relation whose schema changes, and that the values of the newly defined fields in the existing tuples of the relation become uncertain.
The SQL System R language included very powerful means of controlling and maintaining the integrity of the database. The controls were based on the integrity constraints device (ASSERTIONS). In fact, integrity constraint is a logical expression calculated on the current state of the database, the falsity of which corresponds to the non-integral state of the database. The logical expression of integrity constraint could contain any valid predicate in the language.
More precisely, integrity constraints were divided into two classes: checked after the execution of the data manipulation statement and checked when the transaction was completed or when the special INFORCE INTEGRITY statement was executed. The types of predicates that can be used in the definition of integrity constraints for different classes differ. Operators of the first class check, in fact, the current tuple that is being manipulated. In the second case, the relations specified in the integrity constraint are checked, i.e. all their tuples. The system's response in the language to the violation of integrity constraints of different classes also differs. In the first case, an integrity violation violation causes the transaction to roll back to the point immediately preceding the data manipulation operation, the execution of which caused the integrity violation violation. In the second case, the restriction leads to a complete rollback of the transaction to its beginning.
A very important mechanism defined in the SQL System R language is the trigger mechanism. In the context of System R, this mechanism was considered mainly as a means of automatically maintaining the integrity of the database. When defining a trigger, the condition for verifying its applicability (the name of the relationship and the type of data manipulation operation), the condition of applicability of the trigger (a logical expression constructed according to rules close to the rules for integrity constraints of the first class) and the action that must be performed on the database in case of truth conditions of applicability. Such an action could be expressed using an arbitrary data manipulation operator. During the execution of the action, other triggers, etc. could be triggered.
The mechanisms of integrity constraints and System R triggers were very powerful and common, but their implementation is very difficult and expensive (as already noted, the triggers were never implemented in System R). Additional complexity in the implementation was created by the fact that it was allowed (at least not prohibited by the language) to define integrity constraints and triggers within the same transaction in which data manipulation statements are executed. With the most complete implementation, a large number of additional actions would be required during the execution of the transaction. In addition, in some cases, the absence of fixed semantics of the corresponding language constructs led to an ambiguous understanding of the execution of transactions.
The language allowed the use of stored relations database and represented relations. The most successful solution was to use the definition of the general apparatus of the sample operators. Any sample operator can be used to define a representation.
There are no restrictions on the use of representations in the language: in any SQL statement in which the use of the name of the stored relation is allowed, the name of the representation is allowed. In SQL System R, nothing is said about the recommended way to implement access to views, but with any method, the effect should be the same as if you would perform the full materialization of the view before the statement was executed.
A lot of problems, research and suggestions gave rise to the potential for operators to manipulate data on representations. It is clear that this possibility is easily realizable for simple representations, but in more complex cases, not only the implementation, but also the semantics of operations becomes nontrivial. By the way, in System R, data manipulation operators were allowed only on simple representations.
Introduction to the relational language, such as SQL, of explicit statements for generating and destroying physical layer structures that support efficient database queries, was a purely pragmatic solution in SQL System R that provides the possibility of all types of database work using one language.
In SQL System R two types of such structures are mentioned: indices and links (links). An index in its abstract language representation is an inverted file that provides access to tuples of the corresponding relation based on the specified values of one or more columns that make up the index key. Operators of the language allowed to create and destroy indexes, but in no way made it possible to clearly indicate the need to use the existing index when executing the sampling operator, the decision about this was assigned to the implementation.
With the help of the index definition operator it was possible to express two additional statements concerning the logical scheme of the relation and the physical structure of its storage. The use of the UNIQUE keyword in determining the index meant that the key of this index is a possible key of the corresponding relation. In fact, this means that there is an additional mechanism for determining the integrity of a relationship. One of the indices for this relationship could be defined with the keyword CLUSTERING. This means the requirement of physical clustering in the external memory of tuples of a relationship with equal or close index key values.
Communication definition operators allowed, in the style of a network data model, to organize lists of tuples of a specified relation in external memory. As in the case of indexes, operators allowed the creation and destruction of such lists, but did not make it possible to explicitly indicate the need to use existing lists when executing select statements. The greater laboriousness of maintaining lists when executing data manipulation operators and the difficulty of estimating the cost of using them when executing sample operators resulted in the fact that the communication mechanism disappeared from the language at a late stage of the System R. Project Since then, this mechanism has not appeared in any variant of SQL.
The essential feature of the SQL language, which appeared in it from the very beginning, is to ensure the protection of access to the data by means of the language itself. The main idea of this approach is that a predefined set of privileges is introduced with respect to any relation in the database and any relation column. Each transaction implicitly associates the identifier of the user on whose behalf it is executed (the methods of communication and user identification are not fixed in the language and are defined in the implementation).
After creating a new relationship, all privileges associated with this relationship and all its columns belong only to the user who created the relationship. The privileges include the privilege of transferring all or part of the privileges to another user, including the privilege to transfer the privileges. Technically, the transfer of privileges is performed when the SQL GRANT statement is executed. There is also the privilege of removing all or part of the privileges from the user to whom they were previously transferred. This privilege can also be transferred. Technically, the removal of privileges occurs during the execution of the SQL REVOKE statement.
The data access authority is checked based on the information about the permissions that exist at the time of compiling the corresponding SQL statement. Similar to what we noted in connection with integrity constraints and triggers, in SQL System R there were no restrictions on the use of GRANT and REVOKE statements. This led to significant technical difficulties in implementation, and sometimes to an ambiguous understanding of behavior.
For a long time, the approach to protecting data from unauthorized access was taken almost without criticism, however, in connection with the spreading use of relational DBMS in non-traditional applications, criticism is increasingly heard. If, for example, multilevel data protection is to be supported in the database system, the corresponding system of authorities is very difficult and sometimes impossible to build on the basis of SQL tools.
In SQL System R, there were two special operators for setting the so-called savepoints of a transaction and for rolling back a transaction to a previously established savepoint. In the literature relating to System R, the discussion of these possibilities is practically not contained, which implicitly implies that they were not implemented.
The straightforward implementation of this mechanism does not cause any special technical difficulties, but it is also not very useful, because after performing a partial rollback of a transaction, the successful continuation of the application program would require restoring its state at the appropriate point, but this is not supported at all. It is clear that with a more thorough study the mechanisms of points of preservation and integrity control should be linked. For example, it would be natural that during the execution of the ENFORCE INTEGRITY statement, if any integrity constraints are violated, the transaction is automatically rolled back to the nearest save point, in which there was no violation of the database integrity. This would significantly complicate the implementation, but it would be very useful. Similarly, it would be possible to use a savepoint mechanism for automatic rollback of transactions due to the occurrence of synchronization deadlocks.
Let us mention two more important properties of the SQL System R language, which are present in different forms in all developed subsequent versions of the language.
In SQL System R, there are special operators that support embedding SQL statements in traditional programming languages (in System R, PL / 1 was the main language).
The main problem of embedding SQL in a programming language was that SQL is a relational language, i.e. its operators mostly work with sets, while in programming languages scalar operations are basic. The SQL solution is that the language additionally includes statements that provide the sequential access to the result of the query to the database.
For this, the language introduces the concept of a cursor with which the selection operator is associated. You can execute the OPEN operator on a specific cursor, meaning the materialization of the query result relationship, the FETCH operator, which allows you to select the next tuple of the resulting relation in the program memory, and the CLOSE operator, signifying the end of working with this cursor.
Additional flexibility in creating application programs with embedded SQL provides the ability to parameterize SQL statements with variable values of the enabling program.
To simplify the creation of interactive SQL-based systems, statements were included in SQL System R that allow you to compile and execute any SQL statement at run time.
The PREPARE statement causes a dynamic compilation of the SQL statement, the text of which is contained in the specified variable in the character string of the enclosing program. The text can be placed in a variable when the program is executed in any acceptable way, for example, entered from a terminal.
The DESCRIBE statement is used to obtain information about the specified SQL statement previously prepared using the PREPARE statement. Using this operator, you can find out, firstly, whether the prepared operator is a sample operator, and secondly, if it is a sample operator, obtain complete information about the number and types of columns of the resulting relation.
To execute a previously prepared SQL statement that is not a select statement, use the EXECUTE statement. Для выполнения динамически подготовленного оператора выборки используется аппарат курсоров с некоторыми отличиями по части задания адресов переменных включающей программы, в которые должны быть помещены значения столбцов текущего кортежа результата.
Подводя итог приведенному краткому описанию основных черт SQL System R, отметим, что несмотря на недостаточную техническую проработку, в идейном отношении язык содержал все необходимые средства, позволяющие использовать его как базовый язык СУБД.
Comments
To leave a comment
IBM System R — реляционная СУБД
Terms: IBM System R — реляционная СУБД