Lecture
The basic idea of relational algebra is that if relations are sets, then the means of manipulating relations can be based on traditional set-theoretic operations, supplemented by some special operations specific to databases.
There are many approaches to the definition of relational algebra, which differ in a set of operations and methods of their interpretation, but in principle, more or less equivalent. We describe a slightly advanced initial version of algebra, which was proposed by Codd. In this variant, the set of basic algebraic operations consists of eight operations, which are divided into two classes - set-theoretic operations and special relational operations. The set-theoretic operations include operations:
Special relational operations include:
In addition, the algebra includes an assignment operation, which allows saving the results of calculating algebraic expressions in the database, and the attribute renaming operation, which makes it possible to correctly form the title (scheme) of the resulting relation.
If you do not go into some of the subtleties that we consider in the following subsections, then almost all operations of the set proposed above have an obvious and simple interpretation.
Since the result of any relational operation (except for the assignment operation) is a certain relation, it is possible to form relational expressions in which instead of the relation-operand of some relational operation there is an embedded relational expression.
As we said in the previous lecture, each relation is characterized by a scheme (or title) and a set of tuples (or body). Therefore, if you really want to have an algebra whose operations are closed with respect to the concept of relation, then each operation must produce a relation in the full sense, i.e. it must have both a body and a heading. Only in this case it will be really possible to build nested expressions.
The relationship header is a set of pairs. If you look at the general overview of relational operations given in the previous section, you can see that the attribute domains of the resulting relation are uniquely determined by the domains of the operand relations. However, with the names of the attributes of the result is not always so simple.
For example, imagine that the operand relations of a direct product operation have similar attributes with the same domains. What would be the title of the resulting relationship? Since this is a set, it should not contain the same elements. But to lose the attribute as a result is unacceptable. This means that in this case it is generally impossible to correctly perform the operation of a direct product.
Similar problems may arise in cases of other two-place operations. To resolve them, the renaming operation is introduced into the structure of relational algebra operations. It should be used in any case when there is a conflict of attribute naming in relations — the operands of a single relational operation. Then the rename operation is first applied to one of the operands, and then the main operation is performed without any problems.
In the following, we will assume the use of the renaming operation in all conflict situations.
Although the set-theoretic part of relational algebra is based on the classical theory of sets, the corresponding operations of relational algebra have some peculiarities.
We start with the join operation (everything that will be said about the join is transferred to the operations of crossing and taking the difference). The meaning of the union operation in relational algebra as a whole remains set-theoretic. But if in the theory of sets, the operation of union is meaningful for any two sets of operands, then in the case of relational algebra, the result of the operation of union must be a relation. If we allow the possibility of a set-theoretic union of arbitrary two relations (with different schemes) in relational algebra, then, of course, the result of the operation will be many, but many different types of tuples, i.e. not a relationship. If we proceed from the requirement of closed relational algebra with respect to the concept of relation, then such a join operation is meaningless.
All of these considerations lead to the emergence of the notion of compatibility in a join relationship : two relationships are compatible in a join if and only if they have the same headings. More precisely, this means that the headers of both relationships contain the same set of attribute names, and attributes of the same name are defined on the same domain.
If the two relations are compatible with the combination, then in the usual execution of the operations of combining, intersecting and taking the difference over them, the result of the operation is a relationship with a correctly defined header that matches the title of each of the operand relations. Recall that if the two relations are "almost" compatible in the union, i.e. are compatible with everything except attribute names, then before performing a join type operation, these relations can be made fully compatible with the union by applying the rename operation.
Note that the inclusion in the operations of a relational algebra of the three operations of combining, intersecting and taking the difference is obviously redundant, since it is known that each of these operations is expressed in terms of the other two. Nevertheless, Codd decided at one time to include all three operations, based on the intuitive needs of a potential user of a relational database system, far from mathematics.
Other problems are connected with the operation of taking the direct product of two relations. In set theory, a direct product can be obtained for any two sets, and the elements of the result set are pairs made up of the elements of the first and second sets. Since relations are sets, then for any two relations it is possible to obtain a direct product. But the result will not be an attitude! The elements of the result will not be tuples, but pairs of tuples.
Therefore, in relational algebra, a specialized form of the operation of taking a direct product is used — an extended direct product of relations. When taking an extended direct product of two relations, an element of the resulting relation is a tuple, which is the concatenation (or merge) of one tuple of the first relation and one tuple of the second relation.
But now the second question arises - how to get the correctly formed title of the relationship-result? Obviously, the problem can be the naming of the attributes of the resulting relationship, if the operand relations have the same attributes.
These considerations lead to the emergence of the notion of compatibility in taking an extended direct work . Two relations are compatible in taking a direct product if and only if the sets of attribute names of these relations do not overlap. Any two relationships can be made compatible by taking a direct work by applying a rename operation to one of these relationships.
It should be noted that the operation of taking a direct work is not too meaningful in practice. Firstly, the power of its result is very large even at the permissible capacities of the operands, and secondly, the result of the operation is no more informative than the operands taken together. As we will see a little below, the basic meaning of including the operation of an extended direct product in relational algebra is that it is based on a truly useful join operation.
Regarding the set-theoretic operations of relational algebra, it should also be noted that all four operations are associative. That is, if OP is denoted by any of the four operations, then (A OP B) OP C = A (B OP C), and therefore, without introducing ambiguity, you can write A OP B OP C (A, B and C - relations with the properties required for the correct execution of the corresponding operation). All operations except taking the difference are commutative, i.e. A OP B = B OP A.
In this subsection, we will take a closer look at special relational operations of relational algebra: constraint, projection, connection, and division.
Operation restrictions
The constraint operation requires two operands: a constraint relation and a simple constraint condition. A simple constraint condition can be either of the form (a comp-op b), where a and b are the attribute names of the constrained relation for which the comp-op comparison operation is meaningful, or of the form (a comp-op const), where a is the name of the attribute of the constrained relations, and const is a literally given constant.
As a result of the restriction operation, a relationship is produced whose title matches the header of the operand relationship, and the tuples of the operand relationship for which the value of the constraint condition is true are included in the body.
Let UNION denote the join operation, INTERSECT - the intersection operation, and MINUS - the operation of taking the difference. To denote the restriction operation, we will use the construction A WHERE comp, where A is the constrained relation, and comp is a simple comparison condition. Let comp1 and comp2 be two simple constraints. Then by definition:
Using these definitions, you can use constraint operations, in which the constraint condition is an arbitrary Boolean expression composed of simple conditions using AND, OR, NOT logical connectives and parentheses.
On an intuitive level, a constraint operation is best thought of as taking some "horizontal" clipping from an operand relationship.
Projection taking operation
A projection taking operation also requires two operands — a projected relationship A and a list of attribute names included in relationship header A.
The result of the projection of the relationship A in the list of attributes a 1 , a 2 , ..., a n is a relationship with a header defined by the set of attributes a 1 , a 2 , ..., a n , and with the body consisting of tuples of the form < a 1 : v 1 , a 2 : v 2 , ..., a n : v n > such that with respect to A there is a tuple, the attribute a 1 of which has the value v 1 , the attribute a 2 has the value v 2 , .. ., attribute a n has the value v n . Thus, when performing a projection operation, a "vertical" cutting of the operand relation is selected with the natural destruction of potentially occurring duplicate tuples.
Relationship join operation
A general join operation (also called a conditional join) requires two operands — a joinable relationship and a third operand — a simple condition. Let relations A and B come together. As in the case of a constraint operation, the connection condition comp has the form either (a comp-op b) or (a comp-op const), where a and b are the attribute names of the relations A and B, const is a literally given constant, and comp-op is a valid comparison operation in this context.
Then, by definition, the result of the comparison operation is the ratio obtained by performing the restriction operation on the condition comp of the direct product of relations A and B.
If you carefully interpret this definition, it becomes clear that in the general case the use of the join condition will significantly reduce the power of the result of the intermediate direct product of the operand relationship only in the case when the join condition has the form (a comp-op b), where a and b are attribute names of different operand relationships. Therefore, in practice, it is usually considered that the actual operations of the connection are precisely those operations that are based on the condition of the connection of the above form.
Although the join operation in our interpretation is not primitive (since it is defined using direct product and projection), due to its particular practical importance, it is included in the basic set of relational algebra operations. Note also that in practical implementations the connection is usually not performed exactly as a restriction of the direct product. There are more efficient algorithms that guarantee the same result.
There is an important particular case of a compound — an equicompound; and a simple but important extension of the equicompound operation — a natural compound. A join operation is called an equi join if the join condition is (a = b), where a and b are attributes of different join operands. This case is important because (a) it is often found in practice, and (b) there are effective implementation algorithms for it.
The natural join operation is applied to a pair of relations A and B that have a (possibly composite) common attribute c (that is, an attribute with the same name and defined on the same domain). Let ab denote the union of relations A and B. Then the natural connection A and B is the result of the equilibrium A and B according to A / c and BBC designed for ab. If we recall the definition of the foreign key of the relation introduced by us at the end of the previous chapter, it should become clear that the main purpose of the natural connection operation is the possibility of restoring a complex entity decomposed due to the requirement of the first normal form. The operation of a natural connection is not included directly in the set of operations of relational algebra, but it has very important practical significance.
Relational division operation
This operation is the least obvious of all operations of relational algebra and therefore needs a more detailed explanation. Let two relations be given - A with the header {a 1 , a 2 , ..., a n , b 1 , b 2 , ..., b m } and B with the header {b 1 , b 2 , ..., b m }. We assume that the attribute b i relations A and attribute b i relations B not only have the same name, but also defined on the same domain. We call the set of attributes {a j } the composite attribute a, and the set of attributes {b j } the composite attribute b. After that we will talk about the relational division of the binary relation A (a, b) into the unary relation B (b).
The result of dividing A by B is a unary relation C (a) consisting of tuples v such that with respect to A there are tuples such that the set of values {w} includes the set of values of the attribute b with respect to B.
Suppose that the employee database maintains two relationships: EMPLOYEES (NAME, OTD_NOMER) and NAMES (NAME), and the unary relation NAMES contains all surnames that the employees of the organization have. Then, after performing the relational division operation, the EMPLOYEES on the NAME relation will receive a unary relation containing the numbers of the departments whose employees have all the possible names in this organization.
Comments
To leave a comment
Databases IBM System R - relational DBMS
Terms: Databases IBM System R - relational DBMS