Lecture 2. DBMS Functions. Typical organization DBMS. Examples

Lecture

As was shown in the first lecture, the traditional capabilities of file systems are not enough to build even simple information systems. We identified several needs that are not covered by the capabilities of file management systems: maintaining a coherent set of files; providing a data manipulation language; information recovery after various kinds of failures; really parallel work of several users. We can assume that if an application information system relies on some data management system with these properties, then this data management system is a database management system (DBMS).

2.1. The main functions of the DBMS

More precisely, the following functions are usually assigned to the number of DBMS functions:

2.1.1. Direct data management in external memory

This function includes providing the necessary external memory structures both for storing data directly included in the database and for official purposes, for example, to speed up data access in some cases (usually indices are used for this). In some implementations of the DBMS, the capabilities of existing file systems are actively used, in others work is carried out up to the level of external memory devices. But we emphasize that in developed DBMS, users in any case are not obliged to know whether the DBMS uses the file system, and if it does, how files are organized. In particular, the DBMS supports its own database object naming system.

2.1.2. Managing RAM Buffers

DBMS usually work with a database of significant size; At least this size is usually substantially more than the available amount of RAM. It is clear that if, when accessing any data element, an exchange with external memory is performed, the entire system will operate at the speed of an external memory device. Virtually the only way to really increase this speed is data buffering in RAM. At the same time, even if the operating system produces system-wide buffering (as in the case of UNIX OS), this is not enough for the purposes of a DBMS that has much more information about the usefulness of buffering a particular part of the database. Therefore, in developed DBMS, its own set of RAM buffers is maintained with its own buffer replacement discipline.

Note that there is a separate direction of the DBMS, which is focused on the constant presence in the RAM of the entire database. This direction is based on the assumption that in the future the amount of computer memory will be so large that it will allow you not to worry about buffering. While these works are in the research stage.

2.1.3. Transaction management

A transaction is a sequence of operations on a database that are considered by the DBMS as a whole. Either the transaction is successfully completed, and the DBMS records (COMMIT) the database changes made by this transaction in external memory, or none of these changes have any effect on the state of the database. The concept of a transaction is necessary to maintain the logical integrity of the database. If we recall our example of an information system with files EMPLOYEES and DEPARTMENTS, then the only way not to violate the integrity of the database when performing a new employee recruitment operation is to combine elementary operations on files EMPLOYEES and DEPARTMENTS in one transaction. Thus, the maintenance of the transaction mechanism is a prerequisite even for single-user DBMS (if, of course, such a system deserves the name of the DBMS). But the concept of a transaction is much more important in multi-user DBMS.

The property that each transaction begins with a complete state of the database and leaves this state complete after its completion makes it very convenient to use the concept of a transaction as a unit of user activity in relation to the database. With proper management of concurrently performing transactions by the DBMS, each of the users may, in principle, feel like the only DBMS user (in fact, this is a somewhat idealized view, since in some cases, users of multi-user DBMS may feel the presence of their colleagues).

Important concepts of transaction serialization and serialization of a mixture of transactions are associated with transaction management in a multi-user DBMS. Serialization of concurrently running transactions is understood as such an order of planning their work, in which the total effect of a mixture of transactions is equivalent to the effect of their some sequential execution. A serial plan for executing a mixture of transactions is a plan that leads to the serialization of transactions. It is clear that if it is possible to achieve truly serial execution of a mixture of transactions, then for each user who initiated a transaction, the presence of other transactions will be invisible (except for some slowdown compared with the single-user mode).

There are several basic transaction serialization algorithms. In centralized databases, the most common algorithms are based on synchronization captures of database objects. When using any serialization algorithm, situations of conflict between two or more transactions on accessing database objects are possible. In this case, to maintain serialization, you must roll back (eliminate all changes made to the database) of one or more transactions. This is one of the cases when a user of a multi-user DBMS can really (and rather unpleasantly) feel the presence in the system of transactions of other users.

2.1.4. Journaling

One of the main requirements for a DBMS is the reliability of data storage in external memory. Storage reliability means that the DBMS must be able to restore the last consistent state of the database after any hardware or software failure. Two possible types of hardware failures are usually considered: the so-called soft failures, which can be interpreted as a sudden stop of the computer (for example, emergency power off), and hard failures, characterized by loss of information on external memory media. Examples of software failures can be: crash of the DBMS (due to an error in the program or as a result of some hardware failure) or crash of the user program, with the result that some transaction remains unfinished. The first situation can be considered as a special kind of soft hardware failure; when the latter occurs, it is required to eliminate the consequences of only one transaction.

It is clear that in any case, to restore the database you need to have some additional information. In other words, maintaining reliable data storage in a database requires data storage redundancy, and the part of the data that is used for recovery should be stored very reliably. The most common method of maintaining such redundant information is logging database changes.

The log is a special part of the database, inaccessible to DBMS users and supported with special care (sometimes two copies of the log are maintained, located on different physical disks), which receive records of all changes in the main part of the database. In different DBMS, database changes are logged at different levels: sometimes a log entry corresponds to some logical DB change operation (for example, deleting a row from a relational database table), sometimes a minimal internal modification operation of the external memory page; Some systems use both at the same time.

In all cases, adhere to the strategy of "proactive" logging (the so-called Write Ahead Log Protocol - WAL). Roughly speaking, this strategy is that the record of a change in any object in the database should go into the external memory of the log before the changed object gets into the external memory of the main part of the database. It is known that if the WAL protocol is correctly observed in the DBMS, then using the log you can solve all the problems of restoring the database after any failure.

The simplest recovery situation is an individual transaction rollback. Strictly speaking, this does not require a system-wide DB change log. For each transaction, it is sufficient to maintain a local log of database modification operations performed in this transaction and roll back the transaction by performing inverse operations, following from the end of the local log. In some DBMS they do this, but in most systems local logs do not support, and individual rollback of a transaction is performed according to a system-wide log, for which all entries from one transaction are tied backwards (from end to beginning).

In case of a soft failure in the external memory of the main part of the database, there may be objects modified by transactions that did not end at the time of the failure, and there may be no objects modified by transactions that were successfully completed by the time of failure (due to the use of memory buffers which disappear ). If the WAL protocol is respected, the external log memory should be guaranteed to contain records related to the modification operations of both types of objects. The purpose of the recovery process after a soft failure is the state of the external memory of the main part of the database, which would have occurred if all the completed transactions were committed to changes in the external memory and which contained no traces of incomplete transactions. In order to achieve this, you first roll back the pending transaction (undo), and then re-play (redo) those transactions of completed transactions, the results of which are not displayed in external memory. This process contains many subtleties associated with the overall organization of the management of buffers and the magazine. We will discuss this in more detail in the corresponding lecture.

To restore the database after a hard failure, use a log and an archive copy of the database. Roughly speaking, an archive copy is a complete copy of the database by the time the journal starts to fill out (there are many options for a more flexible interpretation of the meaning of the archive copy). Of course, for a normal recovery of the database after a hard failure, it is necessary that the log is not lost. As already noted, the security of the journal in external memory in the DBMS is particularly high requirements. Then the restoration of the database consists in the fact that, based on the archive copy of the log, the work of all transactions that have been completed by the time of the failure is reproduced. In principle, you can even reproduce the work of unfinished transactions and continue their work after the completion of recovery. However, in real systems this is usually not done, since the recovery process after a hard failure is quite long.

2.1.5. Database language support

Special languages are used to work with databases, generally referred to as database languages . In early DBMS, several languages specialized in their functions were supported. The two languages that stood out most were the DB Schema Definition Language (SDL) and the Data Manipulation Language (DML). The SDL served mainly to determine the logical structure of the database, i.e. the structure of the database, as it appears to users. DML contained a set of data manipulation statements, i.e. operators, allowing to enter data into the database, delete, modify or select existing data. We will look at the languages of the early DBMS in more detail in the next lecture.

In modern DBMS, a single integrated language is usually supported, containing all the necessary tools for working with a database, starting from its creation, and providing a basic user interface with databases. The standard language of the currently most common relational database management systems is SQL (Structured Query Language). In several lectures of this course, the SQL language will be discussed in sufficient detail, but for now we will list the main functions of a relational database management system that are supported at the "language" level (that is, the functions supported when implementing the SQL interface).

First of all, the SQL language combines SDL and DML, i.e. allows you to define a relational database schema and manipulate data. In this case, the naming of database objects (for a relational database — the naming of tables and their columns) is maintained at the language level in the sense that the compiler of the SQL language converts the names of objects into their internal identifiers on the basis of specially supported service tables-directories. The internal part of the DBMS (kernel) does not work at all with the names of the tables and their columns.

The SQL language contains special tools for determining database integrity constraints. Again, integrity constraints are stored in special tables-directories, and ensuring the integrity control of the database is performed at the language level, i.e. when compiling database modification statements, the SQL compiler generates the corresponding program code based on the integrity constraints in the database.

Special operators of the SQL language allow you to define the so-called database views, which are actually queries stored in the database (the result of any query to a relational database is a table) with named columns. For a user, a view is the same table as any base table stored in a database, but with the help of views you can limit or, on the contrary, expand the database’s visibility for a specific user. Maintaining submissions is also done at the language level.

Finally, authorization of access to database objects is also performed on the basis of a special set of SQL statements. The idea is that the user must have different permissions to execute different types of SQL statements. The user who created the database table has a full set of permissions for working with this table. These powers include the authority to transfer all or part of the authority to other users, including the authority to transfer authority. User permissions are described in special tables-directories, authority control is maintained at the language level.

A more accurate description of the possible implementations of these functions based on the SQL language will be given in lectures on the SQL language and its implementation.

Lecture 2. DBMS Functions. Typical organization DBMS. Examples

2.1. The main functions of the DBMS

2.1.1. Direct data management in external memory

2.1.2. Managing RAM Buffers

2.1.3. Transaction management

2.1.4. Journaling

2.1.5. Database language support

See also

Comments

To leave a comment

Databases IBM System R - relational DBMS

Terms: Databases IBM System R - relational DBMS