Lecture
However, the situation is fundamentally different for the information systems mentioned at the beginning of the lecture. These systems are mainly focused on the storage, selection and modification of permanently existing information. The structure of information is often very complex, and although data structures are different in different information systems, there is often much in common between them. At the initial stage of using computer technology for information management, data structuring problems were solved individually in each information system. The necessary add-ins were made on file systems (program libraries), just as it is done in compilers, editors, etc.
But since information systems require complex data structures, these additional individual data management tools were an essential part of information systems and practically repeated from one system to another. The desire to isolate and summarize the general part of information systems responsible for managing complexly structured data was, in our opinion, the first motivating reason for creating a DBMS. It soon became clear that it was impossible to manage a common library of programs that implemented more complex data storage methods over the standard basic file system.
Let's show it by example. Suppose we want to implement a simple information system that supports accounting for employees of a certain organization. The system should perform the following actions: issue lists of employees by department, support the possibility of transferring an employee from one department to another, hiring new employees and dismissing employees. For each department, the possibility of obtaining the name of the head of this department, the total number of the department, the total amount of salary paid last time, etc. should be maintained. For each employee, the possibility of issuing a certificate number for the full name of the employee, issuing a full name for the certificate number, receiving information about the current compliance of the position held by the employee and the amount of his salary should be supported.
Suppose that we decided to base this information system on a file system and use a single file while expanding the basic capabilities of the file system at the expense of a special library of functions. Since the minimum information unit in our case is an employee, it is natural to require that this file contain one record for each employee. What fields should contain such a record? The full name of the employee (COTR_NAME), his ID number (COTR_NOMER), information on his correspondence to the position held (for simplicity, yes or no) (COTP_STAT), salary (COTR_ZARP), department number (COTR_OTD_NOMER). Since we want to limit ourselves to one file, the same record should contain the name of the head of the department (COTR_OTD_RUK).
The functions of our information system require that the possibility of multi-key access to this file using unique keys (non-duplicable in different records) COTR_NAME and COTP_NOMER. In addition, it should be possible to select all records with a common value SOTR_OTD_NOMER, that is, access using a non-unique key. In order to get the number of the department or the total salary, each time such a function is performed, the information system will have to select all records about the department employees and calculate the corresponding total values.
Thus, we see that even for such a simple system, its implementation based on the file system, first, requires the creation of a rather complex add-on for multi-key access to files, and, second, it requires substantial storage redundancy (for each employee of one department the name of the manager is repeated) and the execution of mass sampling and calculations to obtain summary information about the departments. In addition, if during the operation of the system we want, for example, to issue lists of employees who receive a given salary, then we will have to either completely look through the file or restructure it, declaring the key field COTR_ZARP.
The first thing that comes to mind is to support two multi-key files: EMPLOYEES and DEPARTMENTS. The first file should contain the fields COTR_IMA, COTP_NOMER, COTP_STAT, COTP_ZARP and COTP_OTD_NOMER, and the second file - OTD_NOMER, OTD_RUK, OTD_SOTR_ARP (total salary) and OTD_Size (total number of employees in a department). Most of the inconveniences listed in the previous paragraph will be overcome. Each of the files will contain only non-duplicable information; there is no need for dynamic calculations of the summary information. But note that with such a transition, our information system should have some new features that bring it closer to the DBMS.
First of all, the system should now know that it works with two information-related files (this is a step towards the database schema), must know the structure and meaning of each field (for example, that COTP_OTD_NOMER in the file EMPLOYEES and OTD_NOMER in the DEPARTMENTS file mean the same however), and also understand that in a number of cases, a change in the information in one file should automatically trigger a modification in the second file so that their overall contents are consistent. For example, if a new employee is recruited, then you need to add an entry to the STAFF file, and also change the OTDZARP and OTD_SIZE fields in the DEPARTMENT file that describes the department’s employee.
The concept of data consistency is the key concept of databases. In fact, if the information system (even as simple as in our example) supports the consistent storage of information in several files, we can say that it supports the database. If some auxiliary data management system allows you to work with several files, ensuring their consistency, you can call it a database management system. The mere requirement to maintain the consistency of data in several files does not allow the library to get by with the functions: such a system must have some of its own data (metadata) and even knowledge that determines the integrity of the data.
But this is not all that usually require from the database. Firstly, even in our example, it is inconvenient to implement such requests as “issue the total number of the department in which Pyotr Sidorov works”. It would be much easier if the DBMS allowed formulating such a request in a language close to the users. Such languages are called database query languages . For example, in SQL, our query could be expressed in the form:
SELECT SIZE
FROM EMPLOYEES, DEPARTMENTS
WHERE SOTR_IMA = "PETR IVANOVICH SIDOROV"
AND SOTR_OTD_NOMER = OTD_NOMER
Thus, when formulating a query, the DBMS will allow not to think about how this query will be executed. Its metadata will contain information that the COTS_NAME field is the key for the file EMPLOYEES, and the DEPARTMENT_NUMBER is for the DEPARTMENTS file, and the system will take advantage of this. If there is a need to obtain a list of employees who do not correspond to their position, it is enough to submit a request to the system
SELECT SOTR_NAME, SOTR_NOM.
FROM EMPLOYEES
WHERE SOTR_STAT = "NO",
and the system itself will perform the necessary full file review EMPLOYEES, since the COTR_TAT field is not the key.
Further, imagine that in our initial implementation of an information system based on the use of libraries of advanced file access methods, a new employee registration is processed. Following the requirements of a coordinated file change, the information system inserted a new record in the file EMPLOYEES and was going to modify the file record DEPARTMENTS, but it was at this point that the emergency power off occurred. Obviously, after the system is restarted, its database will be in a mismatched state. You will need to find out (and for this you need to explicitly check the correspondence of the information with the files to EMPLOYEES and DEPARTMENTS) and bring the information into a consistent state. Real DBMSs take on such work. The application system is not required to ensure that the state of the database is correct.
Finally, let us imagine that we want to provide parallel (for example, multi-terminal) work with the employee database. If you rely only on the use of files, then to ensure correctness, for the entire time of modification of any of the two files, access by other users to this file will be blocked (remember the capabilities of file systems for synchronizing parallel access). Thus, admission to the work of Peter Ivanovich Sidorov will significantly slow down the receipt of information about the employee Ivan Sidorovich Petrov, even if they work in different departments. Real DBMSs provide much finer synchronization of concurrent data access.
Thus, DBMSs solve many problems that are difficult or impossible to solve when using file systems. At the same time there are applications for which there are quite enough files; applications for which it is necessary to decide what level of work with data in external memory is required for them, and applications for which databases are absolutely necessary.
Comments
To leave a comment
Databases IBM System R - relational DBMS
Terms: Databases IBM System R - relational DBMS