1.1. File systems

Lecture



A historic step was the transition to centralized file management systems. From the point of view of an application program, a file is a named area of external memory to which data can be written and from which data can be read. The file naming rules, the method of accessing the data stored in a file, and the structure of that data depend on the specific file management system and, possibly, on the file type. The file management system takes responsibility for allocating external memory, mapping file names to the corresponding addresses in external memory, and providing access to the data.

The first advanced file system was developed by IBM for its 360 series. By now it is very outdated, and we will not consider it in detail. We note only that it supported both purely sequential and indexed-sequential files, and that its implementation relied heavily on the capabilities of the disk controllers that had just appeared at that time. Moreover, in OS/360 the file was chosen as the main abstraction to which every external object, including external devices, corresponded, so working with files at the user level was very inconvenient: a number of cumbersome and overloaded constructs were required. All this is well known to programmers of the middle and older generations who worked with domestic analogues of IBM computers.

1.1.1. File structures

Next we turn to more modern file system organizations, starting with file structures. In almost all modern computers, the main external memory devices are magnetic disks with movable heads, and it is these disks that store files. Such a disk is a pack of magnetic platters (surfaces) between which a set of magnetic heads moves on a single arm. The head assembly moves in discrete steps, and each position of the assembly logically corresponds to a cylinder of the disk. On each surface, a cylinder "cuts out" a track, so each surface contains as many tracks as there are cylinders. When a disk is formatted (a special operation that precedes its use), each track is divided into the same number of blocks, and each block can hold at most the same number of bytes. Thus, to perform a transfer to or from a magnetic disk at the hardware level, you need to specify the cylinder number, the surface number, the block number on the corresponding track, and the number of bytes to be written or read starting from the beginning of that block.
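The cylinder / surface / block addressing described above can be illustrated with a small sketch. The class name DiskGeometry, the function names and the sample numbers are assumptions made for illustration only; the formula simply numbers blocks cylinder by cylinder, then surface by surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskGeometry:
    """Hypothetical description of a formatted disk (illustrative only)."""
    cylinders: int          # number of head-assembly positions
    surfaces: int           # number of recording surfaces (heads)
    blocks_per_track: int   # identical for every track after formatting
    block_size: int         # maximum bytes per block


def linear_block_number(g: DiskGeometry, cylinder: int, surface: int, block: int) -> int:
    """Map a (cylinder, surface, block) hardware address to one running block number."""
    if not (0 <= cylinder < g.cylinders
            and 0 <= surface < g.surfaces
            and 0 <= block < g.blocks_per_track):
        raise ValueError("address outside the disk geometry")
    return (cylinder * g.surfaces + surface) * g.blocks_per_track + block


def capacity_bytes(g: DiskGeometry) -> int:
    """Total capacity: every track holds the same number of equal-sized blocks."""
    return g.cylinders * g.surfaces * g.blocks_per_track * g.block_size
```

For example, with an assumed geometry of 100 cylinders, 4 surfaces and 16 blocks per track, block 5 on surface 3 of cylinder 2 gets the running number (2 * 4 + 3) * 16 + 5 = 181.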

However, this ability to transfer portions smaller than a block to or from a disk is currently not used in file systems, for two reasons. First, when performing a disk transfer, the hardware carries out three basic actions: moving the heads to the desired cylinder, finding the desired block on the track, and actually transferring the data of that block. Of these actions, the first takes the most time on average, so reading or writing only part of a block yields almost no gain in total transfer time. Second, to work with parts of blocks, the file system would have to provide RAM buffers of the corresponding sizes, which greatly complicates the allocation of RAM.
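The first argument can be checked with rough arithmetic. The sketch below adds up the three actions for one transfer; the function name and all the numbers (8 ms average head movement, 7200 revolutions per minute, 64 blocks per track) are illustrative assumptions, not figures from the lecture.

```python
def access_time_ms(seek_ms: float, rpm: float, blocks_per_track: int,
                   fraction_of_block: float = 1.0) -> float:
    """Estimate the time of one disk transfer as the sum of three actions."""
    revolution_ms = 60_000 / rpm
    # On average the desired block is half a revolution away from the head.
    rotational_ms = revolution_ms / 2
    # A block passes under the head in 1/blocks_per_track of a revolution.
    transfer_ms = revolution_ms / blocks_per_track * fraction_of_block
    return seek_ms + rotational_ms + transfer_ms


# Assumed figures, for illustration only.
full_block = access_time_ms(seek_ms=8.0, rpm=7200, blocks_per_track=64)
half_block = access_time_ms(seek_ms=8.0, rpm=7200, blocks_per_track=64,
                            fraction_of_block=0.5)
saving = (full_block - half_block) / full_block
```

Under these assumptions, transferring only half a block shortens the whole operation by well under one percent, because head movement and waiting for the block dominate the total.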

Therefore, every file system explicitly or implicitly distinguishes a base level that works with files represented as a set of blocks directly addressable in the file's address space. The size of these logical file blocks is equal to, or a multiple of, the size of a physical disk block, and is usually chosen to match the size of the virtual memory page supported by the computer's hardware together with the operating system.
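At this base level, a position in a file translates into a logical block number and an offset inside that block. A minimal sketch, assuming a 4096-byte logical block (a common page size, chosen here only for illustration; the function names are hypothetical):

```python
BLOCK_SIZE = 4096  # assumed logical block size, equal to an assumed page size


def locate(offset: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (logical block number, offset inside that block) for a file position."""
    return divmod(offset, block_size)


def blocks_touched(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Logical blocks that must be transferred to serve `length` bytes at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

Even a 200-byte request that starts at position 4000 touches two whole blocks (0 and 1), which is why whole-block buffers are the natural unit of transfer.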

In some file systems the base level is available to the user, but more often it is hidden behind a higher level that is standard for users. Two main approaches are common. In the first approach, characteristic, for example, of the file systems of DEC's RSX and VMS operating systems, users see a file as a sequence of records. Each record is a sequence of bytes of fixed or variable size. Records can be read or written sequentially, or the file can be positioned at the record with a specified number. Some file systems allow records to be structured into fields and some fields to be declared record keys. In such file systems, a record can be selected from a file by a specified key value. Naturally, in this case the file system maintains additional service data structures, invisible to the user, in the same base file (or in another, service file). Common ways of organizing keyed files are based on hashing and B-trees (we will discuss these techniques in more detail in later lectures). There are also multi-key ways of organizing files.
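The record-oriented view can be sketched as follows. This is not the RSX or VMS interface: the record layout (a 4-byte key plus a 16-byte payload), the class name RecordFile and its methods are hypothetical, and a Python dictionary stands in for the hash-based service structure that a real system would keep on disk.

```python
import io
import struct

# Hypothetical fixed-size record: unsigned 32-bit key + 16-byte payload.
RECORD = struct.Struct("<I16s")


class RecordFile:
    """A file seen as a sequence of fixed-size records with one key field."""

    def __init__(self, stream):
        self._f = stream    # the underlying "base level" byte storage
        self._index = {}    # key -> record number (stand-in for a hash index)
        self._count = 0

    def append(self, key: int, payload: bytes) -> int:
        """Write a record at the end and return its record number."""
        self._f.seek(self._count * RECORD.size)
        self._f.write(RECORD.pack(key, payload))
        self._index[key] = self._count
        self._count += 1
        return self._count - 1

    def read_record(self, number: int) -> tuple[int, bytes]:
        """Position the file at record `number` and read it."""
        if not 0 <= number < self._count:
            raise IndexError("no such record")
        self._f.seek(number * RECORD.size)
        key, payload = RECORD.unpack(self._f.read(RECORD.size))
        return key, payload.rstrip(b"\0")

    def read_by_key(self, key: int) -> tuple[int, bytes]:
        """Select a record by key through the service index."""
        return self.read_record(self._index[key])


rf = RecordFile(io.BytesIO())
rf.append(42, b"alpha")
rf.append(7, b"beta")
```

Here rf.read_record(1) positions by record number, while rf.read_by_key(42) goes through the index, which the user of the file never sees directly.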

The second approach, which became widespread with the UNIX operating system, is to represent any file as a sequence of bytes. You can read a specified number of bytes from a file either starting from its beginning or after positioning it at the byte with a specified number. Similarly, you can write a specified number of bytes either at the end of the file or after positioning it. Note that the base block representation of the file, although hidden from the user, still exists in all varieties of UNIX file systems.
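The byte-stream view maps directly onto the classic UNIX calls for reading, writing and positioning, which Python exposes in its os module. A minimal sketch; the function name demo_byte_stream and the sample text are illustrative:

```python
import os
import tempfile


def demo_byte_stream() -> tuple[bytes, bytes]:
    """Write, position and read a temporary file as a plain sequence of bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, file systems")
        # Position at the byte with number 7 and read 4 bytes from there.
        os.lseek(fd, 7, os.SEEK_SET)
        middle = os.read(fd, 4)
        # Position at the end of the file and append one more byte.
        os.lseek(fd, 0, os.SEEK_END)
        os.write(fd, b"!")
        # Return to the beginning and read everything back.
        os.lseek(fd, 0, os.SEEK_SET)
        whole = os.read(fd, 100)
        return middle, whole
    finally:
        os.close(fd)
        os.remove(path)
```

Nothing in these calls mentions blocks or records: the program sees only byte positions, and the block representation stays inside the file system.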

Of course, for both approaches it is possible to provide a set of conversion functions that bring the representation of a file to some other form. An example is support for the standard file environment of the C programming language under DEC operating systems.

1.1.2. File Naming

Let us briefly discuss file naming methods. All modern file systems support multi-level file naming by maintaining additional files with a special structure, called directories, in external memory. Each directory contains the names of the directories and/or files it holds. Thus, the full name of a file consists of a list of directory names followed by the name of the file in the directory that directly contains it. File systems differ in how this chain of names begins.
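Resolving a full file name is a walk down this chain of directories. In the sketch below, nested Python dictionaries stand in for directories and byte strings stand in for files; the function name resolve, the "/" separator and the sample tree are assumptions for illustration.

```python
def resolve(root: dict, path: str):
    """Follow a full name such as "/usr/bin/ls" through nested directories."""
    node = root
    for name in (part for part in path.split("/") if part):
        if not isinstance(node, dict):
            # A file was reached before the chain of names ended.
            raise NotADirectoryError(name)
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]
    return node


# Hypothetical directory tree: dict = directory, bytes = file contents.
tree = {"usr": {"bin": {"ls": b"program"}}, "etc": {}}
```

With this tree, resolve(tree, "/usr/bin/ls") walks through the directories usr and bin and returns the file stored under the name ls.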

In this regard, there are two extreme options. Many file management systems require that each file archive (a complete directory tree) reside entirely on a single disk pack (or logical disk, that is, a partition of a physical disk pack that the operating system presents as a separate disk). In this case, the full file name begins with the name of the disk device on which the corresponding disk is mounted. This naming scheme is used in DEC file systems, and personal computer file systems are very close to it. This organization can be called the maintenance of isolated file systems.

The other extreme was implemented in the file system of the Multics operating system. This system deserves a long discussion of its own, since it implemented a number of original ideas, but here we focus only on how its file archive was organized. In the Multics file system, users saw the entire collection of directories and files as a single tree. The full file name began with the name of the root directory, and the user did not have to worry about mounting particular disks on a disk device. The system itself, when searching for a file by name, requested that the necessary disks be mounted. Such a file system can be called fully centralized.

Of course, centralized file systems are in many ways more convenient than isolated ones: the file management system takes on more of the routine work. But such systems run into significant problems when someone needs to move a subtree of the file system to another computer installation. A compromise solution is used in UNIX file systems. At the base level, these file systems support isolated file archives. One of these archives is declared the root file system. After the system starts, the root file system and a number of isolated file systems can be "mounted" into one common file system. Technically, this is done by means of special empty directories created in the root file system. A special UNIX system call, mount, attaches the root directory of a specified file archive to one of these empty directories. Once the common file system has been mounted, file naming works exactly as if the file system had been centralized from the very beginning. Since the file system is usually mounted during system startup, UNIX users usually do not think about the origin of the common file system.
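The effect of mounting on name lookup can be modeled with a small table of mount points. This is a simplified sketch and not the UNIX implementation: the class Namespace, its methods and the sample trees are hypothetical, and dictionaries again stand in for directories.

```python
def _split(path: str) -> tuple:
    """Split "/home/alice" into ("home", "alice")."""
    return tuple(part for part in path.split("/") if part)


class Namespace:
    """One common naming tree built from a root archive plus mounted archives."""

    def __init__(self, root_fs: dict):
        # Mount point (tuple of names) -> root directory of the attached archive.
        self._mounts = {(): root_fs}

    def mount(self, fs: dict, mount_point: str) -> None:
        """Attach the root directory of `fs` to an existing empty directory."""
        target = self.lookup(mount_point)
        if not isinstance(target, dict) or target:
            raise ValueError("mount point must be an existing empty directory")
        self._mounts[_split(mount_point)] = fs

    def lookup(self, path: str):
        """Resolve a full name, switching archives at the deepest mount point."""
        parts = _split(path)
        best = max((m for m in self._mounts if parts[:len(m)] == m), key=len)
        node = self._mounts[best]
        for name in parts[len(best):]:
            node = node[name]
        return node


# Hypothetical archives: the root file system and one isolated archive.
root_fs = {"home": {}, "etc": {"passwd": b"root"}}
home_fs = {"alice": {"notes.txt": b"hi"}}
ns = Namespace(root_fs)
ns.mount(home_fs, "/home")
```

After the mount, ns.lookup("/home/alice/notes.txt") and ns.lookup("/etc/passwd") use the same naming rules, even though the two files live in different archives.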

1.1.3. File protection

Since a file system is a shared repository of files that belong, generally speaking, to different users, file management systems must provide authorization of access to files. In its general form, the approach is to specify, for each registered user of the computing system and for each existing file, the actions that are allowed or prohibited for that user. There were attempts to implement this approach in full, but it caused too much overhead both in storing the redundant information and in using that information to check access rights.

Therefore, most modern file management systems use the approach to file protection first implemented in UNIX. In this system, each registered user is associated with a pair of integer identifiers: the identifier of the group to which the user belongs and the user's own identifier within the group. Accordingly, each file stores the full identifier of the user who created it, together with a record of which actions that user may perform on the file, which actions are available to other users of the same group, and which actions users of other groups may perform. This information is very compact, checking it requires only a few operations, and this method of access control is satisfactory in most cases.
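The check described above fits in a few lines. The sketch uses the conventional UNIX encoding of three permission bits (read, write, execute) for each of the three classes of users; the names FileMeta and is_allowed are hypothetical, and the sketch deliberately ignores the superuser and any additional group memberships.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # conventional UNIX permission bits


@dataclass(frozen=True)
class FileMeta:
    """Protection data kept with a file (illustrative model)."""
    uid: int    # identifier of the user who created the file
    gid: int    # identifier of that user's group
    mode: int   # nine permission bits, e.g. 0o640


def is_allowed(meta: FileMeta, uid: int, gid: int, wanted: int) -> bool:
    """Pick the owner, group or others bit triple and test the wanted actions."""
    if uid == meta.uid:
        bits = (meta.mode >> 6) & 7     # owner's actions
    elif gid == meta.gid:
        bits = (meta.mode >> 3) & 7     # same group
    else:
        bits = meta.mode & 7            # users of other groups
    return (bits & wanted) == wanted


# Assumed example: owner 1000 in group 100, mode rw- r-- ---
report = FileMeta(uid=1000, gid=100, mode=0o640)
```

With mode 0o640, the owner may read and write, other members of group 100 may only read, and everyone else is refused; the whole decision costs two comparisons and a bit mask.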

1.1.4. Multi-user access mode

The last topic we consider in connection with files is how they are used in a multi-user environment. If the operating system supports multi-user mode, it is quite realistic for two or more users to try to work with the same file at the same time. If all of them are only going to read the file, nothing bad will happen. But if at least one of them changes the file, mutual synchronization is required for the group to work correctly.

Historically, file systems used the following approach. The operation of opening a file (the first and mandatory operation with which a session of work with a file must begin) took the mode of work (reading or modification) among its parameters. If, when this operation was performed on behalf of some program A, the file was already open on behalf of some other program B (it would be more correct to say "process", but we will not go into terminological subtleties), and the existing open mode was incompatible with the requested one (only read modes are compatible with each other), then, depending on the system, program A was either informed that the file could not be opened in the requested mode or blocked until program B closed the file.
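The compatibility rule for open modes can be written down as a small table of currently open files. The class OpenTable and the mode names "read" and "write" are hypothetical; the sketch shows the variant in which an incompatible open is refused rather than blocked.

```python
class OpenTable:
    """Tracks the modes in which each file is currently open."""

    def __init__(self):
        self._open = {}  # file name -> list of modes held by current openers

    def try_open(self, name: str, mode: str) -> bool:
        """Grant the open only if it is compatible with every existing open."""
        if mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        held = self._open.setdefault(name, [])
        # Only read modes are compatible with each other.
        compatible = mode == "read" and all(m == "read" for m in held)
        if held and not compatible:
            return False
        held.append(mode)
        return True

    def close(self, name: str, mode: str) -> None:
        """Release one open of the given mode."""
        self._open[name].remove(mode)
```

A system that blocks instead of refusing would put program A to sleep where this sketch returns False and retry the check when program B calls close.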

Note that early versions of the UNIX file system implemented no means of synchronizing parallel access to files at all. The open operation succeeded for any existing file, provided the user had the appropriate access rights. Any synchronization needed for joint work had to be arranged outside the file system (and UNIX provided no special tools for this). Modern implementations of UNIX file systems support synchronization on opening a file at the user's request. In addition, several processes that modify the same file in parallel can be synchronized. For this purpose, a special mechanism of synchronization locks on address ranges of an open file has been introduced.
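On UNIX-like systems, Python exposes byte-range locks through fcntl.lockf. The sketch below locks only the range it is about to modify, so that cooperating processes updating other ranges of the same file are not held up. The function name update_range is hypothetical, the code runs only on UNIX-like systems, and the locks are advisory: they coordinate only processes that also request them.

```python
import fcntl
import os


def update_range(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at `offset` while holding a lock on that range."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Exclusive lock on bytes [offset, offset + len(data)); waits if another
        # process already holds a conflicting lock on an overlapping range.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
        finally:
            # Release the same range.
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

Two processes that call update_range on disjoint ranges proceed in parallel, while calls on overlapping ranges are serialized by the lock.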

created: 2014-09-27
updated: 2024-11-14
