What types of indexes are supported in the most popular DBMS?
B-tree
The B-Tree index family is the most commonly used type of index, organized as a balanced tree, of ordered keys. They are supported by almost all DBMS, both relational and non-relational, and for almost all types of data.
Since most probably know them well (or can read about them, for example, here), the only thing that should perhaps be noted here is that this type of index is optimal for a set with a good distribution of values and high power (cardinality -number of unique values).
Spatial indices
Currently, all DBMS data has spatial data types and functions for working with them, for Oracle it is a set of types and functions in the MDSYS schema, for PostgreSQL it is point, line, lseg, polygon, box, path, polygon, circle, in MySQL - geometry, point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, MS SQL - Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
In the scheme of work of spatial queries, usually there are two stages or two stages of filtering. DBMS with weak spatial support work out only the first stage (coarse filtering, MySQL). As a rule, an approximate, approximate representation of objects is used at this stage. The most common type of approximation is the Minimum Bounding Rectangle (MBR) [100].
For spatial data types, there are special indexing methods based on R-trees (R-Tree index) and grids (Grid-based Spatial index).
Spatial grid
Spatial grid (spatial grid) index is a tree-like structure similar to a B-tree, but is used to provide access to spatial (Spatial) data, that is, to index multi-dimensional information, such as geographic data with two-dimensional coordinates (latitude and longitude). ). In this structure, the nodes of the tree are the cells of space. For example, for a two-dimensional space: first, the entire parent area will be divided into a grid of strictly defined resolution, then each grid cell in which the number of objects exceeds the maximum set of objects in the cell will be divided into the next level subgrid. This process will continue until the maximum nesting is reached (if set), or until everything is divided up to cells that do not exceed the maximum objects.
In the case of three-dimensional or multidimensional space, these will be rectangular parallelepipeds (cuboids) or parallelotopes.
Quadtree
Quadtree is a subspecies of the Grid-based Spatial index, in which there are always 4 descendants in the parent cell and the grid resolution varies depending on the nature or complexity of the data.
R-tree
R-Tree (Regions Tree) is also a tree-like data structure similar to the Spatial Grid, proposed in 1984 by Antonin Guttman. This data structure also divides the space into many hierarchically nested cells, but which, unlike the Spatial Grid, do not have to completely cover the parent cell and can intersect.
Various algorithms can be applied to splitting overflowed vertices, which causes the division of R-trees into subtypes: with quadratic and linear complexity (Guttman, of course, described with exponential complexity — Exhaustive Search, but naturally he is not used anywhere).
The quadratic subtype is divided into two rectangles with a minimum area covering all objects. Linear - in the division by maximum distance.
HASH
Hash indices were proposed by Arthur Fuller, and they do not preserve the values themselves, but their hashes, thereby reducing the size (and, accordingly, their processing speed) of indices from large fields. Thus, when querying using HASH indexes, the hash from the desired value will not be compared with the field hashes, but with the hashes of the fields.
Due to the non-linearity of hash functions, this index cannot be sorted by value, which leads to the impossibility of using more / less and “is null” in comparisons. In addition, since hashes are not unique, collision resolution methods are used for matching hashes.
Bitmap
Bitmap index - the bit index method consists in creating separate bitmaps (sequence 0 and 1) for each possible value of the column, where each bit corresponds to a row with an indexed value, and its value equal to 1 means that the entry corresponding to the bit position contains the indexed value for this column or property.
Empid | Floor |
---|
one | Male |
2 | Female |
3 | Female |
four | Male |
five | Female |
Bit cards
Value | Start | the end | Bit mask |
---|
Male | Address of the first line | Last line address | 10010 |
Female | Address of the first line | Last line address | 01101 |
The main advantage of bit indices is that on large sets with low power and good clustering by their values, the index will be less than B * -tree. (Read more here or here)
Reverse index
Reverse index is also a B-tree index but with a reversed key, used mainly for monotonously increasing values (for example, an auto-incrementing identifier) in OLTP systems in order to remove competition for the last leaf block of the index, since due to the reversal of the value, two adjacent index entries fall into different index blocks. It cannot be used for range search.
Example:
Field in the table (bin) | Reverse index key (bin) |
---|
00000001 | 10,000,000 |
... | ... |
00001001 | 10010000 |
00001010 | 01010000 |
00001011 | 11010000 |
As you can see, the value in the index changes much more than the value in the table itself, and therefore in the b-tree structure, they will fall into different blocks.
Inverted index
An inverted index is a full-text index that stores for each key token a sorted list of addresses of table entries that contain the given key.
one | Mom washed the frame |
2 | Dad washed the frame |
3 | Dad was washing a car |
four | Mom polished car |
In simplified form, it will look like this:
Mama | 1.4 |
Soap | one |
Rama | 1.2 |
Dad | 2.3 |
Polished | four |
Car | 3.4 |
Partial index
Partial index is an index built on the part of the table that satisfies a specific condition of the index itself. This index was created to reduce the size of the index.
Function-based index
The very flexible type of index is a functional index, that is, an index whose keys store the result of user-defined functions. Functional indexes are often built for fields whose values are pre-processed before comparison in a SQL command. For example, when comparing string data case-insensitive, the UPPER function is often used. Creating a functional index with the UPPER function improves the efficiency of such comparisons.
In addition, a functional index can help to implement any other missing index type of this DBMS (except, perhaps, a bit index, for example, Hash for Oracle)
Summary Table of Index Types
| Mysql | PostgreSQL | MS SQL | Oracle |
B-Tree index | there is | there is | there is | there is |
Spatial indexes supported | R-Tree with quadratic splitting | Rtree_GiST (linear split is used) | 4-level Grid-based spatial index (separate for geographic and geodetic data) | R-Tree with quadratic partitioning; Quadtree |
Hash index | Only in tables of type Memory | there is | Not | Not |
Bitmap index | Not | there is | Not | there is |
Reverse index | Not | Not | Not | there is |
Inverted index | there is | there is | there is | there is |
Partial index | Not | there is | there is | Not |
Function based index | Not | there is | there is | there is |
It is worth mentioning that in PostgreSQL GiST allows you to create an index based on R-Tree for any of your own data types. To do this, you need to implement all 7 functions of the R-Tree mechanism.
Comments
To leave a comment
Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL
Terms: Databases, knowledge and data warehousing. Big data, DBMS and SQL and noSQL