Lecture
Starting with version 5.1, mySQL supports horizontal partitioning of tables. What it is? Partitioning is partitioning large tables into logical parts according to selected criteria. At the lower level for myISAM tables, these are physically different files, 3 for each partition (table description, index file, data file). For innoDB tables in the default configuration, there are different table spaces in the innoDB files (do not forget that innoDB allows you to customize individual storages at the database level or even specific tables).
What does this look like?
CREATE TABLE orders_range (
customer_surname VARCHAR(30),
store_id INT,
salesperson_id INT,
order_date DATE,
note VARCHAR(500)
) ENGINE = MYISAM
PARTITION BY RANGE( YEAR(order_date) ) (
PARTITION p_old VALUES LESS THAN(2008),
PARTITION p_2008 VALUES LESS THAN(2009),
PARTITION p_2009 VALUES LESS THAN(MAXVALUE)
);
What do we get? The first “table” will store data for the “archival” period, until 2008, the second - for 2008, and the “third” - everything else.
The most delicious thing is that you do not need to rewrite / optimize requests at all:
select * from orders_range where order_date='2009-08-01';
And this is what happens:
mysql> explain partitions select * from orders_range3 where order_date='2008-08-01';
+----+-------------+---------------+------------+--------+---------------+------+---------+------+------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------------+--------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | orders_range3 | p_2008 | system | NULL | NULL | NULL | NULL | 1 | |
+----+-------------+---------------+------------+--------+---------------+------+---------+------+------+-------+
1 row in set (0.00 sec)
We see that when this query is executed, the work will go exclusively with the sub-table p_2008.
Moreover, acceleration is achieved even in the case of executing queries that affect all data in all partitions - in this case, first, the tables are initially “processed” by fewer, then the data is combined and the final calculations are performed. So just the “first” stages, in this case will be much faster.
What other benefits are there?
I would call the main advantage the fact that the partition with the “operational” data (that is, the last for which the sampling occurs most often) has a minimum size, and as a result, they can reside in the RAM.
If you have a log table, which is continuously recorded and the hard drives do not have time, and religion does not allow you to set up a raid, you can set up partitioning by hash function, and specify one partition per hard drive available to you. In this case, the new data will be written evenly on all hard drives.
What are the ways to "separate" data provides mySQL?
1. RANGE
By range of values
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p3 VALUES LESS THAN (30)
);
2. LIST
According to the exact list of values
PARTITION BY LIST (store_id) (
PARTITION pNorth VALUES IN (3,5,6,9,17),
PARTITION pEast VALUES IN (1,2,10,11,19,20)
)
Why, you ask? It is necessary to split into partitions either on the basis of considerations of sample optimization (which is more often) or on the basis of considerations of record optimization (less often). Accordingly, the ideal option is when you split the table into as many partitions as possible so that 90% of all samples occur within the same partition. And if you have a complicated sampling logic (for example, objects located in the northern quarters of the city, whose IDs are disparate), then sometimes it makes sense to list them forcibly.
3. HASH
PARTITION BY HASH (store_id)
PARTITIONS 4;
You do not control the partition, just indicate what field to build the hash and how many "sub-tables" to create. What for? Sampling on the specified field is much faster. In some cases, it allows to achieve "uniform spread" and accelerate data recording.
4. KEY
Almost the same as HASH, but more logically - by key.
PARTITION BY KEY (s1)
PARTITIONS 10;
Those. selection on the specified key field is as efficient as possible.
But here you should also decide on the method of partitioning. Well suited for the visitor counter, when his login is the only identifier for which it is necessary to select all other data.
What is not?
No vertical partitioning. This is when different columns (fields) are in different "sub-tables". Since sometimes it is useful, you can achieve it yourself, even if not so transparent: divide the table into two, connecting them by the primary key. If you really want beauty - you can additionally create a VIEW for them, for example, in order not to overwrite the old parts of the code.
Why do this? For example, in a table where you basically have numbers and dates, there is one VARCHAR (255) comment field, which is used less often than the other fields. If it is moved to another table, we will get a fixed row size (mySQL will be able to accurately calculate the position of the desired row by index in the data file). The table will become more resilient to failures in case of emergency situations (again, due to the fixed size of the string). Well, the size of the table itself will significantly decrease.
And ending the article I will give an example of a more “real” table partitioning - monthly. Since LIST / RANGE accept only integer values, you need to skew a bit:
PARTITION BY RANGE (TO_DAYS (order_date)) (
PARTITION y2009m1 VALUES LESS THAN (TO_DAYS ('2009-02-01')),
PARTITION y2009m2 VALUES LESS THAN (TO_DAYS ('2009-03-01')),
PARTITION y2009m3 VALUES LESS THAN (TO_DAYS ('2009-04-01'))
);
PS: In mysql, you always have to “fake a little”, so there will never be a bore with it, and we, in turn, will never be without work :)
Comments
To leave a comment
MySql (Maria DB)
Terms: MySql (Maria DB)