In this blog post, I’ll look at a trick we use at Percona when upgrading to MySQL 5.7.
I’ll be covering this subject (and others) in my webinar Learning MySQL 5.7 on Wednesday, July 19, 2017.
We’ve been doing upgrades for quite a while here at Percona, and we try to optimize, standardize and improve this process to save time. When upgrading to MySQL 5.7, more often than not you need to run REPAIR or ALTER via mysql_upgrade against a number of MySQL tables. Sometimes a few hundred, sometimes hundreds of thousands.
One way to cut some time from testing or executing mysql_upgrade is to combine it with mysqlcheck, which identifies the tables that need to be rebuilt or repaired. The first step is to capture the output of that check:
|
revin@acme:~$ mysqlcheck --check-upgrade --all-databases > mysql-check.log
|
This provides a lengthy output of what needs to be done to successfully upgrade our tables. On my test data, I get error reports like the ones below. I’ll need to take the specified action against them:
|
ads.agency
error : Table upgrade required. Please do "REPAIR TABLE `agency`" or dump/reload to fix it!
store.categories
error : Table rebuild required. Please do "ALTER TABLE `categories` FORCE" or dump/reload to fix it!
|
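If the log is long, a quick count per action gives an idea of how much work is ahead. This is just for scoping and is not part of the upgrade itself, but it only takes a second:
|
revin@acme:~$ grep -c 'REPAIR TABLE' mysql-check.log
revin@acme:~$ grep -c 'ALTER TABLE' mysql-check.log
|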
Before we run through this upgrade, let’s get an idea of how long it would take for a regular mysql_upgrade to complete on this dataset:
|
revin@acme:~$ time mysql_upgrade
Enter password:
Checking if update is needed.
Checking server version.
Running queries to upgrade MySQL server.
Checking system database.
mysql.columns_priv OK
mysql.db OK
...
mysql.user OK
Upgrading the sys schema.
Checking databases.
ads.account_preference_assoc OK
...
Repairing tables
...
ads.agency
Note : TIME/TIMESTAMP/DATETIME columns of old format have been upgraded to the new format.
status : OK
...
`store`.`categories`
Running : ALTER TABLE `store`.`categories` FORCE
status : OK
...
Upgrade process completed successfully.
Checking if update is needed.
real 25m57.482s
user 0m0.024s
sys 0m0.072s
|
On a cold server, my baseline above took about 25 minutes.
The second step in our time-saving process is to identify the tables that need some action (in this case, REPAIR and ALTER ... FORCE), generate the SQL statements to run them, and put them into a single SQL file:
|
revin@acme:~$ for t in $(cat mysql-check.log | grep -B1 REPAIR | egrep -v 'REPAIR|--');
do echo "mysql -e 'REPAIR TABLE $t;'" >> upgrade.sql; done
revin@acme:~$ for t in $(cat mysql-check.log | grep -B1 ALTER | egrep -v 'ALTER|--');
do echo "mysql -e 'ALTER TABLE $t FORCE;'" >> upgrade.sql; done
|
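Before running anything, it is worth a quick sanity check that the number of generated statements matches the number of tables flagged in the log; the two counts below should be identical:
|
revin@acme:~$ wc -l < upgrade.sql
revin@acme:~$ egrep -c 'REPAIR TABLE|ALTER TABLE' mysql-check.log
|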
My upgrade.sql file will have something like this:
|
mysql -e 'ALTER TABLE store.categories FORCE;'
mysql -e 'REPAIR TABLE ads.agency;'
|
Now we should be ready to run these commands in parallel as the third step in the process:
|
revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real 17m31.448s
user 0m1.388s
sys 0m0.616s
|
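A note on tooling: this assumes GNU parallel is installed on the server. If it is not available and installing it is not an option, a rough equivalent using GNU xargs can stand in (the -d '\n' keeps quoting intact, and -P 4 matches the four jobs used above):
|
revin@acme:~$ time xargs -d '\n' -P 4 -I {} sh -c '{}' < upgrade.sql
|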
Getting some parallelization is not bad, and the process improved by about a third (from roughly 26 minutes down to 17.5). If we are talking about multi-terabyte data sets, then it is already a big gain.
On the other hand, my dataset has a few tables that are much bigger than the rest. Because the statements end up in the order mysqlcheck reported the tables, one of the threads was processing most of the big ones instead of the work being spread evenly across threads by size. To fix this, we need an idea of the size of each table we will be processing, which we can get with a query against INFORMATION_SCHEMA.TABLES:
|
revin@acme:~$ for t in $(cat mysql-check.log | grep -B1 ALTER | egrep -v 'ALTER|--');
do d=$(echo $t | cut -d'.' -f1); tbl=$(echo $t | cut -d'.' -f2);
s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
echo "$s |mysql -e 'ALTER TABLE $t FORCE;'" >> table-sizes.sql; done
revin@acme:~$ for t in $(cat mysql-check.log | grep -B1 REPAIR | egrep -v 'REPAIR|--');
do d=$(echo $t | cut -d'.' -f1); tbl=$(echo $t | cut -d'.' -f2);
s=$(mysql -BNe "select sum(index_length+data_length) from information_schema.tables where table_schema='$d' and table_name='$tbl';");
echo "$s |mysql -e 'REPAIR TABLE $t;'" >> table-sizes.sql; done
|
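Note that the loops above issue one INFORMATION_SCHEMA query per flagged table. With hundreds of thousands of tables, it may be quicker to pull all the sizes in a single query and match them against the flagged list afterwards; a rough sketch (the output file name here is just an example):
|
revin@acme:~$ mysql -BNe "SELECT CONCAT(table_schema, '.', table_name), data_length + index_length
FROM information_schema.tables ORDER BY data_length + index_length DESC;" > all-table-sizes.txt
|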
Now my table-sizes.sql file has entries like the ones below, which I can sort by size (largest first) and feed to the parallel command again to cut even more time:
|
32768 |mysql -e 'REPAIR TABLE ads.agency;'
81920 |mysql -e 'ALTER TABLE store.categories FORCE;'
|
|
revin@acme:~$ cat table-sizes.sql | sort -rn | cut -d'|' -f2 > upgrade.sql
revin@acme:~$ time parallel -j 4 -- < upgrade.sql
...
real 8m1.116s
user 0m1.260s
sys 0m0.624s
|
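Before the final pass, it does not hurt to confirm that nothing was missed. One quick check is to reuse the mysqlcheck command from the first step and filter out the tables that now report OK:
|
revin@acme:~$ mysqlcheck --check-upgrade --all-databases | grep -v OK
|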
This go-around, my total execution time is about 8 minutes, close to a 70% improvement over the baseline. To wrap it up, we need to run mysql_upgrade one last time so that the system tables are also upgraded and the tables are checked again, and then restart the MySQL server as instructed by the manual:
|
revin@acme:~$ time mysql_upgrade --force
|
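The restart itself varies by platform and init system. On a systemd-based Debian or Ubuntu box it is typically something like the line below (the service is usually named mysqld on RHEL/CentOS):
|
revin@acme:~$ sudo systemctl restart mysql
|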
Lastly: YMMV. If you have one table that is more than half the size of your total data set, there might not be big gains. Otherwise, the whole process should be easy to automate and script, depending on your preference.
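As an illustration only, here is a rough sketch of how the steps above could be strung together into one script. The file names, the job count and the absence of error handling are all assumptions that would need adjusting and testing for a real environment:
|
#!/bin/bash
# Rough sketch: combine mysqlcheck, size-sorting and parallel into one pass.
# Assumes credentials come from ~/.my.cnf, as in the examples above.

LOG=mysql-check.log     # mysqlcheck output
SQL=upgrade.sql         # final, size-sorted statements
JOBS=4                  # tune to the number of available cores

# Step 1: find tables that need REPAIR or ALTER ... FORCE.
mysqlcheck --check-upgrade --all-databases > "$LOG"

# Step 2: build "size |statement" lines for every flagged table.
> "$SQL.unsorted"
for action in REPAIR ALTER; do
    for t in $(grep -B1 "$action" "$LOG" | egrep -v "$action|--"); do
        d=${t%%.*}; tbl=${t#*.}
        s=$(mysql -BNe "SELECT data_length + index_length FROM information_schema.tables WHERE table_schema='$d' AND table_name='$tbl';")
        if [ "$action" = "REPAIR" ]; then
            echo "$s |mysql -e 'REPAIR TABLE $t;'" >> "$SQL.unsorted"
        else
            echo "$s |mysql -e 'ALTER TABLE $t FORCE;'" >> "$SQL.unsorted"
        fi
    done
done

# Step 3: biggest tables first, then run the statements in parallel.
sort -rn "$SQL.unsorted" | cut -d'|' -f2- > "$SQL"
parallel -j "$JOBS" -- < "$SQL"

# Step 4: mysql_upgrade handles the system tables and re-checks everything.
mysql_upgrade --force
|
In practice you would also want to capture the output of each statement, and decide whether the parallel step is worth it at all when only a handful of small tables are flagged.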
If you want to learn more about upgrading to MySQL 5.7, come to my webinar on Wednesday, July 19: Learning MySQL 5.7. This is only one phase in a multi-step upgrade process when moving to 5.7, and I will discuss the others in more detail next week. Register now from the link below, and I’ll talk to you soon!
Jervin Real
As Senior Consultant, Jervin partners with Percona’s customers on building reliable and highly performant MySQL infrastructures while also doing other fun stuff like watching cat videos on the internet. Jervin joined Percona in Apr 2010.