In this blog post, we’ll look at using ProxySQL and VIRTUAL columns to solve ORM issues.
There are a lot of web frameworks all around. Programmers and web designers are using them to develop and deploy any website and web application. Just to cite some of the most famous names: Drupal, Ruby on Rails, Symfony, etc.
Web frameworks are very useful tools. But sometimes, as with many human artifacts, they have issues. Any framework has its own queries to manage its internal tables. While there is nothing wrong with that, but it often means these queries are not optimized.
Here is my case with Symfony 2 on MySQL 5.7, and how I solved it.
The sessions table issue
Symfony has a table to manage session data for users on the application. The table is defined as follow:
CREATE TABLE `sessions` ( `sess_id` varchar(126) COLLATE utf8_bin NOT NULL, `sess_data` blob NOT NULL, `sess_time` int(10) unsigned NOT NULL, `sess_lifetime` mediumint(9) NOT NULL, PRIMARY KEY (`sess_id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
The expiration time of the user session is configurable. The developers decided to configure it to be one month.
Symfony was serving a high traffic website, and very soon that table became very big. After one month, I saw it had more than 14 million rows and was more than 3GB in size.
mysql> SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH –> FROM information_schema.tables WHERE table_schema=‘symfony’ AND table_name=‘sessions’\G *************************** 1. row *************************** TABLE_SCHEMA: symfony TABLE_NAME: sessions ENGINE: InnoDB TABLE_ROWS: 14272158 DATA_LENGTH: 3306140672 |
Developers noticed the web application sometimes stalling for a few seconds. First, I analyzed the slow queries on MySQL and I discovered that sometimes Symfony deletes inactive sessions. It issued the following query, which took several seconds to complete. This query was the cause of the stalls in the application:
DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847 |
The query is not optimized. Let’s have a look at the EXPLAIN:
mysql> EXPLAIN DELETE FROM sessions WHERE sess_lifetime + sess_time < 1521025847\G *************************** 1. row *************************** id: 1 select_type: DELETE table: sessions type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 14272312 filtered: 100.00 Extra: Using where |
Every DELETE query was a full table scan of more than 14 million rows. So, let’s try to improve it.
First workaround
Looking around on the web and discussing it with colleagues, we’ve found some workarounds. But none of them was the definitive solution:
- Reduce expiration time in Symfony configuration. Good idea. One month is probably too long for a high traffic website. But we kept the expiration time configured at one month because of an internal business policy. But even one week wouldn’t have solved the full table scan.
- Using a different database solution. Redis was proposed as an alternative to MySQL to manage session data. This might be a good solution, but it could involve a long deployment time. We planned a test, but the sysadmins suggested it was not a good solution to have another database system for such a simple task.
- Patching Symfony code. It was proposed to rewrite the query directly into the Symfony code. Discarded.
- Create indexes. It was proposed to create indexes on sess_time and sess_lifetime columns. The indexes wouldn’t get used because of the arithmetic addition on the where clause. This is the only condition we have on the query.
So, what do we do if everything must remain the same? Same configuration, same environment, same query issued and no indexes added?
Query optimization using a virtual column
I focused on how to optimize the query. Since I was using 5.7, I thought about a generated virtual column. I decided to add a virtual column in the sessions table, defined as sess_time+sess_lifetime (the same as the condition of the query):
mysql> ALTER TABLE sessions ADD COLUMN `sess_delete` INT UNSIGNED GENERATED ALWAYS AS ((`sess_time` + `sess_lifetime`)) VIRTUAL; |
Any virtual column can have an index on it. So, I created the index:
mysql> ALTER TABLE sessions ADD INDEX(sess_delete); |
Note: I first checked that the INSERT queries were well written in Symfony (with an explicit list of the fields to insert), in make sure this modification wouldn’t cause more issues. Making a schema change on a table that is in use by any framework, where the queries against the table are generally outside of your control, can be a daunting task.
So, let’s EXPLAIN the query rewritten as follows, with the condition directly on the generated indexed column:
mysql> EXPLAIN DELETE FROM sessions WHERE sess_delete < 1521025847\G *************************** 1. row *************************** id: 1 select_type: DELETE table: sessions type: range possible_keys: sess_delete key: sess_delete key_len: 5 ref: const rows: 6435 filtered: 100.00 Extra: Using where |
The query now can to use the index, and the number of rows selected are the exact number of the session that we have to delete.
So far, so good. But will Symfony execute that query if we don’t want to modify the source code?
Using ProxySQL to rewrite the query
Fortunately, we already had ProxySQL up and running in our environment. We were using it just to manage the master MySQL failover.
One of the very useful features of ProxySQL is the ability to rewrite any query it receives into another one based on rules you can define. You can create queries from very simple rules, like changing the name of a field, to very complex queries that use a chain of rules. It depends on how complex the translation is that you have to do. In our case, we just needed to translate sess_time + sess_lifetime into sess_delete. The rest of the query was the same. We needed to define a very simple rule.
Let’s see how to create the rewrite rules.
Connect to the proxy:
mysql –u admin –psecretpwd –h 127.0.0.1 –P6032 —prompt=‘Admin> ‘ |
Define the rewrite rule by inserting a record into the mysql_query_rules table:
Admin> INSERT INTO mysql_query_rules(rule_id,active,flagIN,match_pattern,negate_match_pattern,re_modifiers,replace_pattern,destination_hostgroup,apply) –> VALUES( –> 1, –> 1, –> 0, –> ‘^DELETE FROM sessions WHERE sess_lifetime + sess_time < (.*)’, –> 0, –> ‘CASELESS’, –> ‘DELETE FROM sessions WHERE sess_delete < \1’, –> 0, –> 1); |
The two fields I want to focus on are:
- match_pattern: it defines the query to be matched using the regular expression notation. The + symbol must be escaped using because it’s a special character for regular expressions
- replace_pattern: it defines how to rewrite the matched query. 1 is the value of the parameter matched by match_pattern into (.*)
For the meaning of the other fields, have a look at https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration.
Once created, we have to save the rule to disk and put it on runtime to let it run effectively.
Admin> SAVE MYSQL QUERY RULES TO DISK;
Admin> LOAD MYSQL QUERY RULES TO RUNTIME; |
After that, the proxy began to filter the query and rewrite it to have a better execution plan using the index on the virtual column.
Note: pay attention when you need to upgrade the framework. If it needs to rebuild the database tables, you will lose the virtual column you’ve created. Just remember to recreate it and check it after the upgrade.
Conclusion
Developers love using web frameworks because they are very powerful in simplifying development and deployment of complex web applications. But for DBAs, sometimes internal queries can cause a bit of a headache because it is not well optimized or because it was not supposed to run in your “huge” database. I solved my case using ProxySQL and VIRTUAL columns with a minimal impact on the architecture of the system we had and avoided any source code patching.
Take this post as a tip in case you face similar issues with your application framework.
Related
via Planet MySQL
Using ProxySQL and VIRTUAL Columns to Solve ORM Issues