https://www.percona.com/blog/wp-content/uploads/2022/01/MySQL-8.0-Functional-Indexes.png
Working with hundreds of different customers I often face similar problems around running queries. One very common problem when trying to optimize a database environment is index usage. A query that cannot use an index is usually a long-running one, consuming more memory or triggering more disk iops.
A very common case is when a query uses a filter condition against a column that is involved in some kind of functional expression. An index on that column can not be used.
Starting from MySQL 8.0.13 functional indexes are supported. In this article, I’m going to show what they are and how they work.
The Well-Known Problem
As already mentioned, a very common problem about index usage is when you have a filter condition against one or more columns involved in some kind of functional expression.
Let’s see a simple example.
You have a table called products containing the details of your products, including a create_time TIMESTAMP column. If you would like to calculate the average price of your products on a specific month you could do the following:
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10; +------------+ | AVG(price) | +------------+ | 202.982582 | +------------+
The query returns the right value, but take a look at the EXPLAIN:
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: products partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 99015 filtered: 100.00 Extra: Using where
The query triggers a full scan of the table. Let’s create an index on create_time and check again:
mysql> ALTER TABLE products ADD INDEX(create_time); Query OK, 0 rows affected (0.71 sec) Records: 0 Duplicates: 0 Warnings: 0
mysql> explain SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: products partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 99015 filtered: 100.00 Extra: Using where
A full scan again. The index we have created is not effective. Indeed any time an indexed column is involved in a function the index can not be used.
To optimize the query the workaround is rewriting it differently in order to isolate the indexed column from the function.
Let’s test the following equivalent query:
mysql> SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'; +------------+ | AVG(price) | +------------+ | 202.982582 | +------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE create_time BETWEEN '2019-10-01' AND '2019-11-01'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: products partitions: NULL type: range possible_keys: create_time key: create_time key_len: 5 ref: NULL rows: 182 filtered: 100.00 Extra: Using index condition
Cool, now the index is used. Then rewriting the query was the typical suggestion.
Quite a simple solution, but not all the times it was possible to change the application code for many valid reasons. So, what to do then?
MySQL 8.0 Functional Indexes
Starting from version 8.0.13, MySQL supports functional indexes. Instead of indexing a simple column, you can create the index on the result of any function applied to a column or multiple columns.
Long story short, now you can do the following:
mysql> ALTER TABLE products ADD INDEX((MONTH(create_time))); Query OK, 0 rows affected (0.74 sec) Records: 0 Duplicates: 0 Warnings: 0
Be aware of the double parentheses. The syntax is correct since the expression must be enclosed within parentheses to distinguish it from columns or column prefixes.
Indeed the following returns an error:
mysql> ALTER TABLE products ADD INDEX(MONTH(create_time)); ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'create_time))' at line 1
Let’s check now our original query and see what happens to the EXPLAIN
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10; +------------+ | AVG(price) | +------------+ | 202.982582 | +------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: products partitions: NULL type: ref possible_keys: functional_index key: functional_index key_len: 5 ref: const rows: 182 filtered: 100.00 Extra: NULL
The query is no longer a full scan and runs faster. The functional_index has been used, with only 182 rows examined. Awesome.
Thanks to the functional index we are no longer forced to rewrite the query.
Which Functional Indexes are Permitted
We have seen an example involving a simple function applied to a column, but you are granted to create more complex indexes.
A functional index may contain any kind of expressions, not only a single function. The following patterns are valid functional indexes:
INDEX( ( col1 + col2 ) )
INDEX( ( FUNC(col1) + col2 – col3 ) )
You can use ASC or DESC as well:
INDEX( ( MONTH(col1) ) DESC )
You can have multiple functional parts, each one included in parentheses:
INDEX( ( col1 + col2 ), ( FUNC(col2) ) )
You can mix functional with nonfunctional parts:
INDEX( (FUNC(col1)), col2, (col2 + col3), col4 )
There are also limitations you should be aware of:
- A functional key can not contain a single column. The following is not permitted:
INDEX( (col1), (col2) ) - The primary key can not include a functional key part
- The foreign key can not include a functional key part
- SPATIAL and FULLTEXT indexes can not include functional key parts
- A functional key part can not refer to a column prefix
At last, remember that the functional index is useful only to optimize the query that uses the exact same expression. An index created with nonfunctional parts can be used instead to solve multiple different queries.
For example, the following conditions can not rely on the functional index we have created:
WHERE YEAR(create_time) = 2019
WHERE create_time > ‘2019-10-01’
WHERE create_time BETWEEN ‘2019-10-01’ AND ‘2019-11-01’
WHERE MONTH(create_time+INTERVAL 1 YEAR)
All these will trigger a full scan.
Functional Index Internal
The functional indexes are implemented as hidden virtual generated columns. For this reason, you can emulate the same behavior even on MySQL 5.7 by explicitly creating the virtual column. We can test this, starting by dropping the indexes we have created so far.
mysql> SHOW CREATE TABLE products\G *************************** 1. row *************************** Table: products Create Table: CREATE TABLE `products` ( `id` int unsigned NOT NULL AUTO_INCREMENT, `description` longtext, `price` decimal(8,2) DEFAULT NULL, `create_time` timestamp NULL DEFAULT NULL, PRIMARY KEY (`id`), KEY `create_time` (`create_time`), KEY `functional_index` ((month(`create_time`))) ) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
mysql> ALTER TABLE products DROP INDEX `create_time`, DROP INDEX `functional_index`; Query OK, 0 rows affected (0.03 sec)
We can try now to create the virtual generated column:
mysql> ALTER TABLE products ADD COLUMN create_month TINYINT GENERATED ALWAYS AS (MONTH(create_time)) VIRTUAL; Query OK, 0 rows affected (0.04 sec)
Create the index on the virtual column:
mysql> ALTER TABLE products ADD INDEX(create_month); Query OK, 0 rows affected (0.55 sec)
mysql> SHOW CREATE TABLE products\G *************************** 1. row *************************** Table: products Create Table: CREATE TABLE `products` ( `id` int unsigned NOT NULL AUTO_INCREMENT, `description` longtext, `price` decimal(8,2) DEFAULT NULL, `create_time` timestamp NULL DEFAULT NULL, `create_month` tinyint GENERATED ALWAYS AS (month(`create_time`)) VIRTUAL, PRIMARY KEY (`id`), KEY `create_month` (`create_month`) ) ENGINE=InnoDB AUTO_INCREMENT=149960 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
We can now try our original query. We expect to see the same behavior as the functional index.
mysql> SELECT AVG(price) FROM products WHERE MONTH(create_time)=10; +------------+ | AVG(price) | +------------+ | 202.982582 | +------------+
mysql> EXPLAIN SELECT AVG(price) FROM products WHERE MONTH(create_time)=10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: products partitions: NULL type: ref possible_keys: create_month key: create_month key_len: 2 ref: const rows: 182 filtered: 100.00 Extra: NULL
Indeed, the behavior is the same. The index on the virtual column can be used and the query is optimized.
The good news is that you can use this workaround to emulate a functional index even on 5.7, getting the same benefits. The advantage of MySQL 8.0 is that it is completely transparent, no need to create the virtual column.
Since the functional index is implemented as a hidden virtual column, there is no additional space needed for the data, only the index space will be added to the table.
By the way, this is the same technique used for creating indexes on JSON documents’ fields.
Conclusion
The functional index support is an interesting improvement you can find in MySQL 8.0. Some of the queries that required rewriting to get optimized don’t require that anymore. Just remember that only the queries having the same filter pattern can rely on the functional index. Then you need to create additional indexes or other functional indexes to improve other search patterns.
The same feature can be implemented on MySQL 5.7 with the explicit creation of a virtual generated column and the index.
For more detailed information, read the following page:
https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-functional-key-parts
Planet MySQL