More and more people are using UUID’s to identify records in their database.
As you already know, for MySQL’s storage engine (InnoDB) the primary key is very important ! (for performance, memory and disk space).
See the following links:
- https://lefred.be/content/mysql-invisible-column-part-ii/
- https://dev.mysql.com/doc/refman/8.0/en/create-table-gipks.html (GIPK mode in MySQL 8.0.30 !)
Problems
There are 2 major problems having a UUID as Primary Key in InnoDB:
- generally they are random and cause clustered index to be rebalanced
- they are included in each secondary indexes (consuming disk and memory)
Let’s have a look at this example:
MySQL > CREATE TABLE my_table (
uuid VARCHAR(36) DEFAULT (UUID()) PRIMARY KEY,
name VARCHAR(20), beers int unsigned);
...
MySQL > SELECT * FROM my_table;
+--------------------------------------+---------+-------+
| uuid | name | beers |
+--------------------------------------+---------+-------+
| 17cd1188-1fa0-11ed-ba36-c8cb9e32df8e | Kenny | 0 |
| 17cd12e2-1fa0-11ed-ba36-c8cb9e32df8e | lefred | 1 |
| 478368a0-1fa0-11ed-ba36-c8cb9e32df8e | Scott | 1 |
| 47836a77-1fa0-11ed-ba36-c8cb9e32df8e | Lenka | 0 |
+--------------------------------------+---------+-------+
Now, let’s insert 2 new records:
MySQL > INSERT INTO my_table (name, beers) VALUES ("Luis",1), ("Miguel",5);
We can check the content of the table:
MySQL > SELECT * FROM my_table;
+--------------------------------------+---------+-------+
| uuid | name | beers |
+--------------------------------------+---------+-------+
| 17cd1188-1fa0-11ed-ba36-c8cb9e32df8e | Yannis | 0 |
| 17cd12e2-1fa0-11ed-ba36-c8cb9e32df8e | lefred | 1 |
| 36f1ce9a-1fa1-11ed-ba36-c8cb9e32df8e | Luis | 1 | <--
| 36f1d158-1fa1-11ed-ba36-c8cb9e32df8e | Miguel | 5 | <--
| 478368a0-1fa0-11ed-ba36-c8cb9e32df8e | Scott | 1 |
| 47836a77-1fa0-11ed-ba36-c8cb9e32df8e | Lenka | 0 |
+--------------------------------------+---------+-------+
We can see that the two new records were not inserted at the end of the table but in the middle. InnoDB had to move two old records to be able to insert the two new before them. On such small table (all records are on the same page) that doesn’t cause any problem, but imagine this table is 1TB large !
Additionally, if we keep the VARCHCAR datatype for our uuid, the primary key could take 146 bytes per row (some utf8 characters can take up to 4 bytes + the 2 bytes marking the end of the VARCHAR):
MySQL > EXPLAIN SELECT * FROM my_table WHERE
uuid='36f1d158-1fa1-11ed-ba36-c8cb9e32df8e'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: my_table
partitions: NULL
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 146 <--
ref: const
rows: 1
filtered: 100
Extra: NULL
Solutions
Of course there are some best practices that MySQL users can follow to avoid those problems:
- use a smaller datatype to store their UUIDs:
BINARY(16)
- store the UUIDs sequentially: use
UUID_TO_BIN(..., swap_flag)
- The time-low and time-high parts (the first and third groups of hexadecimal digits, respectively) are swapped.
Let’s see this in action with the following example:
MySQL > CREATE TABLE my_table2 (
uuid BINARY(16) DEFAULT (UUID_TO_BIN(UUID(), 1)) PRIMARY KEY,
name VARCHAR(20), beers int unsigned);
MySQL > SELECT * FROM my_table2;
+------------------------------------+--------+-------+
| uuid | name | beers |
+------------------------------------+--------+-------+
| 0x11ED1F9F633ECB6CBA36C8CB9E32DF8E | Kenny | 0 |
| 0x11ED1F9F633ECD6FBA36C8CB9E32DF8E | lefred | 1 |
+------------------------------------+--------+-------+
As the UUID is now binary, we need to decode it using the function BIN_TO_UUID()
and not forget the swap flag:
MySQL > SELECT BIN_TO_UUID(uuid,1), name, beers FROM my_table2;
+--------------------------------------+--------+-------+
| BIN_TO_UUID(uuid,1) | name | beers |
+--------------------------------------+--------+-------+
| 633ecb6c-1f9f-11ed-ba36-c8cb9e32df8e | Kenny | 0 |
| 633ecd6f-1f9f-11ed-ba36-c8cb9e32df8e | lefred | 1 |
+--------------------------------------+--------+-------+
And now we can verify that when we add new entries they are added to the end of the table:
MySQL > INSERT INTO my_table2 (name, beers) VALUES ("Scott",1), ("Lenka",5);
MySQL > SELECT * FROM my_table2;
+------------------------------------+---------+-------+
| uuid | name | beers |
+------------------------------------+---------+-------+
| 0x11ED1F9F633ECB6CBA36C8CB9E32DF8E | Kenny | 0 |
| 0x11ED1F9F633ECD6FBA36C8CB9E32DF8E | lefred | 1 |
| 0x11ED1FA537C57361BA36C8CB9E32DF8E | Scott | 1 | <--
| 0x11ED1FA537C5752DBA36C8CB9E32DF8E | Lenka | 5 | <--
+------------------------------------+---------+-------+
and we can of course decode the UUID and see that without the swap flag, InnoDB would have to rebalance the clustered index:
MySQL > SELECT BIN_TO_UUID(uuid,1), name, beers FROM my_table2;
+--------------------------------------+---------+-------+
| BIN_TO_UUID(uuid,1) | name | beers |
+--------------------------------------+---------+-------+
| 633ecb6c-1f9f-11ed-ba36-c8cb9e32df8e | Kenny | 0 |
| 633ecd6f-1f9f-11ed-ba36-c8cb9e32df8e | lefred | 1 |
| 37c57361-1fa5-11ed-ba36-c8cb9e32df8e | Scott | 1 | <--
| 37c5752d-1fa5-11ed-ba36-c8cb9e32df8e | Lenka | 5 | <--
+--------------------------------------+---------+-------+
And of course, now the size of the primary key is smaller and fixed to 16 bytes. Only those 16 bytes are added to all secondary indexes:
MySQL > EXPLAIN SELECT * FROM my_table2
WHERE uuid=UUID_TO_BIN("37c5752d-1fa5-11ed-ba36-c8cb9e32df8e",1)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: my_table2
partitions: NULL
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 16 <---
ref: const
rows: 1
filtered: 100
Extra: NULL
UUID v1
MySQL generates UUID v1 as described in RFC4122.
- UUID v1 : is a universally unique identifier that is generated using a timestamp and the MAC address of the computer on which it was generated.
- UUID v4 : is a universally unique identifier that is generated using random numbers.
With UUID v4m it’s not possible to generate any sequential ouput and this is why those random UUID should never be used as Primary Key with InnoDB.
UUID v4
Some developers keep asking about UUIDv4 and how to generate them for MySQL. Browsing the Internet, you can find several store procedures trying to achieve this.
This one, found on StackOverflow, is maybe my favorite:
CREATE FUNCTION uuid_v4s()
RETURNS CHAR(36)
BEGIN
-- 1th and 2nd block are made of 6 random bytes
SET @h1 = HEX(RANDOM_BYTES(4));
SET @h2 = HEX(RANDOM_BYTES(2));
-- 3th block will start with a 4 indicating the version, remaining is random
SET @h3 = SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3);
-- 4th block first nibble can only be 8, 9 A or B, remaining is random
SET @h4 = CONCAT(HEX(FLOOR(ASCII(RANDOM_BYTES(1)) / 64)+8),
SUBSTR(HEX(RANDOM_BYTES(2)), 2, 3));
-- 5th block is made of 6 random bytes
SET @h5 = HEX(RANDOM_BYTES(6));
-- Build the complete UUID
RETURN LOWER(CONCAT(
@h1, '-', @h2, '-4', @h3, '-', @h4, '-', @h5
));
END
Unfortunately this function cannot be used as default expression for a column.
I also wrote a component using boost’s uuid library: https://github.com/lefred/mysql-component-uuid_v4
But this new function is also not usable as default value expression.
MySQL error code MY-003770 (ER_DEFAULT_VAL_GENERATED_NAMED_FUNCTION_IS_NOT_ALLOWED): Default value expression of column '%s' contains a disallowed function: %s.
This means that every new record needs to provide the uuid column… this is not too complicated anyway.
Let’s see an example:
MySQL > install component "file://component_uuid_v4";
MySQL > select uuid_v4() ;
+--------------------------------------+
| uuid_v4() |
+--------------------------------------+
| 9944272b-e3f9-4778-9c54-818f0baa87da |
+--------------------------------------+
1 row in set (0.0002 sec)
Now we will create a new table, but as recommended, we won’t use the uuid as Primary Key ! We will use a new feature of MySQL 8.0.30: GIPK Mode !
GIPK stands for Generated Invisible Primary Key, check the manual for more info.
MySQL > SET sql_generate_invisible_primary_key=1;
MySQL > CREATE TABLE my_table3 (
uuid BINARY(16) NOT NULL UNIQUE,
name VARCHAR(20), beers INT UNSIGNED);
MySQL > SHOW CREATE TABLE my_table3\G
*************************** 1. row ***************************
Table: my_table3
Create Table: CREATE TABLE `my_table3` (
`my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
`uuid` binary(16) NOT NULL,
`name` varchar(20) DEFAULT NULL,
`beers` int unsigned DEFAULT NULL,
PRIMARY KEY (`my_row_id`),
UNIQUE KEY `uuid` (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Now let’s insert some records and see if they are inserted sequentially and if the UUID’s value is completely random:
MySQL > INSERT INTO my_table3 (uuid, name, beers)
VALUES (UUID_TO_BIN(uuid_v4()),'Kenny', 3),
(UUID_TO_BIN(uuid_v4()), 'lefred', 1);
MySQL > SELECT * FROM my_table3;
+------------------------------------+--------+-------+
| uuid | name | beers |
+------------------------------------+--------+-------+
| 0x5A28E5482CDF4B3D89A298ECA3F3703B | Kenny | 3 |
| 0x94662BF4DC2F469489D868820B7B31E5 | lefred | 1 |
+------------------------------------+--------+-------+
MySQL > SELECT BIN_TO_UUID(uuid), name, beers FROM my_table3;
+--------------------------------------+--------+-------+
| bin_to_uuid(uuid) | name | beers |
+--------------------------------------+--------+-------+
| 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny | 3 |
| 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred | 1 |
+--------------------------------------+--------+-------+
So far, so good.. let’s add some more records:
MySQL > INSERT INTO my_table3 (uuid, name, beers)
VALUES (UUID_TO_BIN(uuid_v4()),'Scott', 10),
(UUID_TO_BIN(uuid_v4()), 'Lenka', 0);
MySQL > SELECT BIN_TO_UUID(uuid), name, beers FROM my_table3;
+--------------------------------------+--------+-------+
| bin_to_uuid(uuid) | name | beers |
+--------------------------------------+--------+-------+
| 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny | 3 |
| 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred | 1 |
| 615fae32-d6c8-439c-9520-5d3c8bfa934b | Scott | 10 |
| 80a01a29-489b-419d-bca1-05a756ad9d9d | Lenka | 0 |
+--------------------------------------+--------+-------+
We can see that indeed, the UUIDs are completely random and sequentially added to the table. The reason of that optimal sequential insertion is that the invisible Primary Key is an auto_increment.
It’s possible to also display it on demand:
MySQL > SELECT my_row_id, BIN_TO_UUID(uuid), name, beers FROM my_table3;
+-----------+--------------------------------------+--------+-------+
| my_row_id | bin_to_uuid(uuid) | name | beers |
+-----------+--------------------------------------+--------+-------+
| 1 | 5a28e548-2cdf-4b3d-89a2-98eca3f3703b | Kenny | 3 |
| 2 | 94662bf4-dc2f-4694-89d8-68820b7b31e5 | lefred | 1 |
| 3 | 615fae32-d6c8-439c-9520-5d3c8bfa934b | Scott | 10 |
| 4 | 80a01a29-489b-419d-bca1-05a756ad9d9d | Lenka | 0 |
+-----------+--------------------------------------+--------+-------+
Conclusion
In summary, if you want to use UUID’s in MySQL, it’s recommended to use UUID v1, those generated by MySQL, and store them as binary using the swap flag.
If for some reason you need UUID v4, it is recommended to let MySQL and InnoDB handle the primary key by enabling GIPK mode.
Enjoy MySQL !
Planet MySQL