MySQL and UUIDs

In ALTER TABLE for UUID we discuss currently proper way to store and handle UUID in MySQL. Currently it works, even in a performant way, but it still hurts. It should not.

Definition of UUID

The RFC 4122 defines various types of UUID, and how they are being formatted for presentation and as a bit field on the wire. As this document was written bei Leach and Salz, among others, RFC 4122 UUIDs are also called “Leach-Salz UUIDs” (for example in the Java Documentation).

There are other UUID variants, used in other systems (NCS, and Microsoft “backwards compatibility”).

A RFC 4122 UUID is a special 128 bit long value (“16 octets”). It is supposed to be laid out like this:

 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| time_low |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| time_mid | time_hi_and_version |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|clk_seq_hi_res | clk_seq_low | node (0-1) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| node (2-5) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The high bits in clk_seq_hi_res define the variant, and anything starting with the bit sequence 10 is a RFC 4122 compliant UUID.

The high nibble in time_hi_and_version defines the RFC 4122 Type (or version), the algorithm used. RFC 4122 itself defines versions 1 to 5 (version 0 is unused).

There is an expired draft that defines a version 6, which is specifically designed to use a primary key for databases. It uses the same timestamp value as Type 1, but stores the bytes so that ordering by creation time is preserved. It also relaxes the requirements on the node field, officially allowing different sources for bits than the MAC address. Sample implementations are available in various languages.

UUIDs are all over the place in Windows and also in Java. Databases have to handle them as “application generated primary keys”. That’s why they are important and should be treated with some attention.

Handling UUID in MySQL 8

MySQL provides functions to generate UUIDs, to check if a thing is a validly formatted string that looks like a UUID, and to convert UUIDs to 128 bit bitstrings and back.

The latter two functions have a special flag to improve performance with InnoDB. This flag is by default off. No attempt is made to automatically do the right thing dependent on the UUID type.

UUID function

The MySQL function UUID() returns a RFC 4122 Type 1 UUID.

mysql> select uuid_to_bin(uuid()) as uuid;
+------------------------------------+
| uuid                               |
+------------------------------------+
| 0x80462D3C96AB11EB94BBBA2278F258E1 |
+------------------------------------+
1 row in set (0.00 sec)

Formatted like in the RFC, we get the following values:

80462D3C
96AB 1 1EB
9 4 BB B A22
78F258E1

Specifically, the clk_seq_hi_res contains the variant value, 9, and the time_hi_and_version contains the version, 1.

According to the RFC, the variant 9 = 1001 = 10x (a RFC 4122 compliant UUID).

Of the Versions defined in RFC 4122, it is Version 1 or Type 1, “The time-based version in this document”. The generation algorithm is defined in section 4.2 of that RFC.

So a MySQL generated UUID is always a RFC 4122 Type 1 (time based) UUID.

Java generated UUID values

Compare with the Java UUID Class:

We have randomUUID(), returning a Type 4 UUID and nameUUIDFromBytes() for Type 3 and a generic Constructor that takes any 128 bit value. So what we get from Java is likely name based or random UUIDs generated by the application.

Name based UUIDs are fixed for a fixed name, so they pose no problem when used with MySQL as a primary key.

Random UUIDs are completely random. There is nothing to optimize for MySQL here.

Neither is a good choice to use as a kind of application controlled distribiuted primary key.

Based on the reasoning in ALTER TABLE for UUID, Java developers that are using UUID for this purpose would be well advised to implement and use a Type 1 UUID. It seems that such an implementation is not available by default as part of java.lang.util, but other implementations exist.

IS_UUID() function

The IS_UUID() predicate returns 1 for all strings that can be parsed as a UUID.

That is, it checks for one of three ways to write the UUID, and that it contains only valid hex digits in the places that actually contain data. No attempt is made to validate any of the bits. Code is here, and the actual parsing happens here.

UUID_TO_BIN() function

MySQL allows three ways to write UUIDs, as a 128 bit hex string (6ccd780cbaba102695645b8c656024db), as a 128 bit hex string with dashes in the right places (6CCD780C-BABA-1026-9564-5B8C656024DB) and as a 128 bit hex string with dashes in the right places and enclosed in curly braces ({6CCD780C-BABA-1026-9564-5B8C656024DB}).

This is not a particular dense packing of data for storage. The UUID_TO_BIN() function takes any of these strings and returns a VARBINARY(16) for storage.

The function has an optional second parameter, the swap_flag. When applied to a Type 1 UUID, the time bytes are being swapped around so that chronological ascending UUIDs from the same node are also having numerically ascending values. This optimizes storage with InnoDB.

  • Type 1 UUID: It is recommended you define UUID columns as VARBINARY(16) and use theUUID_TO_BIN(uuid_string, 1) function to store data.
  • Type 6 UUID: You should use UUID_TO_BIN(uuid_string, 0) to store Type 6 UUIDs, because Type 6 has been specifically created to avoid the swapping of time bytes around.
  • Other types: These do not profit from swapping, so also use UUID_TO_BIN(uuid_string, 0).

BIN_TO_UUID function

The inverse function to UUID_TO_BIN is BIN_TO_UUID(). It needs the same swap_flag value as has been used at write time in order to unpack the data correctly.

It should hurt less

This all should hurt less.

  • MySQL should have a native UUID column type, which stores data internally in a VARBINARY(16) (or BINARY(16)).
  • It should accept all three notations as input (hex string, with dashes, with curly braces and dashes). It should return a hex string without dashes or curlies.
  • It should validate variant and type (version), allowing Types 1 to 6.
  • It should auto-convert (swap) data for Type 1, but not for any other type.
  • There should be a formatting function that turns a hex string without dashes into a UUID string with dashes, or a UUID string with dashes and curlies.

That is, I want to be able to

mysql> create table u ( u uuid not null primary key, dummy integer );
mysql> insert into u values (uuid(), 1);
mysql> select u from u\G
u: 6ccd780cbaba102695645b8c656024db
mysql> select format_uuid(u, 0) as u from u\G
u: 6CCD780C-BABA-1026-9564-5B8C656024DB
mysql> select format_uuid(u) as u from u\G
u: 6CCD780C-BABA-1026-9564-5B8C656024DB
mysql> select format_uuid(u, 1) as u from u\G
u: {6CCD780C-BABA-1026-9564-5B8C656024DB}

and internally, this is stored as 0x1026BABA6CCD780C95645B8C656024DB, because variant is RFC 4122 and Type is 1.

For any other Type of the RFC 4122 the internal swapping does not happen.

Trying to store anything that is not a RFC 4122 variant is an error. Trying to store anything that is a RFC 4122 variant but not a Type 1 to 6 is an error.

Planet MySQL