Generated Columns is a new feature available in the latest lab release. This work is based on a contribution by Andrey Zhakov. Thanks, Andrey! The Optimizer team modified it to follow the current MySQL design, and to lift a number of limitations.
The syntax is:
<type> [ GENERATED ALWAYS ] AS ( <expression> ) [ VIRTUAL|STORED ]
[ UNIQUE [KEY] ] [ [PRIMARY] KEY ] [ NOT NULL ] [ COMMENT <text> ]
There are two kinds of Generated Columns: virtual (default) and stored. Virtual means that the column will be calculated on the fly when a record is read from a table. Stored means that the column will be calculated when a new record is written in the table, and after that it will be treated as a regular field. Both types can have NOT NULL restrictions, but only a stored Generated Column can be be a part of an index.
There are still a few limitations on Generated Columns:
The generation expression used can only call native deterministic functions; stored routines and UDFs are not yet supported.
The length of the generation expressions for a table are subject to the limitations of the .frm file, and thus they can’t be longer than 64K in total. So you can have one field with a generation expression length of 64K, or you can have 30 fields with an average expression length of 2K each.
A Generated Column can’t refer to itself or to other Generated Columns that are later defined, but it can refer to any previously defined Generated Column. This limitation does not apply to regular columns. A Generated Column can refer to any regular one, no matter where it’s defined.
mysql> CREATE TABLE sales ( -> name VARCHAR(20), -> price_eur DOUBLE,
-> amount INT,
-> total_eur DOUBLE AS (price_eur * amount),
-> total_usd DOUBLE AS (total_eur * xrate),
-> xrate DOUBLE);
Query OK, 0 rows affected (0,16 sec)
What can Generated Columns can be used for? Many things. To name few:
As a materialized cache for often used expressions:
CREATE TABLE xmltable (
username VARCHAR (20) AS (ExtractValue(doc,’/user/username’)),
doc TEXT);
INSERT INTO xmltable (doc) VALUES
(‘<user>
<id>1</id>
<username>tony</username>
<name>Tony Stark</name>
<info>A cool one</info>
</user>’),
(‘<user>
<id>2</id>
<username>ned</username>
<name>Eddard Stark</name>
<info>A cold one</info>
</user>’),
(‘<user>
<id>3</id>
<username>berty</username>
<name>Albert Stark</name>
<info>A farmer</info>
</user>’);
SELECT ExtractValue(doc,’/user/id’)
FROM xmltable
WHERE ExtractValue(doc,’/user/username’) = ‘tony’ OR
ExtractValue(doc,’/user/username’) = ‘ned’ OR
ExtractValue(doc,’/user/username’) = ‘berty’;
+——————————+
| ExtractValue(doc,’/user/id’) |
+——————————+
| 1 |
| 2 |
| 3 |
+——————————+
SELECT ExtractValue(doc,’/user/id’)
FROM xmltable
WHERE username = ‘tony’ OR username = ‘ned’ OR username = ‘berty’;
+——————————+
| ExtractValue(doc,’/user/id’) |
+——————————+
| 1 |
| 2 |
| 3 |
+——————————+
The result of those two queries is exactly the same, but in the first one ExtractValue(doc,’/user/username’) will be evaluated 3 times per record read, while in the second only once per record read. If the “username” columns would be defined as STORED then the generation expression will be evaluated only when a record is inserted or updated.
Another similar case is that Generated Columns (GC) could be used to add flexibility by replacing often used expressions with a GC. For example, if you have a bunch of applications that work on the same database then it might be practical to have a unified way to access data without need to keep all apps in sync.
Providing indexes for joins with non-relational data:CREATE TABLE users (userid INT, username VARCHAR(16));
CREATE TABLE comments(
userid int AS (ExtractValue(comment,’/comment/userid’)) STORED,
comment TEXT,
KEY(userid));
INSERT INTO users VALUES (1, ‘tony’),(2, ‘ned’), (3, ‘berty’);
INSERT INTO comments (comment) VALUES
("<comment>
<id>1</id>
<userid>1</userid>
<text>I definitely need a new suit</text>
</comment>"),
("<comment>
<id>2</id>
<userid>2</userid>
<text>No sight of global warming yet</text>
</comment>"),
("<comment>
<id>3</id>
<userid>2</userid>
<text>Traffic jams in King’s Landing is truly horrible</text>
</comment>");
SELECT
extractvalue(comment,’/comment/id’) as id, extractvalue(comment,’/comment/text’) as text
FROM users JOIN
comments ON users.userid=comments.userid
WHERE username = ‘ned’;
+——+————————————————–+
| id | text |
+——+————————————————–+
| 2 | No sight of global warming yet |
| 3 | Traffic jams in King’s Landing is truly horrible |
+——+————————————————–+
EXPLAIN SELECT
extractvalue(comment,’/comment/id’) AS id, extractvalue(comment,’/comment/text’) AS text
FROM users JOIN
comments ON users.userid=comments.userid WHERE username = ‘ned’\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users
partitions: NULL
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 3
filtered: 33.33
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: comments
partitions: NULL
type: ref
possible_keys: userid
key: userid
key_len: 5
ref: test.users.userid
rows: 2
filtered: 100.00
Extra: NULLHere ref access over an index is used to access a table with XML data.
Working around a limited set of partitioning functions:CREATE TABLE characters (
race VARCHAR(16) AS (ExtractValue(chars,’/character/race’)) STORED,
chars TEXT)
PARTITION BY LIST COLUMNS (race) (
PARTITION hum VALUES IN ("human"), PARTITION hlf VALUES IN ("halfling"), PARTITION elf VALUES IN ("elf"), PARTITION orc VALUES IN ("orc") );
INSERT INTO characters(chars) VALUES
(‘<character>
<name>Elrond</name>
<race>elf</race>
</character>’),
(‘<character>
<name>Bilbo</name>
<race>halfling</race>
</character>’),
(‘<character>
<name>Frodo</name>
<race>halfling</race>
</character>’);
SELECT
ExtractValue(chars,’/character/name’) FROM characters
WHERE race=’halfling’;
+—————————————+
| ExtractValue(chars,’/character/name’) |
+—————————————+
| Bilbo |
| Frodo |
+—————————————+
EXPLAIN SELECT
ExtractValue(chars,’/character/name’) FROM characters
WHERE race=’halfling’\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: characters
partitions: hlf
type: ALL
possible_keys:
NULL key: NULL
key_len: NULL
ref: NULL
rows: 2
filtered: 50.00
Extra: Using whereNote that only one partition is going to be scanned due to partition pruning.
As always, there are more than one way to do things. Generated Columns adds yet another means of solving a variety of interesting challenges. It now also becomes more convenient to deal with derivatives of relational and non-relational data. We look forward to seeing all of the interesting ways that you apply the feature!
via Planet MySQL
Generated Columns in MySQL 5.7.5