MySQL in Microservices Environments

https://www.percona.com/blog/wp-content/uploads/2022/09/MySQL-in-Microservices-Environments.pngMySQL in Microservices Environments

MySQL in Microservices EnvironmentsThe microservice architecture is not a new pattern but has become very popular lately for mainly two reasons: cloud computing and containers. That combo helped increase adoption by tackling the two main concerns on every infrastructure: Cost reduction and infrastructure management.

However, all that beauty hides a dark truth:

The hardest part of microservices is the data layer

And that is especially true when it comes to classic relational databases like MySQL. Let’s figure out why that is.

MySQL and the microservice

Following the same two pillars of microservices (cloud computing and containers), what can one do with that in the MySQL space? What do cloud computing and containers bring to the table?

Cloud computing

The magic of the cloud is that it allows you to be cost savvy by letting you easily SCALE UP/SCALE DOWN the size of your instances. No more wasted money on big servers that are underutilized most of the time. What’s the catch? It’s gotta be fast. Quick scale up to be ready to deal with traffic and quick scale down to cut costs when traffic is low. 

Containers

The magic of containers is that one can slice hardware to the resource requirements. The catch here is that containers were traditionally used on stateless applications. Disposable containers.

Relational databases in general and MySQL, in particular, are not fast to scale and are stateful. However, it can be adapted to the cloud and be used for the data layer on microservices.

The Scale Cube

The Scale Cube

The book “The Art of Scalability” by Abott and Fisher describes a really useful, three dimension scalability model: the Scale Cube. In this model, scaling an application by running clones behind a load balancer is known as X-axis scaling. The other two kinds of scaling are Y-axis scaling and Z-axis scaling. The microservice architecture is an application of Y-axis scaling: It defines an architecture that structures the application as a set of loosely coupled, collaborating services.

  • X-Axis: Horizontal Duplication and Cloning of services and data (READ REPLICAS)
  • Y-Axis: Functional Decomposition and Segmentation – (MULTI-TENANT)
  • Z-Axis: Service and Data Partitioning along Customer Boundaries – (SHARDING)

On microservices, each service has its own database in order to be decoupled from other services. In other words: a service’s transactions only involve its database; data is private and accessible only via the microservice API.

It’s natural that the first approach to divide the data is by using the multi-tenant pattern:

Actually before trying multi-tenant, one can use a tables-per-service model where each service owns a set of tables that must only be accessed by that service, but by having that “soft” division, the temptation to skip the API and access directly other services’ tables is huge.

Schema-per-service, where each service has a database schema that is private to that service is appealing since it makes ownership clearer. It is easy to create a user per database, with specific grants to limit database access.

This pattern of “shared database” however comes with some drawbacks like:

  • Single hardware: a failure in your database will hurt all the microservices
  • Resource-intensive tasks related to a particular database will impact the other databases (think on DDLs)
  • Shared resources: disk latency, IOPS, and bandwidth needs to be shared, as well as other resources like CPU, Network bandwidth, etc.

The alternative is to go “Database per service”

Database per service

Share nothing. Cleaner logical separation. Isolated issues. Services are loosely coupled. In fact, this opens the door for microservices to use a database that best suits their needs, like a graph db, a document-oriented database, etc. But as with everything, this also comes with drawbacks:

  • The most obvious: cost. More instances to deploy
  • The most critical: Distributed transactions. As we mentioned before, microservices are collaborative between them and that means that transactions span several services. 

The simplistic approach is to use a two-phase commit implementation. But that solution is just an open invitation to a huge amount of locking issues. It just doesn’t scale. So what are the alternatives?

  • Implementing transactions that span services: The Saga pattern
  • Implementing queries that span services: API composition or Command Query Responsibility Segregation (CQRS)

A saga is a sequence of local transactions. Each local transaction updates the database and publishes messages or events that trigger the next local transaction in the saga. If a local transaction fails for whatever reason, then the saga executes a series of compensating transactions that undo the changes made by the previous transactions. More on Saga here: https://microservices.io/patterns/data/saga.html

An API composition is just a composer that invokes queries on each microservice and then performs an in-memory join of the results:
https://microservices.io/patterns/data/api-composition.html

CQRS is keeping one or more materialized views that contain data from multiple services. This avoids the need to do joins on the query size: https://microservices.io/patterns/data/cqrs.html

What do all these alternatives have in common? That is taken care of at the API level: it becomes the responsibility of the developer to implement and maintain it. The data layer keep continues to be data, not information.

Make it cloud

There are means for your MySQL to be cloud-native: Easy to scale up and down fast; running on containers, a lot of containers; orchestrated with Kubernetes; with all the pros of Kubernetes (health checks, I’m looking at you).

Percona Operator for MySQL based on Percona XtraDB Cluster

A Kubernetes Operator is a special type of controller introduced to simplify complex deployments. Operators provide full application lifecycle automation and make use of the Kubernetes primitives above to build and manage your application. 

Percona Operator for MySQL

In our blog post “Introduction to Percona Operator for MySQL Based on Percona XtraDB Cluster” an overview of the operator and its benefits are covered. However, it’s worth mentioning what does it make it cloud native:

  • It takes advantage of cloud computing, scaling up and down
  • Runs con containers
  • Is orchestrated by the cloud orchestrator itself: Kubernetes

Under the hood is a Percona XtraDB Cluster running on PODs. Easy to scale out (increase the number of nodes: https://www.percona.com/doc/kubernetes-operator-for-pxc/scaling.html) and can be scaled up by giving more resources to the POD definition (without downtime)

Give it a try https://www.percona.com/doc/kubernetes-operator-for-pxc/index.html and unleash the power of the cloud on MySQL.

Planet MySQL

MySQL: Data for Testing

https://blog.koehntopp.info/uploads/2022/09/test-data-01.jpg

Where I work, there is an ongoing discussion about test data generation.
At the moment we do not replace or change any production data for use in the test environments, and we don’t generate test data.

That is safe and legal, because production data is tokenized.
That is, PII and PCI data is being replaced by placeholder tokens which can be used by applications to access the actual protected data through specially protected access services.
Only a very limited circle of people is dealing with data in behind the protected services.

Using production data in test databases is also fast, because we copy data in parallel, at line speed, or we make redirect-on-write (“copy-on-write” in the age of SSD) writeable snapshots available.

Assume for a moment we want to change that and

  • mask data from production when we copy it to test databases
  • reduce the amount of data used in test databases, while maintaining referential integrity
  • generate test data sets of any size instead of using production data, keeping referential integrity and also some baseline statistical properties


Masking Data

Assume we make a copy of a production database into a test setup.
We then prepare that copy by changing every data item we want to mask, for example by replacing it with sha("secret" + original value), or some other replacement token.

kris@localhost [kris]> select sha(concat("secret", "Kristian Köhntopp"));
+---------------------------------------------+
| sha(concat("secret", "Kristian Köhntopp")) |
+---------------------------------------------+
| 9697c8770a3def7475069f3be8b6f2a8e4c7ebf4 |
+---------------------------------------------+
1 row in set (0.00 sec)

Why are we using a hash function or some moral equivalent for this?
We want to keep referential integrity, of course.
So each occurence of Kristian Köhntopp will be replaced by 9697c8770a3def7475069f3be8b6f2a8e4c7ebf4, a predictable and stable value, instead of a random number 17 on the first occurence, and another random number, 25342, on the next.

In terms of database work, it means that after copying the data, we are running an update on every row, creating a binlog entry for each change.
If the column we change is indexed, the indexes that contain the column need to be updated.
If the column we change is a primary key, the physical location of the record in the table also changes, because InnoDB clusters data on primary key.

In short, while copying data is fast, masking data has a much, much higher cost and can nowhere run at line speed, on any hardware.


Reducing the amount of data

Early in the company history, around 15 years ago, a colleague worked on creating smaller test databases by selecting a subset of production data, while keeping referential integrity intact.
He failed.

Many others tried later, we have had projects trying this around every 3 to 5 years.
They also failed.

Each attempt always selects either an empty database or all of the data from production.

Why is that?

Let’s try a simple model:
We have users, we have hotels, and they are in a n:m relationship, the stay.

“Kris” stays in a Hotel, the “Casa”.
We select Kris into the test data set from production, and also the “Casa”.
Other people stayed at the Casa, so they also are imported into the test set, but they also stayed at other hotels, so these hotels are also imported, and so forth, back between two tables.
After 3 reflections, 6 transitions, we have selected the entire production database into the test set.

Our production data is interconnected, and retaining referential integrity of the production data will mean that selecting one will ultimately select all.

Limiting the timeframe can help:
we select only data from the past week or something.
But that has other implications, for example on the data distribution.
Also, our production data is heavily skewed in time when it comes to operations on availability – a lot of bookings happen in the last week.


Generating test data

Generating garbage data is fast, but still slower than making a copy.
That is because copying data from a production machine can copy binary files with prebuilt existing indexes, whereas generating garbage data is equivalent to importing a mysqldump.
The data needs to be parsed, and worse, all indexes need to be built.

While we can copy data at hundreds of MB per second, up into the GB/s range, importing data happens at single digit MB/s up to low tens of MB/s.
A lot depends on the availability of RAM, and on the speed of the disk, also on the number of indexes and if the input data is sorted by primary key.

Generating non-garbage data is also hard, and slow.

You need to have the list of referential integrity constraints either define or inferred from a schema.
At work, our schema is large, much larger than a single database or service – users in the user service (test user service, with test user data) need to be referenced in the reservations service (test reservations service with test bookings), referring to hotels in the test availability and test hotel store.

That means either creating and maintaining a consistent second universe, or creating this, across services, from scratch, for each test.
One is a drag on productivity (maintaining consistent universes is a lot of work), the other is slow.

Consistency is not the only requirement, though.
Consistency is fine if you want to test validation code (but my test names are not utf8, they are from the hex digit subset of ASCII!).

But if you want to talk performance, additional requirements appear:

  • Data size.

    If your production data is 2 TB in size, but your test set is 200 GB, it is not linearly 10x faster.
    The relationship is non-linear: On a given piece of hardware, production data may be IO-limited because the working set does not fit into memory, whereas the test data can fit WSS into memory.
    Production data will produce disk reads proportionally to the load, test data will run from memory and after warmup has no read I/O – a completely different performance model applies.
    Performance tests on the test data have zero predictive value for production performance.

  • Data Distribution.

    Production data and production data access is subject to data access patterns that are largely unknown and undocumented, and are also changing.
    Some users are whales that travel 100x more than norms.
    Other users are frequent travellers, and travel 10x more than norms.
    Some amount of users are one-offs and appear only once in the data set.
    What is the relation between these sets, and what is the effect they have on data access?

For example:


A database benchmark I lost

The german computer magazine c’t in 2006 had an application benchmark described in web request accesses to a DVD rental store.
The contestants were supposed to write a DVD rental store using any technology they wished, defined in the required output and the URL request they would be exposed to in testing.
MySQL, for which I worked as a consultant, wanted to participate, and for that I put the templates provided into a web shop using MySQL, and tuned shop and database accordingly.

I got nowhere the top 10.

That is because I used a real web shop with real assumptions about user behavior, including caches and stuff.

The test data used was generated, and the requests were equally distributed:
Each DVD in the simulated DVD store was equally likely to be rented and each user rented the same amount of DVDs.
Any cache you put into the store would overflow, and go into threshing, or would have to have sufficient memory to keep the entire store in cache.

Real DVD rental stores have a top 100 of popular titles, and a long tail. Caching helps.
In the test, caching destroyed performance.


Another database benchmark I lost

Another german computer magazine had another database benchmark, which basically hammered the system under test with a very high load.
Unfortunately, here the load was not evenly distributed, but a few keys were being exercised very often, whereas a lot of keys were never requested.
Effectively the load generator has a large number of threads, and each thread was exercising “their” key in the database – thread 1 to id 1 in the table, and so on.

This exercised a certain number of hot keys, and waited very fast on a few locks, but did not actually simulate accurately any throughput limits.
If you exercised the system with more production-like load, it would have had around 100x more total throughput.


TL;DR

Producing masked or simulated data for testing is around 100x more expensive computationally than copying production data.
If production data is already tokenized, the win is also questionable, compared to the effort spent.

Producing valid test data is computationally expensive, especially in a microservices architecture in which referential integrity is to be maintained across service boundaries.

Valid test data is not necessarily useful in testing, especially when it comes to performance testing.
Performance testing is specifically also dependent on data access patterns, working set sizes and arrival rate distributions that affect locking times.

In the end, the actual test environment is always production, and in my personal professional experience there is a lot more value in making testing in production safe than in producing accurate testing environments.

Planet MySQL

MySQL: Data for Testing

https://blog.koehntopp.info/uploads/2022/09/test-data-01.jpg

Where I work, there is an ongoing discussion about test data generation.
At the moment we do not replace or change any production data for use in the test environments, and we don’t generate test data.

That is safe and legal, because production data is tokenized.
That is, PII and PCI data is being replaced by placeholder tokens which can be used by applications to access the actual protected data through specially protected access services.
Only a very limited circle of people is dealing with data in behind the protected services.

Using production data in test databases is also fast, because we copy data in parallel, at line speed, or we make redirect-on-write (“copy-on-write” in the age of SSD) writeable snapshots available.

Assume for a moment we want to change that and

  • mask data from production when we copy it to test databases
  • reduce the amount of data used in test databases, while maintaining referential integrity
  • generate test data sets of any size instead of using production data, keeping referential integrity and also some baseline statistical properties


Masking Data

Assume we make a copy of a production database into a test setup.
We then prepare that copy by changing every data item we want to mask, for example by replacing it with sha("secret" + original value), or some other replacement token.

kris@localhost [kris]> select sha(concat("secret", "Kristian Köhntopp"));
+---------------------------------------------+
| sha(concat("secret", "Kristian Köhntopp")) |
+---------------------------------------------+
| 9697c8770a3def7475069f3be8b6f2a8e4c7ebf4 |
+---------------------------------------------+
1 row in set (0.00 sec)

Why are we using a hash function or some moral equivalent for this?
We want to keep referential integrity, of course.
So each occurence of Kristian Köhntopp will be replaced by 9697c8770a3def7475069f3be8b6f2a8e4c7ebf4, a predictable and stable value, instead of a random number 17 on the first occurence, and another random number, 25342, on the next.

In terms of database work, it means that after copying the data, we are running an update on every row, creating a binlog entry for each change.
If the column we change is indexed, the indexes that contain the column need to be updated.
If the column we change is a primary key, the physical location of the record in the table also changes, because InnoDB clusters data on primary key.

In short, while copying data is fast, masking data has a much, much higher cost and can nowhere run at line speed, on any hardware.


Reducing the amount of data

Early in the company history, around 15 years ago, a colleague worked on creating smaller test databases by selecting a subset of production data, while keeping referential integrity intact.
He failed.

Many others tried later, we have had projects trying this around every 3 to 5 years.
They also failed.

Each attempt always selects either an empty database or all of the data from production.

Why is that?

Let’s try a simple model:
We have users, we have hotels, and they are in a n:m relationship, the stay.

“Kris” stays in a Hotel, the “Casa”.
We select Kris into the test data set from production, and also the “Casa”.
Other people stayed at the Casa, so they also are imported into the test set, but they also stayed at other hotels, so these hotels are also imported, and so forth, back between two tables.
After 3 reflections, 6 transitions, we have selected the entire production database into the test set.

Our production data is interconnected, and retaining referential integrity of the production data will mean that selecting one will ultimately select all.

Limiting the timeframe can help:
we select only data from the past week or something.
But that has other implications, for example on the data distribution.
Also, our production data is heavily skewed in time when it comes to operations on availability – a lot of bookings happen in the last week.


Generating test data

Generating garbage data is fast, but still slower than making a copy.
That is because copying data from a production machine can copy binary files with prebuilt existing indexes, whereas generating garbage data is equivalent to importing a mysqldump.
The data needs to be parsed, and worse, all indexes need to be built.

While we can copy data at hundreds of MB per second, up into the GB/s range, importing data happens at single digit MB/s up to low tens of MB/s.
A lot depends on the availability of RAM, and on the speed of the disk, also on the number of indexes and if the input data is sorted by primary key.

Generating non-garbage data is also hard, and slow.

You need to have the list of referential integrity constraints either define or inferred from a schema.
At work, our schema is large, much larger than a single database or service – users in the user service (test user service, with test user data) need to be referenced in the reservations service (test reservations service with test bookings), referring to hotels in the test availability and test hotel store.

That means either creating and maintaining a consistent second universe, or creating this, across services, from scratch, for each test.
One is a drag on productivity (maintaining consistent universes is a lot of work), the other is slow.

Consistency is not the only requirement, though.
Consistency is fine if you want to test validation code (but my test names are not utf8, they are from the hex digit subset of ASCII!).

But if you want to talk performance, additional requirements appear:

  • Data size.

    If your production data is 2 TB in size, but your test set is 200 GB, it is not linearly 10x faster.
    The relationship is non-linear: On a given piece of hardware, production data may be IO-limited because the working set does not fit into memory, whereas the test data can fit WSS into memory.
    Production data will produce disk reads proportionally to the load, test data will run from memory and after warmup has no read I/O – a completely different performance model applies.
    Performance tests on the test data have zero predictive value for production performance.

  • Data Distribution.

    Production data and production data access is subject to data access patterns that are largely unknown and undocumented, and are also changing.
    Some users are whales that travel 100x more than norms.
    Other users are frequent travellers, and travel 10x more than norms.
    Some amount of users are one-offs and appear only once in the data set.
    What is the relation between these sets, and what is the effect they have on data access?

For example:


A database benchmark I lost

The german computer magazine c’t in 2006 had an application benchmark described in web request accesses to a DVD rental store.
The contestants were supposed to write a DVD rental store using any technology they wished, defined in the required output and the URL request they would be exposed to in testing.
MySQL, for which I worked as a consultant, wanted to participate, and for that I put the templates provided into a web shop using MySQL, and tuned shop and database accordingly.

I got nowhere the top 10.

That is because I used a real web shop with real assumptions about user behavior, including caches and stuff.

The test data used was generated, and the requests were equally distributed:
Each DVD in the simulated DVD store was equally likely to be rented and each user rented the same amount of DVDs.
Any cache you put into the store would overflow, and go into threshing, or would have to have sufficient memory to keep the entire store in cache.

Real DVD rental stores have a top 100 of popular titles, and a long tail. Caching helps.
In the test, caching destroyed performance.


Another database benchmark I lost

Another german computer magazine had another database benchmark, which basically hammered the system under test with a very high load.
Unfortunately, here the load was not evenly distributed, but a few keys were being exercised very often, whereas a lot of keys were never requested.
Effectively the load generator has a large number of threads, and each thread was exercising “their” key in the database – thread 1 to id 1 in the table, and so on.

This exercised a certain number of hot keys, and waited very fast on a few locks, but did not actually simulate accurately any throughput limits.
If you exercised the system with more production-like load, it would have had around 100x more total throughput.


TL;DR

Producing masked or simulated data for testing is around 100x more expensive computationally than copying production data.
If production data is already tokenized, the win is also questionable, compared to the effort spent.

Producing valid test data is computationally expensive, especially in a microservices architecture in which referential integrity is to be maintained across service boundaries.

Valid test data is not necessarily useful in testing, especially when it comes to performance testing.
Performance testing is specifically also dependent on data access patterns, working set sizes and arrival rate distributions that affect locking times.

In the end, the actual test environment is always production, and in my personal professional experience there is a lot more value in making testing in production safe than in producing accurate testing environments.

Planet MySQL

Stream MySQL data with mydumper

https://mydbops.files.wordpress.com/2022/06/mydumper_stream.png?w=569

Mydumper supports streaming of backups right from version 0.11.3 and the latest version Mydumper 0.12.3 it enabled its support for compressed streaming backup. This was the most awaited feature added to Mydumper, making it a more powerful tool for data migration to RDS or Cloud.

If you are hearing mydumper for the first time, then let’s have a quick catch-up on what Mydumper is and what it does exactly.

Mydumper is a multithread logical backup and restores tool for MySQL and its forks. To know more you can refer to our previous blogs/presentations below.

In this blog, we will discuss in short how this streaming process works and how to get the required output.

  1. How does this work?
  2. How to use it?
  3. Key Takeaways:

How does this work?

The working of the stream is quite simple

  • Mydumper threads read data from the source DB and write to the file system parallels.
  • Now mydumper stream thread enqueues these files one by one and pipes them to the stdout.
  • Myloader stream reads them and writes to its local filesystem.
  • Now myloader threads restore them parallel to the destination by maintaining the thread_id sequence.

How to use it?

Below is the working command which I have used for the production use case to restore a table to RDS using stream.

mydumper -t 6 -u mydbops --password='XXXXXXX' -h localhost -P 3306 --compress -o /mysql/logs/backup -B 'testdb' -T 'testdb.member --stream | myloader --threads=4 -u admin --password='XXXXX' -h 'mydbops-stg.bvbkjkuliuie.ap-south-1.rds.amazonaws.com' -P 3308 -v 4 -d /mysql/logs/restore -o -B ‘testdb’ --stream

–stream : mydumper indicates that files created need to be streamed through STDOUT.

–stream : myloader will create a thread to read the stream and creates the file locally

–no-delete : retains the files locally in both source and destination this is optional

By default, once the file is successfully transferred from the source it gets deleted immediately, similarly at the destination once the streamed file is applied it gets deleted from its file system. This avoids high disk utilization during file backup when migrating a high volume of data.

Logs: 
** Message: 12:11:12.002: File backup/testdb-member-create.sql.gz transfered | Global: 0 MB/s

** Message: 12:11:12.003: Thread 3 dumping schema for `testdb`.`member`

** Message: 12:11:12.003: Thread 4 dumping data for`testdb`.`member`| Remaining jobs: -3

** Message: 12:11:12.003: Opening: backup/testdb-member-schema.sql.gz

** Message: 12:11:12.003: File backup/testdb-member-schema.sql.gz transfered | Global: 0 MB/s

** Message: 12:11:12.064: Non-InnoDB dump complete, unlocking tables

** Message: 12:11:12.064: Shutdown jobs enqueued

** Message: 12:27:54.912: Finished dump at: 2022-06-09 12:27:54

** Message: 12:27:54.913: Removing file: /mysql/logs/restore/restore/testdb-member-schema.sql.gz

** Message: 12:27:54.914: Thread 4 restoring table `testdb`.`member` from 

** Message: 12:27:56.433: Removing file: /mysql/logs/restore/restore/testdb-member.00000.sql.gz

** Message: 12:27:56.434: Shutting down stream thread 2

** Message: 12:27:56.434: Starting table checksum verification

Key Takeaways:

  • With a stream, Mydumper is considered an easy and faster method for the migration of data
  • Disk utilization is always kept under control with an auto-purge of backup files.

Planet MySQL

MySQL: Sometimes it is not the database

https://blog.koehntopp.info/uploads/2022/09/not-the-database-01.jpg

Query latencies in one data center are larger than elsewhere for one replication hierarchy, but only in the high percentiles.
This impacts production and traffic is being failed away from that data center to protect production.

When the P50 and P90 look okay, but the P99 and the P99.9 do not, the database(s) operate normally, and only some queries are running slow.
The initial guess was “for some queries the plan has flipped, but only in that data center.”

But first let’s have a look at the database size and the schema.


A tiny database

The schema in question holds metadata for a change data capture process, and that is not a lot.

# du -sh *
0 coredumps
5.6G data
704M log
0 tmp
# du -sh data/theschema
93M data/theschema

and in memory:

The mysqld process has a RES (resident set size) of only 5.9GB, even if the database is allowed to grow to a VIRT (virtual memory size) of 71.8 G.

This is running on a bare metal blade for the moment, and these do not come smaller than this.
In a virtual machine, it can be as small as a 1/4 blade – but database instances have a fixed overhead and going much smaller hardly makes sense.

In any case, this is running off memory, and would be doing so even if hosted on an iPhone.
There can’t be disk scans, even if there were bad queries.
And even with memory scans a thing this tiny won’t exhaust CPU or scan for a long time in memory.
Something definitively smells funny around here.

[information_schema]> select table_name, table_rows from tables where table_schema = 'schemaregistry' order by table_rows desc;
+--------------------+------------+
| TABLE_NAME | TABLE_ROWS |
+--------------------+------------+
| table_01 | 14376 |
| table_02 | 9510 |
| table_03 | 3079 |
| table_04 | 84 |
| table_05 | 0 |
| db_waypoint | 0 |
+--------------------+------------+
6 rows in set (0.00 sec)


Scanning for slow queries

We are using performance_schema directly to ask for statistics for queries that have been seen that are slow.

[performance_schema]> select  -> user,  -> event_name,  -> count_star,  -> avg_timer_wait/1000000000 as avg_ms  -> from events_statements_summary_by_user_by_event_name  -> where user = 'production_username'
 -> and event_name like 'statement/sql/%' and count_star > 0;
+----------------------+-----------------------------+------------+--------+
| user | event_name | count_star | avg_ms |
+----------------------+-----------------------------+------------+--------+
| production_username | statement/sql/select | 42121722 | 0.2913 |
| production_username | statement/sql/set_option | 270284 | 0.0708 |
| production_username | statement/sql/show_warnings | 67571 | 0.0498 |
+----------------------+-----------------------------+------------+--------+
3 rows in set (0.01 sec)

P_S is not intended to be used directly by humans.
The tables are optimized for fast data collection.
Data is not locked during read as to not slow collection, times are reported in PicoSeconds (1/10^12) as to avoid any DIV instructions on write and buffers are size-limited so if some action is spamming P_S, data is lost, but the server does not slowed or losing memory.

Consequently we divide by 10^9 to get average statement runtime, and we report any statement statistics since server start (or table truncate) that have been collected for any statement.
It turns out, the production user has been running only select statements, set statements and show warnings commands.

While the averages look good, maxima are not:

[performance_schema]> select  -> event_name,  -> count_star,  -> avg_timer_wait/1000000000 as avg_ms,  -> max_timer_wait/1000000000 as max_ms  -> from events_statements_summary_by_user_by_event_name  -> where user = 'production_username'  -> and event_name like 'statement/sql/%' and count_star > 0;
+-----------------------------+------------+--------+------------+
| event_name | count_star | avg_ms | max_ms |
+-----------------------------+------------+--------+------------+
| statement/sql/select | 42121722 | 0.2913 | 14934.0024 |
| statement/sql/set_option | 270284 | 0.0708 | 1.2732 |
| statement/sql/show_warnings | 67571 | 0.0498 | 0.9574 |
+-----------------------------+------------+--------+------------+
3 rows in set (0.00 sec)

So there was one select statement that ran a whopping 14s on a database that has no table with more than 15k rows.


Vividcortex aka SolarWinds DPM

We onboard this hierarchy to Vividcortex, a monitor that collects performance data from databases, and allows to see specific queries that execute slowly.
It can also help in determining possible improvements.

Vividcortex inventory for streaming. Normally Vividcortex does not run on all instances, but the primary and one pooled replica. We wanted a specific pooled replica in Frankfurt, though, so something with a 6000 number.

Our normal Vividcortex onboarding installs probes on the primary and one pooled replica, because it is not necessary to overwhelm the collection interface with all queries from all production machines.
A sample will do fine.

In this case, we want a replica in a specific location, though: only one data center behaves abnormally, so we would want one more machine within that location.
This required some bespoke puppet artistry, but it worked.
But, even then we do not get queries that are particularly interesting:

We get Query Count, and Average Latency.
But from the counts and the word average we can already see that this is not useful: we would have wanted to see high percentiles.
Also, the queries are all uninteresting.

Now, we believe most queries are fine, only some instances of queries that are mostly fine are taking unexpectedly long.
And we want to see those.

We can already see that the default view of VividCortex here is not helpful, and a quick exploration of the user interface quickly reveals that this tool is maybe not optimally useful for our specific hunt.

We go back to P_S and handcraft our stuff.


Plundering P_S

Let’s see what is on the menu:

[performance_schema]> show tables like '%statement%';
+----------------------------------------------------+
| Tables_in_performance_schema (%statement%) |
+----------------------------------------------------+
| events_statements_current |
| events_statements_histogram_by_digest |
| events_statements_histogram_global |
| events_statements_history |
| events_statements_history_long |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest |
| events_statements_summary_by_host_by_event_name |
| events_statements_summary_by_program |
| events_statements_summary_by_thread_by_event_name |
| events_statements_summary_by_user_by_event_name |
| events_statements_summary_global_by_event_name |
| prepared_statements_instances |
+----------------------------------------------------+
13 rows in set (0.00 sec)

I don’t know about you, but events_statements_summary_by_digest looks tasty by me.
What is in it?

[performance_schema]> desc events_statements_summary_by_digest;
+-----------------------------+-----------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+-----------------+------+-----+---------+-------+
| SCHEMA_NAME | varchar(64) | YES | MUL | NULL | |
| DIGEST | varchar(64) | YES | | NULL | |
| DIGEST_TEXT | longtext | YES | | NULL | |
| COUNT_STAR | bigint unsigned | NO | | NULL | |
| SUM_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MIN_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| AVG_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MAX_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| SUM_LOCK_TIME | bigint unsigned | NO | | NULL | |
| SUM_ERRORS | bigint unsigned | NO | | NULL | |
| SUM_WARNINGS | bigint unsigned | NO | | NULL | |
| SUM_ROWS_AFFECTED | bigint unsigned | NO | | NULL | |
| SUM_ROWS_SENT | bigint unsigned | NO | | NULL | |
| SUM_ROWS_EXAMINED | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_DISK_TABLES | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_TABLES | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_RANGE_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE_CHECK | bigint unsigned | NO | | NULL | |
| SUM_SELECT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_SORT_MERGE_PASSES | bigint unsigned | NO | | NULL | |
| SUM_SORT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SORT_ROWS | bigint unsigned | NO | | NULL | |
| SUM_SORT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_NO_INDEX_USED | bigint unsigned | NO | | NULL | |
| SUM_NO_GOOD_INDEX_USED | bigint unsigned | NO | | NULL | |
| SUM_CPU_TIME | bigint unsigned | NO | | NULL | |
| COUNT_SECONDARY | bigint unsigned | NO | | NULL | |
| FIRST_SEEN | timestamp(6) | NO | | NULL | |
| LAST_SEEN | timestamp(6) | NO | | NULL | |
| QUANTILE_95 | bigint unsigned | NO | | NULL | |
| QUANTILE_99 | bigint unsigned | NO | | NULL | |
| QUANTILE_999 | bigint unsigned | NO | | NULL | |
| QUERY_SAMPLE_TEXT | longtext | YES | | NULL | |
| QUERY_SAMPLE_SEEN | timestamp(6) | NO | | NULL | |
| QUERY_SAMPLE_TIMER_WAIT | bigint unsigned | NO | | NULL | |
+-----------------------------+-----------------+------+-----+---------+-------+
37 rows in set (0.00 sec)


Huh? What? The structure of P_S

At this point it is maybe a good idea to stop and establish a few facts about P_S.
P_S collects data about statement execution and server performance.

[performance_schema]> show tables like 'setup%';
+---------------------------------------+
| Tables_in_performance_schema (setup%) |
+---------------------------------------+
| setup_actors |
| setup_consumers |
| setup_instruments |
| setup_objects |
| setup_threads |
+---------------------------------------+

The server is instrumented for collection, and the data sources are called instruments.

We can ask an instrument to collect or not collect data for certain monitoring users, certain actors.
We can also ask the instruments ignore certain tables, schemas or other things, certain objects.
And again, the same, for certain threads.

Collected data is stored in preallocated ring-buffers, or added to certain aggregates.
All these things are consumers.

Configuration happens through the setup tables above, which set up the data flow from instrument through the filtering dimensions to the consumers.

Event data collection happens in event tables, inside a hierarchy, from transactions, to individual statements that make up a transaction, to execution phases of a statement, stages, to waits (mostly for IO or locks).
These things nest, but not necessarily on a 1:1 basis – a statement can contain waits, or other statements, for example.

root@streamingdb-6001 [performance_schema]> show tables like 'event%current';
+----------------------------------------------+
| Tables_in_performance_schema (event%current) |
+----------------------------------------------+
| events_stages_current |
| events_statements_current |
| events_transactions_current |
| events_waits_current |
+----------------------------------------------+
4 rows in set (0.00 sec)

For each of these events, we have _current, _history and _history_long tables.
For example events_statements_current contains one entry for each active connection, events_statements_history the last few statements for each active connection, and events_statements_history_long the last few thousand statements across all connections.

There are other collections, about other aspects of the server, and more generalized statement aggregations, the summaries.


Queries and Query Digests

To be able to aggregate statements, there is the concept of a statements digest_text, and ultimately the digest, a hash number generated from the digest text.

So statements such as

select id from atable where id in ( 1, 2, 3)
SeLeCt id FROM atable where id IN (92929, 29292, 17654, 363562);

should be considered equivalent in an aggregate.
This is done by unparsing the statement from the parse tree, which gets rid of all the spacing and letter case differences in keywords.
During this, all constants and constant lists are replaced by ? or '?' respectively.

The digest text for the above two statements becomes

select id from atable where id in ( ? )

and the digest from that can then be generated by running a hash function over the digest text.

The disadvantage is that a digest cannot be explained with the EXPLAIN command, so we need to make sure to also keep a representative explainable version of the query.


Getting some data

In our case, events_statements_summary_by_digest is exactly what we want.
So we truncate the collection, and wait a bit, then ask.
The results are impossible:

[performance_schema]> truncate events_statements_summary_by_digest;
...
[performance_schema]> select  -> count_star, avg_timer_wait/1000000000 as avg_ms,
 -> QUANTILE_95/1000000000 as q95_ms,  -> first_seen,  -> last_seen,
 -> query_sample_text  -> from events_statements_summary_by_digest  -> where schema_name = 'schemaregistry'  -> and QUANTILE_95/1000000000 > 0.1
 -> order by QUANTILE_95/1000000000 asc \G
*************************** 1. row ***************************
 count_star: 21
 avg_ms: 0.0782
 q95_ms: 0.1202
 first_seen: 2022-09-19 14:35:28.885824
 last_seen: 2022-09-19 14:38:06.110111
query_sample_text: SELECT @@session.autocommit
*************************** 2. row ***************************
 count_star: 21
 avg_ms: 0.0902
 q95_ms: 0.1514
 first_seen: 2022-09-19 14:35:28.886215
 last_seen: 2022-09-19 14:38:06.110382
query_sample_text: SET sql_mode='STRICT_ALL_TABLES,NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES'

So we have completely internal configuration commands that sometimes take longer than 0.1ms to execute.
We have other instances of simple queries that sometimes take 14s to run, 1000x and more than the average.


And then…

Yeah, and that is as far as I got with my digging, when a colleague chimes in, pointing at the machine dashboard for the machine I am on.

A sick network interface is for sure messing with system performance.

One of the machines in the pooled replicas for this data center location is showing an elevated amount of network retransmits and the box probably needs some love from the data center operations engineers.

We experiment a bit by removing and re-adding the box to the pool, and sure enough: As soon as the system under test is in the pool the latencies are no longer production worthy.

The image from the title bar: End user experience with and without the broken box in the replica pool. One bad box spoils the experience for all the users.

So the root cause was not instances of one query executing badly, but all queries executing badly on one box of the pool, in one location.
Average query latencies, and even the P90 look good, but the P99 and the P99.9 go to hell.

The box is removed from the pool and a replacement has been scheduled.
The broken box will get a DCOE ticket.

Planet MySQL

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot

https://www.alloutdoor.com/wp-content/uploads/2022/09/20220920_182135.jpg

The Surgeon Loop Knot is the fastest and easiest loop knot to tie. If you can tie an overhand knot, you can tie this knot. While it isn’t as neat and elegant as some other loop knots, it is a very strong loop knot that does not easily slip. You can use the Surgeon Loop Knot to make a loop at the end of your line for attaching weights or clips. Or as shown in this article, to make a loop to where you are attaching your hook or lure. While it is shown with a hook in this article, this knot excels for tying on lures and flies. The open loop gives whatever is tied to more action and movement. For just tying an empty loop just follow the same instructions without a hook.

Step 1

Run the line through the eye of the hook and bring the tag end of the line back along the mainline. Make sure your tag end of the line is long enough for the overhand loop to pass over the hook or lure.

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot
The larger the hook or lure the longer the tag end should be

Step 2

Make an overhand loop using the doubled line and hook, make sure to keep the lines together and not twisted.

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot

Step 3

Pass the doubled line and hook through the overhand loop in the mainline one more time, again keeping this tidy is important for the final knot.

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot

Step 4

Moisten the knot and line then hold the hook or lure and the standing line and pull to tighten the knot. Make sure to adjust the loop size at this point. Once the knot has been snugged down trim the tag end to about a 1/4 inch. For fishing lures and jigs you want to have a loop about the size of an M&M, anything bigger and cause the lure to tangle.

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot

The post Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot appeared first on AllOutdoor.com.

AllOutdoor.com

singlestoredb/singlestoredb-laravel

https://packalyst.com/assets/img/logo.png

Description

A SingleStoreDB database driver for Laravel.

Last update

2022/09/19 23:22 (dev-main)


dev-main

Last update

2022/09/19 23:22

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v1.0.2

Last update

2022/09/16 23:34

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v1.0.1

Last update

2022/08/24 22:11

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v1.0.0

Last update

2022/06/30 20:42

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v0.0.3

Last update

2022/06/09 20:48

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v0.0.2

Last update

2022/06/08 18:37

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

v0.0.1

Last update

2022/05/31 20:31

Require

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

Packalyst :: Latest Packages

Floppy Disk Factory

https://theawesomer.com/photos/2022/09/floppy_disk_factory_t.jpg

Floppy Disk Factory

Link

It’s been a long time since we needed floppy disks to store data. But we still enjoyed watching this retro factory video posted by StirlingEngineering, which shows how they used to produce 3.5″ floppies. It’s a satisfying 5-minutes sequence of mechanical ear candy.

The Awesomer