Stream MySQL data with mydumper

Mydumper has supported streaming backups since version 0.11.3, and the latest release, Mydumper 0.12.3, added support for compressed streaming backups. This was the most awaited feature added to Mydumper, and it makes Mydumper an even more powerful tool for migrating data to RDS or other cloud platforms.

If you are hearing about mydumper for the first time, let's have a quick catch-up on what Mydumper is and what exactly it does.

Mydumper is a multithreaded logical backup and restore tool for MySQL and its forks. To learn more, you can refer to our previous blogs/presentations below.

In this blog, we will briefly discuss how this streaming process works and how to get the required output.

  1. How does this work?
  2. How to use it?
  3. Key Takeaways:

How does this work?

The streaming process is quite simple:

  • Mydumper threads read data from the source DB and write it to the file system in parallel.
  • The mydumper stream thread then enqueues these files one by one and pipes them to STDOUT.
  • The myloader stream thread reads them and writes them to its local filesystem.
  • The myloader threads then restore them to the destination in parallel, maintaining the thread_id sequence.
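To make the data flow above concrete, here is a toy Python model of it. Files land in a source directory. A "stream" step serializes them one by one onto a single byte stream, and a "load" step reads that stream and recreates the files in its own directory. This is a sketch under stated assumptions. The length-prefixed framing (the hypothetical HEADER struct) is invented for illustration and is not mydumper's actual stream format. The parallel dump and restore threads are left out. The names stream_files and receive_files are hypothetical.

```python
import io
import struct
import tempfile
from pathlib import Path
from typing import BinaryIO

# Hypothetical framing for this sketch only: 2-byte name length, 8-byte payload length.
HEADER = struct.Struct(">HQ")


def stream_files(src_dir: Path, out: BinaryIO) -> None:
    """Dump side: take finished files one by one and write them to one byte stream."""
    for path in sorted(src_dir.iterdir()):
        name = path.name.encode("utf-8")
        data = path.read_bytes()
        out.write(HEADER.pack(len(name), len(data)))
        out.write(name)
        out.write(data)


def receive_files(inp: BinaryIO, dst_dir: Path) -> list[str]:
    """Load side: read frames from the stream and recreate each file locally."""
    received = []
    while True:
        header = inp.read(HEADER.size)
        if not header:
            break  # end of stream
        name_len, data_len = HEADER.unpack(header)
        name = inp.read(name_len).decode("utf-8")
        (dst_dir / name).write_bytes(inp.read(data_len))
        received.append(name)
    return received


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        (Path(src) / "testdb-member-schema.sql").write_text("CREATE TABLE member (id INT);")
        (Path(src) / "testdb-member.00000.sql").write_text("INSERT INTO member VALUES (1);")
        pipe = io.BytesIO()  # stands in for the shell pipe between the two processes
        stream_files(Path(src), pipe)
        pipe.seek(0)
        print(receive_files(pipe, Path(dst)))
```

In a real two-process setup, the writer would use sys.stdout.buffer and the reader sys.stdin.buffer, joined by the shell's `|`, which is the same shape as the `mydumper ... --stream | myloader ... --stream` command.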

How to use it?

Below is the working command I used in a production use case to restore a table to RDS using streaming:

mydumper -t 6 -u mydbops --password='XXXXXXX' -h localhost -P 3306 --compress -o /mysql/logs/backup -B 'testdb' -T 'testdb.member' --stream | \
myloader --threads=4 -u admin --password='XXXXX' -h '' -P 3308 -v 4 -d /mysql/logs/restore -o -B 'testdb' --stream

--stream (mydumper): indicates that the files created need to be streamed through STDOUT.

--stream (myloader): creates a thread that reads the stream and creates the files locally.

--no-delete: retains the files locally on both the source and the destination; this is optional.

By default, once a file has been successfully transferred from the source, it is deleted immediately. Similarly, at the destination, once a streamed file has been applied, it is deleted from the file system. This avoids high disk utilization during the backup when migrating a large volume of data.
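A back-of-the-envelope model shows why auto-deletion keeps disk usage low. It assumes, hypothetically, that at most `in_flight` consecutive files sit on disk at any one time while waiting to be streamed or applied. With --no-delete, every file stays until the end. This is only a sliding-window approximation; real usage depends on file sizes, thread counts and how fast each side drains the stream. The function name peak_disk_usage_mb is illustrative.

```python
def peak_disk_usage_mb(file_sizes_mb: list[int], in_flight: int,
                       delete_after_transfer: bool = True) -> int:
    """Peak disk footprint on one side of the stream under a simple sliding-window model."""
    if not delete_after_transfer:
        # --no-delete: every file is retained, so usage only grows
        return sum(file_sizes_mb)
    # auto-delete: only a window of `in_flight` consecutive files coexists on disk
    return max(
        (sum(file_sizes_mb[i:i + in_flight]) for i in range(len(file_sizes_mb))),
        default=0,
    )
```

As a hypothetical example, take fifty 100 MB chunk files with six files in flight (loosely matching `-t 6`). The model gives a 600 MB peak with auto-delete versus 5000 MB with --no-delete.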

** Message: 12:11:12.002: File backup/testdb-member-create.sql.gz transfered | Global: 0 MB/s
** Message: 12:11:12.003: Thread 3 dumping schema for `testdb`.`member`
** Message: 12:11:12.003: Thread 4 dumping data for`testdb`.`member`| Remaining jobs: -3
** Message: 12:11:12.003: Opening: backup/testdb-member-schema.sql.gz
** Message: 12:11:12.003: File backup/testdb-member-schema.sql.gz transfered | Global: 0 MB/s
** Message: 12:11:12.064: Non-InnoDB dump complete, unlocking tables
** Message: 12:11:12.064: Shutdown jobs enqueued
** Message: 12:27:54.912: Finished dump at: 2022-06-09 12:27:54
** Message: 12:27:54.913: Removing file: /mysql/logs/restore/restore/testdb-member-schema.sql.gz
** Message: 12:27:54.914: Thread 4 restoring table `testdb`.`member` from 
** Message: 12:27:56.433: Removing file: /mysql/logs/restore/restore/testdb-member.00000.sql.gz
** Message: 12:27:56.434: Shutting down stream thread 2
** Message: 12:27:56.434: Starting table checksum verification

Key Takeaways:

  • With streaming, Mydumper is an easy and fast method for migrating data.
  • Disk utilization is kept under control thanks to the auto-purge of backup files.

Planet MySQL

MySQL: Sometimes it is not the database

Query latencies in one data center are larger than elsewhere for one replication hierarchy, but only in the high percentiles.
This impacts production and traffic is being failed away from that data center to protect production.

When the P50 and P90 look okay but the P99 and the P99.9 do not, the usual reading is that the database(s) operate normally and only some queries are running slowly.
The initial guess was “for some queries the plan has flipped, but only in that data center.”

But first let’s have a look at the database size and the schema.

A tiny database

The schema in question holds metadata for a change data capture process, and that is not a lot.

# du -sh *
0 coredumps
5.6G data
704M log
0 tmp
# du -sh data/theschema
93M data/theschema

and in memory:

The mysqld process has a RES (resident set size) of only 5.9 GB, even though the database is allowed to grow to a VIRT (virtual memory size) of 71.8 GB.

This is running on a bare metal blade for the moment, and these do not come smaller than this.
In a virtual machine, it can be as small as a 1/4 blade – but database instances have a fixed overhead and going much smaller hardly makes sense.

In any case, this is running off memory, and would be doing so even if hosted on an iPhone.
There can’t be disk scans, even if there were bad queries.
And even with memory scans, a dataset this tiny won't exhaust the CPU or take long to scan in memory.
Something definitely smells funny around here.

[information_schema]> select table_name, table_rows from tables where table_schema = 'schemaregistry' order by table_rows desc;
| table_name | table_rows |
| table_01 | 14376 |
| table_02 | 9510 |
| table_03 | 3079 |
| table_04 | 84 |
| table_05 | 0 |
| db_waypoint | 0 |
6 rows in set (0.00 sec)

Scanning for slow queries

We use performance_schema directly to ask for statistics about slow queries that have been seen.

[performance_schema]> select
    -> user,
    -> event_name,
    -> count_star,
    -> avg_timer_wait/1000000000 as avg_ms
    -> from events_statements_summary_by_user_by_event_name
    -> where user = 'production_username'
    -> and event_name like 'statement/sql/%' and count_star > 0;
| user | event_name | count_star | avg_ms |
| production_username | statement/sql/select | 42121722 | 0.2913 |
| production_username | statement/sql/set_option | 270284 | 0.0708 |
| production_username | statement/sql/show_warnings | 67571 | 0.0498 |
3 rows in set (0.01 sec)

P_S is not intended to be used directly by humans.
The tables are optimized for fast data collection.
Data is not locked during reads, so as not to slow down collection. Times are reported in picoseconds (10^-12 s) to avoid any DIV instructions on write. Buffers are size-limited, so if some action is spamming P_S, data is lost, but the server does not slow down or leak memory.

Consequently, we divide by 10^9 to get the average statement runtime in milliseconds, and we report statistics for every statement that has been collected since server start (or since the last table truncate).
It turns out the production user has been running only SELECT statements, SET statements and SHOW WARNINGS commands.
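The unit conversion is plain arithmetic. One millisecond is 10^9 picoseconds, which is why the queries divide the *_timer_wait columns by 1000000000. Below is a minimal helper (the names PS_PER_MS and ps_to_ms are ours). It is checked against the averages shown in the output above, assuming raw picosecond values that are simply those milliseconds times 10^9.

```python
PS_PER_MS = 10**9  # 1 ms = 10^9 ps, since 1 ps = 10^-12 s


def ps_to_ms(picoseconds: int) -> float:
    """Convert a Performance Schema timer value (picoseconds) to milliseconds."""
    return picoseconds / PS_PER_MS
```

For example, a raw timer value of 291,300,000 ps is the 0.2913 ms average reported for statement/sql/select, and 14,934,002,400,000 ps would be a 14934.0024 ms (about 15 s) maximum.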

While the averages look good, maxima are not:

[performance_schema]> select
    -> event_name,
    -> count_star,
    -> avg_timer_wait/1000000000 as avg_ms,
    -> max_timer_wait/1000000000 as max_ms
    -> from events_statements_summary_by_user_by_event_name
    -> where user = 'production_username'
    -> and event_name like 'statement/sql/%' and count_star > 0;
| event_name | count_star | avg_ms | max_ms |
| statement/sql/select | 42121722 | 0.2913 | 14934.0024 |
| statement/sql/set_option | 270284 | 0.0708 | 1.2732 |
| statement/sql/show_warnings | 67571 | 0.0498 | 0.9574 |
3 rows in set (0.00 sec)

So there was one select statement that ran a whopping 14s on a database that has no table with more than 15k rows.

Vividcortex aka SolarWinds DPM

We onboard this hierarchy to Vividcortex, a monitor that collects performance data from databases and lets us see specific queries that execute slowly.
It can also help in determining possible improvements.

Vividcortex inventory for streaming. Normally Vividcortex does not run on all instances, but the primary and one pooled replica. We wanted a specific pooled replica in Frankfurt, though, so something with a 6000 number.

Our normal Vividcortex onboarding installs probes on the primary and one pooled replica, because it is not necessary to overwhelm the collection interface with all queries from all production machines.
A sample will do fine.

In this case, we want a replica in a specific location, though: only one data center behaves abnormally, so we would want one more machine within that location.
This required some bespoke Puppet artistry, but it worked.
But even then, we do not get queries that are particularly interesting:

We get Query Count, and Average Latency.
But from the counts and the word average we can already see that this is not useful: we would have wanted to see high percentiles.
Also, the queries are all uninteresting.

Now, we believe most queries are fine; only some instances of queries that are mostly fine take unexpectedly long.
And we want to see those.

We can already see that the default view of VividCortex is not helpful here, and a quick exploration of the user interface reveals that this tool may not be well suited to our specific hunt.

We go back to P_S and handcraft our stuff.

Plundering P_S

Let’s see what is on the menu:

[performance_schema]> show tables like '%statement%';
| Tables_in_performance_schema (%statement%) |
| events_statements_current |
| events_statements_histogram_by_digest |
| events_statements_histogram_global |
| events_statements_history |
| events_statements_history_long |
| events_statements_summary_by_account_by_event_name |
| events_statements_summary_by_digest |
| events_statements_summary_by_host_by_event_name |
| events_statements_summary_by_program |
| events_statements_summary_by_thread_by_event_name |
| events_statements_summary_by_user_by_event_name |
| events_statements_summary_global_by_event_name |
| prepared_statements_instances |
13 rows in set (0.00 sec)

I don’t know about you, but events_statements_summary_by_digest looks tasty to me.
What is in it?

[performance_schema]> desc events_statements_summary_by_digest;
| Field | Type | Null | Key | Default | Extra |
| SCHEMA_NAME | varchar(64) | YES | MUL | NULL | |
| DIGEST | varchar(64) | YES | | NULL | |
| DIGEST_TEXT | longtext | YES | | NULL | |
| COUNT_STAR | bigint unsigned | NO | | NULL | |
| SUM_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MIN_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| AVG_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| MAX_TIMER_WAIT | bigint unsigned | NO | | NULL | |
| SUM_LOCK_TIME | bigint unsigned | NO | | NULL | |
| SUM_ERRORS | bigint unsigned | NO | | NULL | |
| SUM_WARNINGS | bigint unsigned | NO | | NULL | |
| SUM_ROWS_AFFECTED | bigint unsigned | NO | | NULL | |
| SUM_ROWS_SENT | bigint unsigned | NO | | NULL | |
| SUM_ROWS_EXAMINED | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_DISK_TABLES | bigint unsigned | NO | | NULL | |
| SUM_CREATED_TMP_TABLES | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_FULL_RANGE_JOIN | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SELECT_RANGE_CHECK | bigint unsigned | NO | | NULL | |
| SUM_SELECT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_SORT_MERGE_PASSES | bigint unsigned | NO | | NULL | |
| SUM_SORT_RANGE | bigint unsigned | NO | | NULL | |
| SUM_SORT_ROWS | bigint unsigned | NO | | NULL | |
| SUM_SORT_SCAN | bigint unsigned | NO | | NULL | |
| SUM_NO_INDEX_USED | bigint unsigned | NO | | NULL | |
| SUM_NO_GOOD_INDEX_USED | bigint unsigned | NO | | NULL | |
| SUM_CPU_TIME | bigint unsigned | NO | | NULL | |
| COUNT_SECONDARY | bigint unsigned | NO | | NULL | |
| FIRST_SEEN | timestamp(6) | NO | | NULL | |
| LAST_SEEN | timestamp(6) | NO | | NULL | |
| QUANTILE_95 | bigint unsigned | NO | | NULL | |
| QUANTILE_99 | bigint unsigned | NO | | NULL | |
| QUANTILE_999 | bigint unsigned | NO | | NULL | |
| QUERY_SAMPLE_TEXT | longtext | YES | | NULL | |
| QUERY_SAMPLE_SEEN | timestamp(6) | NO | | NULL | |
| QUERY_SAMPLE_TIMER_WAIT | bigint unsigned | NO | | NULL | |
37 rows in set (0.00 sec)

Huh? What? The structure of P_S

At this point it is maybe a good idea to stop and establish a few facts about P_S.
P_S collects data about statement execution and server performance.

[performance_schema]> show tables like 'setup%';
| Tables_in_performance_schema (setup%) |
| setup_actors |
| setup_consumers |
| setup_instruments |
| setup_objects |
| setup_threads |

The server is instrumented for collection, and the data sources are called instruments.

We can ask the instruments to collect, or not collect, data for certain users: the actors.
We can also ask the instruments to ignore certain tables, schemas or other things: the objects.
And again, the same for certain threads.

Collected data is stored in preallocated ring-buffers, or added to certain aggregates.
All these things are consumers.

Configuration happens through the setup tables above, which set up the data flow from instrument through the filtering dimensions to the consumers.
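The data flow from instrument through filters to consumers can be pictured as a chain of gates. The following is a toy Python model of that idea. It is not how the server implements filtering. ToySetup and route_event are hypothetical names, each field is only a loose stand-in for one setup table, and the thread filter is omitted.

```python
from dataclasses import dataclass


@dataclass
class ToySetup:
    # Hypothetical, simplified stand-ins for the setup_* tables.
    enabled_instruments: set[str]  # ~ setup_instruments
    monitored_users: set[str]      # ~ setup_actors
    ignored_schemas: set[str]      # ~ setup_objects
    enabled_consumers: set[str]    # ~ setup_consumers


def route_event(setup: ToySetup, instrument: str, user: str, schema: str) -> list[str]:
    """Return the consumers an event would be recorded in under this toy model."""
    if instrument not in setup.enabled_instruments:
        return []  # instrument off: the event is never collected
    if user not in setup.monitored_users:
        return []  # actor filter: this user is not monitored
    if schema in setup.ignored_schemas:
        return []  # object filter: this schema is excluded
    return sorted(setup.enabled_consumers)
```

An event reaches the consumers only if every gate lets it through. Disabling a consumer therefore loses data just as surely as disabling the instrument.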

Event data collection happens in event tables, inside a hierarchy: from transactions, to the individual statements that make up a transaction, to the execution phases of a statement (stages), to waits (mostly for IO or locks).
These things nest, but not necessarily on a 1:1 basis – a statement can contain waits, or other statements, for example.

root@streamingdb-6001 [performance_schema]> show tables like 'event%current';
| Tables_in_performance_schema (event%current) |
| events_stages_current |
| events_statements_current |
| events_transactions_current |
| events_waits_current |
4 rows in set (0.00 sec)

For each of these events, we have _current, _history and _history_long tables.
For example events_statements_current contains one entry for each active connection, events_statements_history the last few statements for each active connection, and events_statements_history_long the last few thousand statements across all connections.
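The three table flavors behave like a per-connection slot, a small per-connection ring buffer and one large shared ring buffer. Here is a toy Python model using collections.deque with maxlen. The class name is hypothetical, and the buffer sizes are illustrative parameters rather than the server's defaults or sizing rules.

```python
from collections import defaultdict, deque


class ToyStatementHistory:
    """Toy model of events_statements_current / _history / _history_long."""

    def __init__(self, per_thread: int = 10, long_size: int = 10_000) -> None:
        self.current: dict[int, str] = {}  # one entry per connection
        self.history = defaultdict(lambda: deque(maxlen=per_thread))  # last few per connection
        self.history_long = deque(maxlen=long_size)  # last N across all connections

    def record(self, thread_id: int, sql: str) -> None:
        self.current[thread_id] = sql
        self.history[thread_id].append(sql)         # oldest entry falls out when full
        self.history_long.append((thread_id, sql))  # same, but shared by all connections
```

Because the buffers are fixed-size, old entries are silently overwritten. This matches the "data is lost, but the server is not slowed down" trade-off described earlier.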

There are other collections, about other aspects of the server, and more generalized statement aggregations, the summaries.

Queries and Query Digests

To be able to aggregate statements, there is the concept of a statement's digest_text, and ultimately the digest, a hash generated from the digest text.

So statements such as

select id from atable where id in ( 1, 2, 3)
SeLeCt id FROM atable where id IN (92929, 29292, 17654, 363562);

should be considered equivalent in an aggregate.
This is done by unparsing the statement from the parse tree, which gets rid of all differences in spacing and in the letter case of keywords.
During this, all constants and constant lists are replaced by ? or '?' respectively.

The digest text for the above two statements becomes

select id from atable where id in ( ? )

and the digest from that can then be generated by running a hash function over the digest text.
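The normalize-then-hash idea can be sketched in a few lines of Python. This is a toy approximation, not MySQL's algorithm. MySQL builds the digest from its parser's token stream, and its real digest text differs in detail. This sketch uses regular expressions and a tiny hypothetical keyword list. It collapses value lists to `( ? )` only to mirror the example above. The hash is SHA-256 over the toy text, so it will not match the DIGEST column.

```python
import hashlib
import re

# Hypothetical, tiny keyword list for this toy; MySQL knows its full grammar.
KEYWORDS = {"select", "from", "where", "in", "and", "or"}


def digest_text(sql: str) -> str:
    """Toy normalizer: literals -> ?, value lists collapsed, keyword case folded."""
    s = sql.strip().rstrip(";")
    s = re.sub(r"'(?:[^'\\]|\\.)*'", "?", s)               # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)               # numeric literals -> ?
    s = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "( ? )", s)  # (?, ?, ...) -> ( ? )
    s = re.sub(r"[A-Za-z_]+",
               lambda m: m.group(0).lower() if m.group(0).lower() in KEYWORDS else m.group(0),
               s)                                          # fold keyword case only
    return re.sub(r"\s+", " ", s).strip()                  # normalize spacing


def digest(sql: str) -> str:
    """Hash of the normalized text; equivalent statements share one digest."""
    return hashlib.sha256(digest_text(sql).encode("utf-8")).hexdigest()
```

Both example statements normalize to `select id from atable where id in ( ? )`, so they hash to the same 64-character hex digest and land in the same aggregate row.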

The disadvantage is that a digest cannot be explained with the EXPLAIN command, so we need to make sure to also keep a representative explainable version of the query.

Getting some data

In our case, events_statements_summary_by_digest is exactly what we want.
So we truncate the collection, wait a bit, and then ask.
The results are impossible:

[performance_schema]> truncate events_statements_summary_by_digest;
[performance_schema]> select
    -> count_star, avg_timer_wait/1000000000 as avg_ms,
    -> QUANTILE_95/1000000000 as q95_ms,
    -> first_seen,
    -> last_seen,
    -> query_sample_text
    -> from events_statements_summary_by_digest
    -> where schema_name = 'schemaregistry'
    -> and QUANTILE_95/1000000000 > 0.1
    -> order by QUANTILE_95/1000000000 asc \G
*************************** 1. row ***************************
 count_star: 21
 avg_ms: 0.0782
 q95_ms: 0.1202
 first_seen: 2022-09-19 14:35:28.885824
 last_seen: 2022-09-19 14:38:06.110111
query_sample_text: SELECT @@session.autocommit
*************************** 2. row ***************************
 count_star: 21
 avg_ms: 0.0902
 q95_ms: 0.1514
 first_seen: 2022-09-19 14:35:28.886215
 last_seen: 2022-09-19 14:38:06.110382

So we have purely internal configuration commands that sometimes take longer than 0.1 ms to execute.
We also have other instances of simple queries that sometimes take 14 s to run, more than 1000x the average.

And then…

Yeah, and that is as far as I get with my digging when a colleague chimes in, pointing at the dashboard for the machine I am on.

A sick network interface is for sure messing with system performance.

One of the machines in the pooled replicas for this data center location is showing an elevated number of network retransmits, and the box probably needs some love from the data center operations engineers.

We experiment a bit by removing and re-adding the box to the pool, and sure enough: as soon as the system under test is in the pool, the latencies are no longer production-worthy.

The image from the title bar: End user experience with and without the broken box in the replica pool. One bad box spoils the experience for all the users.

So the root cause was not instances of one query executing badly, but all queries executing badly on one box of the pool, in one location.
Average query latencies, and even the P90 look good, but the P99 and the P99.9 go to hell.
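A small deterministic simulation shows how this happens. All numbers below are invented for illustration, not measured. The setup is a pool of ten replicas, each serving 100 queries at 0.3 ms, except that 15 of the bad box's queries stall for 200 ms on retransmits. Percentiles use the nearest-rank method. Removing the box is modeled crudely as dropping its queries.

```python
import math

FAST_MS = 0.3    # hypothetical latency of a healthy query
SLOW_MS = 200.0  # hypothetical latency of a query hit by retransmits


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def pool_latencies(boxes: int, queries_per_box: int, slow_on_bad_box: int,
                   include_bad_box: bool = True) -> list[float]:
    """Latencies across the pool; box 0 is the one with the sick network interface."""
    latencies = []
    for box in range(boxes):
        bad = box == 0
        if bad and not include_bad_box:
            continue  # box pulled from the pool
        for q in range(queries_per_box):
            latencies.append(SLOW_MS if bad and q < slow_on_bad_box else FAST_MS)
    return latencies
```

With the bad box in the pool, only 1.5% of all queries are slow. P50 and P90 stay at 0.3 ms while P99 and P99.9 jump to 200 ms. Without the box, every percentile is back to 0.3 ms.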

The box is removed from the pool and a replacement has been scheduled.
The broken box will get a DCOE ticket.

Planet MySQL

Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot

The Surgeon Loop Knot is the fastest and easiest loop knot to tie. If you can tie an overhand knot, you can tie this knot. While it isn’t as neat and elegant as some other loop knots, it is a very strong loop knot that does not easily slip. You can use the Surgeon Loop Knot to make a loop at the end of your line for attaching weights or clips, or, as shown in this article, to make a loop where you attach your hook or lure. Although it is shown with a hook here, this knot excels for tying on lures and flies: the open loop gives whatever is tied to it more action and movement. To tie an empty loop, follow the same instructions without a hook.

Step 1

Run the line through the eye of the hook and bring the tag end of the line back along the mainline. Make sure your tag end of the line is long enough for the overhand loop to pass over the hook or lure.

The larger the hook or lure, the longer the tag end should be.

Step 2

Make an overhand loop using the doubled line and hook; make sure to keep the lines together and not twisted.


Step 3

Pass the doubled line and hook through the overhand loop in the mainline one more time; again, keeping this tidy is important for the final knot.


Step 4

Moisten the knot and line, then hold the hook or lure and the standing line and pull to tighten the knot. Make sure to adjust the loop size at this point. Once the knot has been snugged down, trim the tag end to about 1/4 inch. For fishing lures and jigs, you want a loop about the size of an M&M; anything bigger can cause the lure to tangle.


The post Are You Nuts? Know your Fishing Knots! – The Surgeon Loop Knot appeared first on



A SingleStoreDB database driver for Laravel.

Last update: 2022/09/19 23:22 (dev-main)

Versions last updated 2022/09/19 23:22, 2022/09/16 23:34, 2022/08/24 22:11, 2022/06/30 20:42, 2022/06/09 20:48, 2022/06/08 18:37 and 2022/05/31 20:31 each require:

  • php ^7.3|^8.0
  • illuminate/container ^8.0|^9.0
  • illuminate/database ^8.0|^9.0
  • illuminate/events ^8.0|^9.0
  • illuminate/support ^8.0|^9.0

Packalyst :: Latest Packages

Floppy Disk Factory



It’s been a long time since we needed floppy disks to store data. But we still enjoyed watching this retro factory video posted by StirlingEngineering, which shows how they used to produce 3.5″ floppies. It’s a satisfying 5-minute sequence of mechanical ear candy.

The Awesomer

How Not to Use MySQL

Chapter 9 of Efficient MySQL Performance changed in development. Originally, it was a chapter titled “Not MySQL”, as in “how not to use MySQL.” But we (O’Reilly and I) pulled the chapter, and the current chapter 9 in print is “Other Challenges”: an important laundry list of other challenges engineers using MySQL must be aware of and address. This blog post is a sketch of the unwritten chapter 9: how not to use MySQL.

Planet MySQL

Fire Extinguisher Ball



Elide Fire makes this innovative firefighting device – a ball that can be tossed into a fire to instantly extinguish or reduce its intensity. The ball automatically deploys upon contact with a flame and smothers it with a blanket of non-toxic chemicals. It can also be placed in a high-risk location where it will activate in case of fire.

The Awesomer

Watch: Badass Chick-fil-A employee saves mother and baby as he chokes out their carjacker

Chick-fil-A is bland, overrated chicken. Being honest with ourselves is important. It’s generic fast food in desperate need of a few more herbs and/or spices. That doesn’t mean the legend of Chick-fil-A employees isn’t something to both praise and marvel at. They go above and beyond in serving their community. As opposed to lazy McDonald’s employees who expect $25 an hour to throw frozen meat in the microwave and never fix the shake machine.

The legend grows in Florida as one employee took a break from slinging barely seasoned nuggets to save a woman and her baby by choking the f*ck out of their would-be carjacker.

The punk’s name is William Branch. According to Whiskey Riff (who, like me, can’t wait for the new Koe Wetzel album on Friday), Branch tried to jack the car using a stick. This does not look like the face of the man who grasps the irony.

The woman screamed as Branch worked his way into her car. Upon hearing the cries of a damsel in distress, the mystery Chick-fil-A employee leapt into action. He pulled Branch from the car, wrestled him to the ground, then put Branch in a chokehold until help arrived. It was a solid chokehold, too. If the anonymous Florida resident isn’t training at American Top Team yet, he should.

Branch is being charged with carjacking with a weapon (the stick Branch used) and battery. Our mystery Chick-fil-A employee remains just that, walking off into the horizon until he is called on again to save the day.


Louder With Crowder