https://media.notthebee.com/articles/6a4812fa91d086a4812fa91d09.jpg
Guys, the actual flyovers for the main event are happening on July 4th, but the rehearsals are just as amazing.
Not the Bee
https://media.notthebee.com/articles/6a4565e7d3b046a4565e7d3b05.jpg
This might be one of the best videos on the internet right now.
Not the Bee
https://webyog.com/wp-content/uploads/2025/12/database_schema.png
There’s a narrative floating around tech circles that database administrators are being replaced by the cloud,
by automation, by AI. It’s worth examining carefully, because the hiring data tells a different story.
MySQL DBAs are not just surviving the industry’s transformation. Many are thriving, landing roles that pay six
figures and in some cases well beyond that. Here’s why the demand remains strong, what it takes to earn those
salaries, and how you can position yourself to get there.
MySQL isn’t a legacy technology quietly fading out. It powers some of the most demanding data infrastructure on
the planet. Meta’s social graph. YouTube’s video metadata and recommendation data. X’s real-time post and
engagement data. These platforms process billions of queries per day against MySQL-compatible systems, and
they employ MySQL specialists to keep those systems running.
Beyond hyperscalers, MySQL runs the backend of countless SaaS applications, e-commerce platforms,
healthcare systems, and financial services firms. The migration of many workloads to the cloud hasn’t eliminated
MySQL — it’s spread it further, with AWS RDS for MySQL, Google Cloud SQL, and Azure Database for MySQL
becoming mainstream deployment targets.
MySQL 8.0/8.4 LTS continues to be the dominant production choice for stability-focused organizations. The
MySQL 9.x Innovation series is attracting teams that want the latest query optimizer improvements.
Let’s talk numbers.
These ranges vary by geography, industry, and company size, but the trend is consistent: MySQL expertise
commands premium compensation, and the gap between entry-level and senior is significant. The investment in
skill development has a clear payoff.
The volatility that hit some areas of tech — particularly front-end roles and certain generalist engineering
positions — has been less pronounced in data infrastructure. Databases are not optional. Every AI model needs
training data. Every transactional system needs a store of record. Every analytics pipeline needs clean,
queryable data. DBAs are foundational, not peripheral.
Employers hiring MySQL DBAs at the mid-to-senior level are looking for a specific combination of technical
depth and operational judgment. Here’s what that looks like in practice.
Understanding how MySQL is installed, configured, and tuned from the ground up. This includes server
parameters (innodb_buffer_pool_size, max_connections, binary log settings), MySQL’s file layout, and
the differences between MySQL deployment options — on-premises, cloud-managed, and containerized
environments like Docker and Kubernetes.
The ability to implement and audit MySQL’s privilege system — GRANT, REVOKE, role-based access control,
authentication plugin configuration, and SSL/TLS setup. In regulated industries (healthcare, finance), this often
means compliance documentation and regular access reviews.
A DBA who cannot confidently execute a point-in-time recovery is not production-ready. You need hands-on
experience with mysqldump, physical backup tools (Percona XtraBackup or MySQL Enterprise Backup), and
binary log-based recovery. You also need to have actually tested your restores — not just assumed they work.
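Under the hood, a point-in-time recovery is a small planning problem: restore the newest full backup taken at or before the target time, then replay the binary logs from the coordinates recorded with that backup (mysqldump --source-data and Percona XtraBackup both record them) up to the target. The Python sketch below uses a hypothetical Backup record and a hypothetical pitr_plan helper; only the mysqlbinlog options --start-position and --stop-datetime are real, and a production recovery also has to deal with GTIDs, purged logs, and stopping just before a specific bad event rather than at a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    """Hypothetical record of a full backup and its binlog coordinates."""
    taken_at: datetime
    binlog_file: str  # binary log in use when the backup was taken
    binlog_pos: int   # position in that file where the backup is consistent


def pitr_plan(backups: list, binlog_files: list, target: datetime):
    """Return the backup to restore and a mysqlbinlog command replaying up to target."""
    candidates = [b for b in backups if b.taken_at <= target]
    if not candidates:
        raise ValueError("no full backup taken before the target time")
    base = max(candidates, key=lambda b: b.taken_at)
    # Zero-padded names such as binlog.000012 sort in sequence order.
    files = sorted(f for f in binlog_files if f >= base.binlog_file)
    cmd = [
        "mysqlbinlog",
        # --start-position applies to the first file listed.
        f"--start-position={base.binlog_pos}",
        # Stop at the first event timestamped at or after the target time.
        f"--stop-datetime={target:%Y-%m-%d %H:%M:%S}",
        *files,
    ]
    return base, cmd
```

Once the base backup has been restored, the command's output is typically piped into the mysql client to apply the replayed events.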
The ability to read EXPLAIN output, identify missing indexes, resolve lock contention, and rewrite inefficient
queries is what DBAs get called in to do when things break. This skill grows over time and with exposure to
varied workloads, but it’s what separates a $75K DBA from a $130K one.
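Part of reading EXPLAIN output can be automated. A minimal sketch, assuming MySQL's EXPLAIN FORMAT=JSON output, where each table access appears as an object carrying table_name, access_type, and rows_examined_per_scan keys: walk the whole tree and report every access_type of ALL (a full table scan), which is often, though not always, a sign of a missing index. The full_scans helper and the sample plan used to exercise it are illustrative, not output captured from a real server.

```python
import json


def full_scans(explain_json: str) -> list:
    """List (table_name, rows_examined_per_scan) for each full table scan
    found anywhere in EXPLAIN FORMAT=JSON output."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            # access_type "ALL" means the table is read in full.
            if node.get("access_type") == "ALL" and "table_name" in node:
                found.append((node["table_name"], node.get("rows_examined_per_scan")))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(explain_json))
    return found
```

A hit is a prompt to look at the query's WHERE and JOIN columns, not an automatic verdict: a full scan of a small table is frequently the cheapest plan.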
You need to know what healthy looks like before you can identify unhealthy. Proficiency with Performance
Schema, global status variables, slow query logs, and monitoring tools is expected at the mid-level and above.
Production MySQL almost always runs with replication. Understanding async replication, semi-sync, InnoDB
Cluster, and Group Replication — and being able to troubleshoot replication lag and failover scenarios — is
table stakes for senior roles.
Most MySQL DBAs follow a recognizable progression:
Junior DBA (Year 1–2): Learning fundamentals under supervision, executing well-defined tasks, building
familiarity with the tools and documentation.
Mid-Level DBA (Year 3–5): Taking ownership of production systems, handling incidents independently,
beginning to make architectural recommendations.
Senior DBA (Year 5+): Leading database strategy, mentoring junior team members, driving infrastructure
improvements, serving as the go-to person for complex problems.
Database Architect / Principal DBA: Designing data infrastructure for new products or migrations, setting
standards across engineering teams, often interfacing with executive stakeholders on data strategy.
Each step requires both technical depth and communication skills. The ability to explain a database problem and
its business impact to a non-technical audience becomes increasingly valuable as you advance.
Knowing the right tools is part of what makes a DBA effective and hirable.
SQLyog: the widely used GUI client from Webyog for MySQL development and administration. The
Community Edition is free; the professional edition adds advanced features for power users. If you don’t have a
MySQL GUI in your toolkit, start here.
SQL Diagnostic Manager for MySQL — Webyog/IDERA’s professional monitoring platform. Used by MySQL
DBAs in production environments to get real-time visibility into:
Having hands-on familiarity with professional tooling puts you ahead of candidates who’ve only worked with
command-line tools.
One of the best things about MySQL as a career path is that the barrier to entry is low. Everything you need to
start learning is free:
There’s no certification exam required to get your first job (though Oracle’s MySQL certifications can help signal
competence). What matters is demonstrated skill — and you can build that on your laptop.
Most people reach functional competency in 6–12 months with consistent practice. Plan for 200+ hours of
hands-on work with real MySQL instances — not just reading. A portfolio of documented lab work (backup
scripts, performance tuning notes, replication setups) can substitute for years of experience when you’re
interviewing for a junior role.
No. Many successful DBAs came from sysadmin backgrounds, development roles, or entirely self-taught paths.
Employers care about what you can demonstrate. Hands-on capability beats a credential in most DBA hiring
conversations.
Yes. The fundamentals — indexing, query optimisation, backup strategy, access control, replication — translate
directly to PostgreSQL, MariaDB, and cloud-managed databases like Amazon Aurora. MySQL DBA experience
is a strong launchpad for a broader data infrastructure career.
More than most roles. Every AI model, every transactional system, every analytics pipeline depends on reliable
structured data storage. Cloud automation handles provisioning — it doesn’t handle query tuning, schema
design, incident response, or the judgment calls that keep production systems healthy. DBAs who stay current
with cloud deployments, containerisation, and modern HA patterns are well positioned.
Varied. Some days are routine — maintenance windows, development support, access reviews. Other days are
intense — a production incident at 2am, a replication failure, a query that’s taking down an entire application tier.
The ability to stay calm under pressure, diagnose systematically, and communicate clearly to non-technical
stakeholders is as important as technical skill.
Entry-level MySQL DBA roles typically start at $70,000–$85,000. To break into this range, you need
demonstrated competency with installation, backup/restore, security, and basic query optimisation. Use free
tools (MySQL Community Edition, SQLyog Community) to build that competency before you interview.
MySQL DBA skills translate directly into six-figure salaries, and the path to getting there is well-defined and
accessible. In 2026, with AI workloads increasing the volume and complexity of database operations, skilled
MySQL administrators are not becoming obsolete — they’re becoming more valuable.
Start with MySQL Community Edition. Use SQLyog to build practical workflow habits. Study the reference
manual. Build real systems and break them (in a lab environment). Join the Webyog community. Keep going.
The opportunity is real. The path is clear. What are you waiting for?
The tools that professionals use every day are available for you to try right now — at no cost.
Visit webyog.com — and start building the skills that land six-figure roles.
Planet for the MySQL Community
https://webyog.com/wp-content/uploads/2023/04/online-business-database_53876-95876.jpeg
Database administrators are the unsung architects of the modern internet. Every time a patient’s electronic
health record loads instantly, every time you check out a shopping cart without a hitch, every time a
recommendation engine surfaces exactly what you want — a database administrator made sure the system
could handle it. In 2026, that responsibility has grown larger, more complex, and more rewarding than ever.
If you’re wondering how to break into MySQL DBA work — or level up from a junior role — this guide maps out
the path clearly. MySQL 8.0/8.4 LTS and the MySQL 9.x Innovation series have expanded what DBAs need to
know, but the fundamentals remain the same. Let’s walk through them.
MySQL has consistently ranked among the top relational databases worldwide, trailing only Oracle in the
DB-Engines rankings — and it’s not slowing down. It powers massive global platforms — Meta’s social graph,
YouTube’s video metadata, countless SaaS applications, and a growing share of AI training and inference
pipelines that need fast, reliable structured data access.
Despite the rise of NoSQL systems and cloud-managed databases, MySQL expertise remains a hiring priority.
Cloud providers offering managed MySQL (Amazon RDS, Google Cloud SQL, Azure Database for MySQL)
have increased accessibility, but they haven’t reduced the need for skilled DBAs. Someone still needs to tune
queries, manage access controls, architect backup strategies, and respond when things go wrong. That
someone is you.
Before you invest months of learning, it helps to understand what the job looks like in practice. A typical MySQL
DBA’s day might include:
In 2026, many DBAs also find themselves involved in AI infrastructure work — maintaining databases that store
training datasets, feature stores, or model metadata. Familiarity with high-throughput ingestion patterns and
vector-adjacent storage has become a differentiator.
Start with the basics: install MySQL Community Edition on your local machine (or a free-tier cloud VM),
configure the server, and learn your way around the configuration file (my.cnf / my.ini). Understand the
difference between MySQL 8.0/8.4 LTS (the stable, long-term support branch) and MySQL 9.x Innovation
releases (feature-rich but faster-moving). Most production environments run LTS versions — that’s where your
hands-on practice should focus.
Security is non-negotiable. Learn the MySQL privilege system thoroughly:
In containerized and cloud environments, managing secrets and rotating credentials securely is as important as
the SQL syntax itself.
A DBA who can’t restore a database is a liability. Study:
Practice restores regularly. Many DBAs have discovered their backup strategy was broken only when they
needed it most.
Understanding how MySQL executes queries is what separates good DBAs from great ones. Learn to:
Current MySQL releases also offer optimizer tracing (available since MySQL 5.6) and index skip scan (added in
MySQL 8.0.13), both worth learning alongside the fundamentals.
Most production MySQL environments use replication. Learn:
Cloud-managed services abstract some of this, but understanding what’s happening underneath makes you far
more effective when things go wrong.
You cannot manage what you cannot measure. Learn to query Performance Schema, watch global status
variables, and set up alerting for:
Familiarity with monitoring tools will accelerate your effectiveness immediately.
Many successful MySQL DBAs didn’t start there. Common transition paths include:
If you’re transitioning from sysadmin or dev work, you already have transferable skills — Linux administration,
scripting, networking, and version control all apply directly to DBA work.
Mentorship accelerates the path considerably. If you can find an experienced DBA to learn from — through a
job, a community forum, or an open-source project — take that opportunity. The gap between knowing the
commands and understanding the judgment calls comes from experience, and mentorship compresses that
timeline.
Learning MySQL’s command-line tools (mysql, mysqladmin, mysqldump, mysqlcheck) is foundational. As
you advance, dedicated tooling becomes essential.
SQL Diagnostic Manager for MySQL is the tool Webyog and IDERA offer for professional MySQL monitoring.
It provides:
For day-to-day query writing and database browsing, SQLyog (available as Community and paid editions) is a
widely used GUI client that speeds up development and administration workflows significantly.
Expect 6–12 months of consistent study and hands-on practice to reach functional competency — enough to
take on a junior DBA role. Most practitioners report logging 200+ hours of practical work before feeling genuinely
confident handling production incidents.
The MySQL documentation is excellent and free. MySQL Community Edition gives you a full server to
experiment on at no cost. There’s no excuse not to start today.
The MySQL DBA path is well-documented, practically learnable, and professionally rewarding. In 2026, the
demand is strong and growing. The question isn’t whether the opportunity is there — it’s whether you’ll start.
No. Many successful DBAs come from sysadmin backgrounds, software development, or entirely self-taught
paths. Employers care about what you can demonstrate — not the credential on your resume. A portfolio of
hands-on work with real MySQL instances carries more weight than a certificate alone.
Most people reach functional competency — enough to handle a junior position — within 6–12 months of
consistent, hands-on practice. Budget for 200+ hours of real work: installing, configuring, breaking, and restoring
MySQL in a lab environment.
MySQL 8.4 LTS (Long-Term Support) is the stable, production-recommended track with multi-year security and
bug-fix support. MySQL 9.x Innovation releases ship new features faster but are not intended for long-term
production use. For learners, start with 8.4 LTS — it’s what most production environments run.
Absolutely. The fundamentals — indexing, query optimisation, backup strategy, access control, replication —
translate well to PostgreSQL, MariaDB, and cloud-managed databases like Amazon Aurora. MySQL DBA
experience is a strong foundation for a broader data infrastructure career.
Start with the MySQL command-line tools (mysql, mysqldump, mysqladmin). Add a GUI client like SQLyog
for day-to-day administration. As you advance, learn a professional monitoring platform — SQL Diagnostic
Manager for MySQL is widely used in production environments and worth familiarising yourself with early.
Yes. AI workloads increase — not decrease — the demand for reliable structured data storage. Model training
pipelines, feature stores, and inference logging all depend on databases. DBAs who understand high-throughput
ingestion, replication, and cloud deployments are well-positioned for the AI era.
Whether you’re just exploring the role or ready to accelerate your path to production, Webyog has the tools to
get you there faster.
Visit webyog.com and take the first step today.
Download the IDERA whitepaper “How to Become a MySQL DBA” for a deeper dive into the curriculum and career path. Available at webyog.com.
Curious what this career actually pays? Read our salary deep-dive: MySQL DBAs Are Landing Six-Figure Jobs
in 2026 — And You Can Too.
Planet for the MySQL Community
https://webyog.com/wp-content/uploads/2017/11/connections-and-buffer-pool-usage-1.png
As databases power increasingly complex workloads — from AI-driven applications to cloud-native
microservices and containerized deployments — the ability to monitor MySQL performance with precision has
never been more important. Whether you’re running MySQL 8.0/8.4 LTS in a stable production environment or
experimenting with the MySQL 9.x Innovation series, the fundamentals of connection management and buffer
pool tuning remain the bedrock of a healthy database.
This post, part of our ongoing MySQL monitoring series, dives into two critical areas: connection metrics and
the InnoDB buffer pool. Mastering these will help you catch problems before they become outages.
Modern infrastructure has raised the stakes for database performance. AI inference workloads generate
high-throughput, low-latency query patterns. Cloud deployments scale horizontally but introduce new failure
modes. Containerized MySQL instances — whether on Kubernetes or ECS — spin up and down rapidly, making
consistent monitoring essential.
The good news: MySQL’s built-in instrumentation is richer than ever. MySQL 8.4 LTS and the 9.x Innovation
releases ship with improved Performance Schema coverage, enhanced replication visibility, and better
diagnostics for connection errors. Knowing which metrics to watch — and what thresholds to act on — separates
reactive firefighting from proactive operations.
Every client connection to MySQL is accepted by the connection manager, which, in the default
one-thread-per-connection model, hands it to a dedicated thread (reused from the thread cache when one is
available). Each active connection consumes memory and CPU. In high-concurrency workloads (e-commerce flash
sales, real-time analytics pipelines, or AI feature stores), connection pressure is one of the first things to break.
The default max_connections value is 151, which is appropriate for development but far too low for
production. Most production environments should set this to hundreds or even thousands, depending on
available RAM and workload patterns.
| Metric | What It Tells You |
| --- | --- |
| Threads_connected | Number of currently open connections |
| Threads_running | Connections actively executing queries (not idle) |
| Connections | Cumulative connection attempts (successful or not) since server start |
| Connection_errors_internal | Connections refused because of internal server errors, such as failure to start a new thread or an out-of-memory condition |
| Aborted_connects | Failed connection attempts |
| Aborted_clients | Connections aborted because the client died or disconnected without closing the connection properly |
Threads_running is arguably the most important of these. A spike here — especially if it approaches
max_connections — signals that your server is under stress. If Threads_running climbs while
Threads_connected stays flat, you likely have slow queries piling up.
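A minimal Python sketch of turning these counters into something you can alert on, assuming you capture SHOW GLOBAL STATUS twice, some seconds apart, using the mysql client's --batch --skip-column-names options (one tab-separated Variable_name and Value per line). The helper names are hypothetical. Threads_connected and Threads_running are point-in-time gauges, while Connections and the Aborted_* counters are cumulative since server start, so the sketch reports per-second deltas for those. The investigate flag uses the lower end of the 20 to 30% rule of thumb given in this post's FAQ.

```python
def parse_global_status(batch_output: str) -> dict:
    """Parse `mysql --batch --skip-column-names -e "SHOW GLOBAL STATUS"` output.

    Each line is Variable_name<TAB>Value; purely numeric values become ints.
    """
    status = {}
    for line in batch_output.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition("\t")
        status[name] = int(value) if value.isdigit() else value
    return status


def connection_report(before: dict, after: dict, interval_s: float,
                      max_connections: int) -> dict:
    """Summarize connection pressure between two status snapshots."""
    # Gauges: read them from the latest snapshot.
    running = after["Threads_running"]
    connected = after["Threads_connected"]

    # Cumulative counters: convert to per-second rates over the interval.
    def rate(name):
        return (after[name] - before[name]) / interval_s

    return {
        "connected_pct_of_max": 100.0 * connected / max_connections,
        "running_pct_of_max": 100.0 * running / max_connections,
        "new_connections_per_s": rate("Connections"),
        "aborted_connects_per_s": rate("Aborted_connects"),
        "aborted_clients_per_s": rate("Aborted_clients"),
        # Sustained values above ~20% of max_connections deserve a look.
        "investigate": running > 0.2 * max_connections,
    }
```

In practice each input string could come from subprocess.run(["mysql", "--batch", "--skip-column-names", "-e", "SHOW GLOBAL STATUS"], capture_output=True, text=True).stdout, with max_connections read from SELECT @@max_connections.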
MySQL surfaces granular connection error counters, the Connection_errors_* status variables, that help you diagnose the root cause of failed connections.
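The counters are Connection_errors_accept, Connection_errors_internal, Connection_errors_max_connections, Connection_errors_peer_address, Connection_errors_select, and Connection_errors_tcpwrap. A small sketch, assuming two SHOW GLOBAL STATUS snapshots already loaded into Python dicts (the helper name is hypothetical), that reports which of them increased between snapshots, largest first. A rising Connection_errors_max_connections, for instance, means clients are being turned away because max_connections was reached.

```python
def connection_error_breakdown(before: dict, after: dict) -> dict:
    """Return the Connection_errors_* counters that grew between two snapshots."""
    deltas = {}
    for name, value in after.items():
        if name.startswith("Connection_errors_"):
            delta = value - before.get(name, 0)
            if delta > 0:
                deltas[name] = delta
    # Largest contributors first.
    return dict(sorted(deltas.items(), key=lambda kv: kv[1], reverse=True))
```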
The InnoDB buffer pool is MySQL’s most important memory structure. It caches table data and index pages in
RAM, reducing the need for expensive disk reads. A well-sized buffer pool can serve the majority of reads from
memory — dramatically reducing latency and I/O load.
The default buffer pool size is 128MB, which is a reasonable starting point for development. For dedicated
database servers, the best practice is to allocate approximately 80% of available RAM to the buffer pool.
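The buffer pool size must align with this formula, where N is a positive integer:
innodb_buffer_pool_size = N × innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances
If you set a value that is not a multiple of innodb_buffer_pool_chunk_size × innodb_buffer_pool_instances, MySQL
adjusts it upward to the next multiple.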
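A minimal Python sketch of that arithmetic, assuming the default innodb_buffer_pool_chunk_size of 128 MiB and an example innodb_buffer_pool_instances of 8 (check your server's real values with SELECT @@innodb_buffer_pool_chunk_size, @@innodb_buffer_pool_instances). server_adjusted_size mirrors the upward rounding MySQL applies to a non-aligned value; recommended_size is a hypothetical helper that rounds the 80%-of-RAM target down instead, so the server's adjustment cannot push the pool above your target. It ignores edge cases such as the server truncating the chunk size when chunk size × instances exceeds the requested pool size.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def server_adjusted_size(requested: int, chunk_size: int, instances: int) -> int:
    """Round a requested innodb_buffer_pool_size up to a multiple of
    chunk_size * instances, the way MySQL adjusts a non-aligned value."""
    unit = chunk_size * instances
    return -(-requested // unit) * unit  # ceiling division


def recommended_size(total_ram: int, chunk_size: int = 128 * MiB,
                     instances: int = 8, fraction: float = 0.8) -> int:
    """About `fraction` of RAM, rounded DOWN to an aligned value
    (never below one chunk_size * instances unit)."""
    unit = chunk_size * instances
    target = int(total_ram * fraction)
    return max(unit, target // unit * unit)
```

For a dedicated server with 64 GiB of RAM, recommended_size(64 * GiB) comes out at 51 GiB, while server_adjusted_size(9 * GiB, 128 * MiB, 16) shows a 9 GiB request being raised to 10 GiB because each allocation unit is 2 GiB.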
MySQL 8.x allows online resizing of the buffer pool, meaning you can adjust it without a restart — a significant
operational improvement. In cloud environments where instance sizes change frequently, this matters.
InnoDB uses a variant of the Least Recently Used (LRU) algorithm to manage which pages stay in the buffer
pool. Rather than a simple LRU list, MySQL uses a midpoint insertion strategy: newly loaded pages enter at
the midpoint of the list, not the head. This prevents large full-table scans from flushing your hot working set out of
the pool.
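Two tuning parameters control this behavior: innodb_old_blocks_pct, the percentage of the buffer pool used for the "old" sublist (default 37), and innodb_old_blocks_time, how many milliseconds a page must stay in the old sublist after its first access before another access can move it to the "new" sublist (default 1000).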
For OLTP workloads with repeated access to the same rows, the defaults work well. For mixed workloads
running analytical queries alongside transactional ones — increasingly common as AI pipelines run batch
feature extraction alongside live serving — you may need to tune these values to protect your hot page set.
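To see why midpoint insertion protects the hot set, here is a toy Python model, not InnoDB's implementation: the list is split into a new sublist and an old sublist whose size comes from an old_pct parameter standing in for innodb_old_blocks_pct, first-time reads land at the head of the old sublist, and a second access promotes a page. It deliberately ignores innodb_old_blocks_time and promotes on any second access; in InnoDB that timer is what stops a scan's rapid repeated reads of the same page from promoting it.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy model of midpoint insertion; not InnoDB's actual code.

    Each OrderedDict is kept with its head (most recent) first.
    """

    def __init__(self, capacity: int, old_pct: int = 37):
        self.capacity = capacity
        # Target size of the old sublist, standing in for innodb_old_blocks_pct.
        self.old_target = max(1, capacity * old_pct // 100)
        self.new = OrderedDict()
        self.old = OrderedDict()

    def access(self, page) -> bool:
        """Touch a page and return True on a hit."""
        if page in self.new:
            self.new.move_to_end(page, last=False)  # back to the head
            return True
        if page in self.old:
            # Second access: promote to the head of the new sublist.
            del self.old[page]
            self.new[page] = None
            self.new.move_to_end(page, last=False)
            self._rebalance()
            return True
        # Miss: insert at the midpoint, i.e. the head of the old sublist.
        self.old[page] = None
        self.old.move_to_end(page, last=False)
        self._rebalance()
        return False

    def _rebalance(self):
        # Keep the new sublist within its share by demoting its tail to the old head.
        while len(self.new) > self.capacity - self.old_target:
            page, _ = self.new.popitem(last=True)
            self.old[page] = None
            self.old.move_to_end(page, last=False)
        # Evict from the tail of the old sublist once the pool is over capacity.
        while len(self.new) + len(self.old) > self.capacity:
            self.old.popitem(last=True)

    def contains(self, page) -> bool:
        return page in self.new or page in self.old
```

Warm three hot pages (two accesses each), then stream 100 pages that are each touched once, as a full table scan would: the hot pages survive in the new sublist while the scan churns only the old sublist. A plain LRU of the same size would have evicted all three after ten scanned pages.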
| Metric | What It Tells You |
| --- | --- |
| Innodb_buffer_pool_read_requests | Total logical read requests (memory hits + disk reads) |
| Innodb_buffer_pool_reads | Physical reads from disk (cache misses) |
| Innodb_buffer_pool_pages_total | Total pages in the buffer pool |
| Innodb_buffer_pool_pages_free | Pages currently available (not in use) |
Cache Miss Rate = (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests) × 100
A healthy production system should have a cache miss rate below 1% — meaning 99%+ of reads are served
from memory. If your miss rate climbs above this threshold, your buffer pool is undersized for your working data
set.
Watch Innodb_buffer_pool_pages_free as well. A consistently low free page count means the pool is
under memory pressure, and MySQL is spending time evicting pages rather than serving data.
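Both checks can be sketched in Python, assuming two SHOW GLOBAL STATUS snapshots loaded as dicts (the function name is hypothetical). Computing the miss rate from the difference between snapshots, rather than from the raw counters, is a design choice of this sketch: the raw values average over the whole uptime and can hide a recent regression. The free-page figure is simply Innodb_buffer_pool_pages_free as a share of Innodb_buffer_pool_pages_total.

```python
def buffer_pool_health(before: dict, after: dict) -> dict:
    """Cache miss rate over an interval plus the current share of free pages."""
    disk_reads = after["Innodb_buffer_pool_reads"] - before["Innodb_buffer_pool_reads"]
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    # Same miss-rate formula, applied to the interval's deltas.
    miss_rate = 100.0 * disk_reads / requests if requests else 0.0
    free_pct = (100.0 * after["Innodb_buffer_pool_pages_free"]
                / after["Innodb_buffer_pool_pages_total"])
    return {
        "miss_rate_pct": miss_rate,
        "free_pages_pct": free_pct,
        # Above 1% suggests the pool is undersized for the working set.
        "undersized": miss_rate > 1.0,
    }
```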
Manual monitoring with SHOW GLOBAL STATUS is a starting point, but it doesn’t scale. For teams running
MySQL in production — especially across multiple instances or cloud regions — a dedicated monitoring tool is essential.
SQL Diagnostic Manager for MySQL (part of the Webyog/IDERA family) provides real-time dashboards for all
the metrics discussed here, plus automated alerting, query analysis, and root cause diagnostics. Whether you’re
managing a single server or dozens of replicas behind a load balancer, having these metrics in one place makes
the difference between proactive tuning and reactive recovery.
Connection management and buffer pool sizing are foundational to MySQL performance. In 2026’s environment
of AI workloads, cloud scaling, and containerized deployments, these metrics deserve continuous attention —
not just one-time configuration.
Stay tuned for the next post in this series, where we cover InnoDB I/O metrics and query performance
diagnostics.
In most workloads, Threads_running should stay well below max_connections. If it consistently exceeds
20–30% of your connection limit, investigate slow queries or lock contention. A sudden spike often points to a
rogue query or a batch job gone wrong.
Check your cache miss rate: (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
× 100. A value above 1% is a warning sign. Also watch Innodb_buffer_pool_pages_free — if free pages hover near zero, MySQL is under memory pressure and evicting data it needs.
Yes — MySQL 8.x supports online buffer pool resizing via SET GLOBAL innodb_buffer_pool_size. The
resize happens in chunks and may take a few seconds to minutes depending on pool size. No restart required.
Aborted clients typically indicate application-side connection leaks — connections opened but not properly
closed. Check your application’s connection pooling configuration and ensure connections are returned to the
pool after each operation.
Critical metrics like Threads_running and buffer pool efficiency should be monitored continuously with
alerting thresholds. Review trends weekly and investigate any sustained drift from your baseline.
Yes. These metrics are exposed in cloud-managed MySQL instances and most providers surface them in their
native monitoring dashboards. You can also connect SQL Diagnostic Manager for MySQL to RDS and Cloud
SQL instances for deeper analysis.
Stop flying blind on your MySQL performance. SQL Diagnostic Manager for MySQL gives you real-time
dashboards, automated alerting, and root cause analysis — covering every metric discussed in this post and
more.
Visit webyog.com to get started today.
Planet for the MySQL Community
Today we released RonDB 26.04.1, a beta release. It contains a
lot of new features, but the most interesting one is that RonSQL now
supports pushdown join aggregation and CTEs, so that complex queries run
with low, predictable latency.
RonDB has always been able to answer complex queries through a MySQL Server.
The problem with that path is predictability. The application asks for an
answer, but it has no guarantee about how fast that answer arrives: the MySQL
optimizer picks a plan that may or may not parallelise the query across the RonDB
data nodes, and a plan that looks fine on a small table can fall off a cliff as
the data grows.
RonSQL takes a different contract. The rule is simple:
Anything RonSQL accepts can be pushed down to the RonDB data nodes for
parallel execution.
If a query parses and plans in RonSQL, it runs as a parallel pushdown —
there is no fallback to a slow, single-threaded plan. That means the latency of a
complex query is something the application can actually reason about up front,
instead of discovering it in production.
RonSQL grew out of the needs of AI applications built on Feature
Stores, and in particular on-demand (real-time)
transformations in Hopsworks.
Traditionally an online Feature Store only does primary-key lookups. To keep
those lookups fast, every feature has to be pre-computed and
written back before serving. That works, but it has two costs:
RonSQL attacks both problems:
CTEs (Common Table Expressions, the SQL WITH clause) are what let
you combine these two ideas in a single, readable query: aggregate the fresh fact
rows in a CTE, then join the result against your normalised dimension tables.
Consider a fraud-scoring model. At inference time it needs a feature vector for
one card, computed over that card’s most recent activity. The raw transactions
arrive continuously and are inserted straight into RonDB:
-- Fact table: one row per card transaction, inserted in real time.
CREATE TABLE txn (
  txn_id      BIGINT NOT NULL,
  cc_num      BIGINT NOT NULL,       -- card / account identifier
  merchantkey INT NOT NULL,          -- references merchant.m_merchantkey
  amount      INT NOT NULL,          -- minor units (cents)
  txn_time    DATETIME(6) NOT NULL,
  is_declined TINYINT NOT NULL,
  PRIMARY KEY USING HASH (txn_id),
  -- Ordered index: range-scan one card's recent activity cheaply.
  INDEX idx_card_time (cc_num, txn_time)
) ENGINE=NDB
  COMMENT='NDB_TABLE=TTL=604800@txn_time';  -- auto-expire rows after 7 days

-- Small dimension table: replaces a per-card Avro BLOB of merchant attributes.
CREATE TABLE merchant (
  m_merchantkey INT NOT NULL,
  m_category    VARCHAR(16) NOT NULL,
  m_risk_score  INT NOT NULL,
  PRIMARY KEY USING HASH (m_merchantkey)
) ENGINE=NDB;
The simplest on-demand feature is a scalar aggregate over the card’s last hour
of transactions. No pre-computation, no BLOB — just an index range scan that
includes whatever was inserted milliseconds ago:
SELECT
  COUNT(*)                                     AS txns_1h,
  SUM(amount)                                  AS amount_1h,
  MAX(amount)                                  AS max_amount_1h,
  AVG(amount)                                  AS avg_amount_1h,
  SUM(CASE WHEN is_declined = 1 THEN 1 ELSE 0 END) AS declines_1h
FROM txn
WHERE cc_num = 4716253018273645
  AND txn_time >= DATE_SUB('2026-06-29 14:30:00', INTERVAL 1 HOUR);
RonSQL turns the WHERE into an ordered-index range
scan on idx_card_time — it touches only this card’s
last hour — and pushes the COUNT/SUM/MAX/AVG
and the CASE expression down to the data nodes, which aggregate in
parallel and return a single row.
Now suppose the model wants spend broken down by merchant category.
The category does not live on the transaction — it lives on the
merchant dimension. The classic Feature Store approach would
denormalise the category into a packed BLOB per card. With RonSQL we keep the
data normalised and join at query time:
WITH spend_by_merchant AS (
  SELECT merchantkey AS m,
         SUM(amount) AS spend,
         COUNT(*)    AS txns
  FROM txn
  WHERE cc_num = 4716253018273645
    AND txn_time >= DATE_SUB('2026-06-29 14:30:00', INTERVAL 1 HOUR)
  GROUP BY merchantkey
)
SELECT m.m_category                  AS category,
       SUM(spend_by_merchant.spend) AS spend_last_hour,
       SUM(spend_by_merchant.txns)  AS txns_last_hour
FROM merchant AS m
JOIN spend_by_merchant ON spend_by_merchant.m = m.m_merchantkey
GROUP BY m.m_category;
This query is easy to reason about, top to bottom:
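1. spend_by_merchant runs an ordered-index range scan on idx_card_time, restricted to one card's last hour of activity.
2. The data nodes compute SUM(amount) and COUNT(*) grouped by merchantkey, returning just a handful of rows (one per merchant the card used).
3. m_merchantkey is the primary key of merchant, so each of those rows joins to its merchant with a primary-key lookup; merchant is a small dimension table.
4. The outer query groups by m_category, producing one row per merchant category.
Every stage is a pushdown, and stages such as the index scan and the lookups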
run in parallel across the data nodes. We could even execute several CTEs in
parallel. Because RonSQL guarantees the whole thing pushes down, the latency is
bounded and predictable — which is exactly the contract a real-time
inference path needs.
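RonSQL's value lies in the pushdown and the latency contract, which no local engine can reproduce, but the query's semantics are ordinary SQL. As a hedged sanity check of what the CTE query returns, the sketch below runs the same query shape in SQLite through Python's standard sqlite3 module on a few made-up rows. SQLite has no DATE_SUB, so the one-hour cutoff is computed in Python and bound as a parameter; an ORDER BY is added so the result order is deterministic; and the tables keep only plain columns, without the NDB-specific options. It says nothing about RonSQL's parallelism or latency.

```python
import sqlite3
from datetime import datetime, timedelta

QUERY = """
WITH spend_by_merchant AS (
    SELECT merchantkey AS m, SUM(amount) AS spend, COUNT(*) AS txns
    FROM txn
    WHERE cc_num = ? AND txn_time >= ?
    GROUP BY merchantkey
)
SELECT m.m_category AS category,
       SUM(spend_by_merchant.spend) AS spend_last_hour,
       SUM(spend_by_merchant.txns) AS txns_last_hour
FROM merchant AS m
JOIN spend_by_merchant ON spend_by_merchant.m = m.m_merchantkey
GROUP BY m.m_category
ORDER BY category
"""

FMT = "%Y-%m-%d %H:%M:%S"  # fixed-width text timestamps compare correctly as strings


def spend_by_category(conn, cc_num, now):
    cutoff = (now - timedelta(hours=1)).strftime(FMT)
    return conn.execute(QUERY, (cc_num, cutoff)).fetchall()


def demo_db(now):
    """Build an in-memory database holding a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, cc_num INTEGER,
                          merchantkey INTEGER, amount INTEGER, txn_time TEXT,
                          is_declined INTEGER);
        CREATE TABLE merchant (m_merchantkey INTEGER PRIMARY KEY,
                               m_category TEXT, m_risk_score INTEGER);
    """)
    conn.executemany("INSERT INTO merchant VALUES (?, ?, ?)",
                     [(1, "grocery", 10), (2, "grocery", 20), (3, "travel", 80)])
    card = 4716253018273645
    rows = [
        (1, card, 1, 1200, now - timedelta(minutes=10), 0),
        (2, card, 2, 800, now - timedelta(minutes=20), 0),
        (3, card, 3, 50000, now - timedelta(minutes=30), 1),
        (4, card, 3, 9900, now - timedelta(hours=2), 0),  # outside the window
        (5, 1111222233334444, 1, 700, now - timedelta(minutes=5), 0),  # another card
    ]
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?, ?)",
                     [(t, c, m, a, ts.strftime(FMT), d) for t, c, m, a, ts, d in rows])
    return conn
```

With now = datetime(2026, 6, 29, 14, 30), the card's two grocery merchants collapse into one category row while the two-hour-old travel transaction is excluded by the range condition.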
RonSQL is reachable in two ways:
rondb-cli: the shell sends a line straight to RonSQL when the line carries the RONSQL prefix.
ronsql_cli: a standalone client for scripting that takes queries via --execute, --execute-file, or stdin and can emit results as JSON or TEXT.
Both paths support EXPLAIN. Prefixing a query with
EXPLAIN shows the chosen pushdown plan — which index drives
each scan, which joins become lookups, and where the aggregation happens —
so “will this be fast?” is a question you answer before you
ship, not after.
RonSQL is a read-only, aggregation-focused SQL subset designed so that
everything it accepts can be pushed down:
SELECT only (plus EXPLAIN). No DDL/DML.
WITH clauses (CTEs).
INNER JOIN, LEFT [OUTER] JOIN, self-joins, and comma cross-joins over scalar equality conditions, including multi-column ones (a.x = b.x AND a.y = b.y).
WHERE: =, <>, <, <=, >, >=, LIKE, IN (list), IS [NOT] NULL, AND/OR/XOR/NOT, arithmetic, bitwise ops, and CASE WHEN.
EXISTS, IN (subquery), and scalar subqueries.
COUNT(*), COUNT(expr), SUM, MIN, MAX, AVG.
GROUP BY, HAVING, ORDER BY ASC/DESC, LIMIT.
CASE WHEN, GREATEST/LEAST, and date/time functions (DATE_ADD, DATE_SUB, EXTRACT, INTERVAL).
FORCE INDEX, USE INDEX, IGNORE INDEX.
Because the Feature Store has to compute the same feature in two very
different settings. Batch training and batch inference run on
engines like Spark SQL and DuckDB — both
batch query engines, chosen for different characteristics (Spark scales the work
across a cluster for very large datasets; DuckDB runs embedded and is hard to
beat on a single node for moderate data). Online serving runs on
RonSQL, computing the feature fresh at inference time. When all
of them speak SQL, the same feature logic can be expressed as the same query text
on each engine, which eliminates a notorious source of
training/serving skew — features that subtly differ
between the model’s training data and what it sees live at inference.
RonSQL is already useful, but there is a clear roadmap, much of it driven
directly by Feature Store needs:
COUNT(DISTINCT ...), and DISTINCT and OFFSET more generally.
STDDEV and VARIANCE (for z-score features), and GROUP_CONCAT.
Subqueries in FROM, UNION, RIGHT/FULL OUTER JOIN, and recursive CTEs for hierarchy/graph features.
The core contribution stays the same: predictable low latency for
complex queries over fresh data, expressed in portable SQL —
exactly what an online Feature Store needs to serve fresh, skew-free features to
an AI model.
Planet for the MySQL Community
Ford executives said they’ve hired 350 veteran engineers — some of them former employees — after AI and automated systems failed to deliver the desired quality, reports TechCrunch:
Bloomberg reports the company’s chief operating officer Kumar Galhotra told journalists that Ford had been "relying more and more on automated quality systems" with disappointing results. So the company "brought back technical specialists," and those specialists "hunt for failure points before a part ever reaches the plant floor." Charles Poon, Ford’s vice president of vehicle hardware engineering, added, "Mistakenly we thought that by just introducing artificial intelligence and ingesting the design requirements that we had, that that would produce a high-quality product."
The article points out that Ford is using the rehired graybeard engineers to train younger staff, and to reprogram its AI tools.
Read more of this story at Slashdot.
Slashdot
https://minervadb.com/wp-content/uploads/2026/06/shutterstock_2360159259-1024×520.jpg
MySQL query optimization is one of the most critical skills a database administrator or developer can possess. Whether you are managing a high-traffic e-commerce platform, a data warehouse with billions of rows, or a transactional OLTP system, poorly optimized queries are one of the most common causes of performance degradation, increased I/O, excessive CPU usage, and frustrated end users. At the heart of MySQL’s query optimization toolkit lies the EXPLAIN statement, a powerful diagnostic command that reveals how the MySQL query optimizer intends to execute a given SQL statement.
In this comprehensive guide, we will explore MySQL query optimization from the ground up: understanding the query execution lifecycle, dissecting every column of the EXPLAIN and EXPLAIN ANALYZE output, identifying common anti-patterns, and applying proven optimization strategies that MySQL DBAs and developers rely on in production environments every day. By the end of this article, you will be equipped with the knowledge to analyze execution plans, eliminate slow queries, and design indexes that drive maximum throughput.
Before diving into EXPLAIN, it is essential to understand what the MySQL query optimizer does. The optimizer is a cost-based component within the MySQL server that evaluates multiple possible execution plans for a given query and selects the one with the lowest estimated cost. This cost is calculated from statistics about tables and indexes. InnoDB persists these statistics in the mysql.innodb_table_stats and mysql.innodb_index_stats tables (when innodb_stats_persistent is enabled, the default) and exposes them through views such as INFORMATION_SCHEMA.STATISTICS.
The optimizer considers factors such as row estimates, index selectivity, join order, and available access methods before producing an execution plan. However, the optimizer is not perfect — it relies on statistics that may be stale or inaccurate, which is why understanding EXPLAIN and knowing how to guide the optimizer with hints is an indispensable skill for any serious MySQL DBA or developer.
MySQL provides several variants of the EXPLAIN statement, each offering different levels of detail about query execution. Understanding when to use each variant is key to efficient query diagnostics.
-- Basic EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;
-- EXPLAIN with FORMAT=JSON for richer, structured output
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 1001;
-- EXPLAIN ANALYZE (MySQL 8.0.18+): executes the query and returns real metrics
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1001;
-- EXPLAIN for DML statements (audit_log and archive_orders are not part of the schema below)
EXPLAIN UPDATE orders SET status = 'shipped' WHERE order_date < '2024-01-01';
EXPLAIN DELETE FROM audit_log WHERE created_at < NOW() - INTERVAL 90 DAY;
EXPLAIN INSERT INTO archive_orders SELECT * FROM orders WHERE status = 'delivered';
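You can post-process EXPLAIN FORMAT=JSON output in a script with a recursive walk over the parsed JSON, which is enough to pull out each table's access details. The sketch below is Python, not part of MySQL. It assumes the field names table_name, access_type, key, and rows_examined_per_scan that MySQL 8.0 emits inside "table" objects. The SAMPLE document is a hand-written, abbreviated plan, not real server output.

```python
import json
from typing import Any, Iterator, Optional

def iter_tables(node: Any) -> Iterator[dict]:
    """Yield every object stored under a "table" key, at any nesting depth
    (single-table plans, nested_loop arrays, materialized subqueries)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "table" and isinstance(value, dict):
                yield value
            yield from iter_tables(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tables(item)

def summarize(explain_json: str) -> list[tuple[Optional[str], Optional[str], Optional[str], int]]:
    """Return (table_name, access_type, key, rows_examined_per_scan) per table."""
    plan = json.loads(explain_json)
    return [(t.get("table_name"), t.get("access_type"), t.get("key"),
             t.get("rows_examined_per_scan", 0))
            for t in iter_tables(plan)]

# Hand-written, abbreviated plan for an orders/customers join (hypothetical values).
SAMPLE = """
{"query_block": {"select_id": 1,
  "nested_loop": [
    {"table": {"table_name": "o", "access_type": "ref",
               "key": "idx_order_date_status", "rows_examined_per_scan": 120}},
    {"table": {"table_name": "c", "access_type": "eq_ref",
               "key": "PRIMARY", "rows_examined_per_scan": 1}}
  ]}}
"""

for row in summarize(SAMPLE):
    print(row)
```

Walking the whole tree, rather than reading fixed paths, keeps the helper working when tables sit inside nested_loop arrays or materialized subqueries.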
Throughout this guide, we use a realistic e-commerce schema to demonstrate every optimization technique hands-on.
CREATE TABLE customers (
customer_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) NOT NULL,
country_code CHAR(2) NOT NULL,
created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
status TINYINT(1) NOT NULL DEFAULT 1,
UNIQUE KEY uk_email (email),
KEY idx_country_status (country_code, status),
KEY idx_created_at (created_at)
) ENGINE=InnoDB;
CREATE TABLE orders (
order_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
customer_id INT UNSIGNED NOT NULL,
order_date DATE NOT NULL,
total_amount DECIMAL(12,2) NOT NULL,
status ENUM('pending','processing','shipped','delivered','cancelled') NOT NULL,
KEY idx_customer_id (customer_id),
KEY idx_status (status),
KEY idx_order_date_status (order_date, status),
CONSTRAINT fk_orders_customer FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
) ENGINE=InnoDB;
CREATE TABLE order_items (
item_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
order_id BIGINT UNSIGNED NOT NULL,
product_id INT UNSIGNED NOT NULL,
quantity SMALLINT UNSIGNED NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
KEY idx_order_id (order_id),
KEY idx_product_id (product_id),
CONSTRAINT fk_items_order FOREIGN KEY (order_id) REFERENCES orders(order_id)
) ENGINE=InnoDB;
CREATE TABLE products (
product_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
sku VARCHAR(50) NOT NULL,
category_id INT UNSIGNED NOT NULL,
price DECIMAL(10,2) NOT NULL,
stock_qty INT NOT NULL DEFAULT 0,
UNIQUE KEY uk_sku (sku),
KEY idx_category_id (category_id)
) ENGINE=InnoDB;
The id column is the sequential identifier of each SELECT within the query. Simple queries have a single id of 1, while subqueries and unions produce rows with different id values. Rows that share an id belong to the same SELECT and are joined in the order listed. Rows with higher id values belong to nested subqueries or later UNION members.
The select_type column describes the type of SELECT involved. Key values include: SIMPLE (no subqueries or unions), PRIMARY (the outermost SELECT), SUBQUERY (a subquery in SELECT or WHERE), DERIVED (a subquery in the FROM clause), UNION (subsequent SELECT in a UNION), and DEPENDENT SUBQUERY (a correlated subquery — a critical performance red flag indicating the subquery re-evaluates for each outer row).
-- SIMPLE: No subqueries or unions
EXPLAIN SELECT customer_id, email FROM customers WHERE country_code = 'US';
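-- PRIMARY + SUBQUERY: Subquery in WHERE clause (MySQL 8.0 often rewrites
-- IN (SELECT ...) as a semijoin, in which case the rows show SIMPLE or MATERIALIZED)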
EXPLAIN
SELECT order_id, total_amount FROM orders
WHERE customer_id IN (
SELECT customer_id FROM customers WHERE country_code = 'DE'
);
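-- PRIMARY + DERIVED: Subquery in FROM clause (derived table); with the default
-- derived_merge=on, MySQL may merge it into the outer query and report SIMPLE rows instead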
EXPLAIN
SELECT d.country_code, COUNT(*) AS order_count
FROM (
SELECT c.country_code, o.order_id
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'delivered'
) d
GROUP BY d.country_code;
-- UNION: Multiple SELECT statements combined
EXPLAIN
SELECT customer_id, 'active' AS label FROM customers WHERE status = 1
UNION ALL
SELECT customer_id, 'inactive' AS label FROM customers WHERE status = 0;
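The type column, also called the join type or access type, is arguably the most important field in the EXPLAIN output. It tells you how MySQL accesses rows in a table. From best to worst, the MySQL documentation lists system, const, eq_ref, ref, fulltext, ref_or_null, index_merge, unique_subquery, index_subquery, range, index, and ALL. The examples below cover the most common ones: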
-- const: Primary key lookup
EXPLAIN SELECT * FROM customers WHERE customer_id = 42;
-- type: const, rows: 1
-- eq_ref: Unique index join (best for joins)
EXPLAIN SELECT c.email, o.order_id, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date = '2024-06-01';
-- type for customers: eq_ref (primary key join)
-- ref: Non-unique index lookup
EXPLAIN SELECT order_id, order_date, status FROM orders WHERE customer_id = 1001;
-- type: ref
-- range: Index range scan
EXPLAIN SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- type: range
-- ALL: Full table scan (must be fixed for large tables!)
EXPLAIN SELECT * FROM orders WHERE total_amount > 5000;
-- type: ALL if no index on total_amount
-- Solution: CREATE INDEX idx_total_amount ON orders(total_amount);
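The access types have a fixed best-to-worst order, so checking a plan for problem tables reduces to a ranking lookup. Below is a minimal Python sketch. It assumes you have already collected a table-to-type mapping, for example from the type column of EXPLAIN. The function names and the plan dictionary are hypothetical.

```python
# Access types from best to worst, in the order the MySQL documentation lists them.
ACCESS_TYPES = ["system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
                "index_merge", "unique_subquery", "index_subquery", "range",
                "index", "ALL"]
RANK = {name: position for position, name in enumerate(ACCESS_TYPES)}

def worst_access(plan: dict[str, str]) -> tuple[str, str]:
    """Return (table, type) for the table with the worst access type."""
    return max(plan.items(), key=lambda item: RANK[item[1]])

def needs_attention(plan: dict[str, str], threshold: str = "range") -> list[str]:
    """Tables whose access type is worse than `threshold` (index and ALL by default)."""
    return [table for table, kind in plan.items() if RANK[kind] > RANK[threshold]]

# Hypothetical plan: alias -> EXPLAIN type
plan = {"o": "range", "c": "eq_ref", "oi": "ALL"}
print(worst_access(plan))     # ('oi', 'ALL')
print(needs_attention(plan))  # ['oi']
```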
The possible_keys column lists the indexes MySQL considered, and key shows the index actually chosen. When key is NULL despite candidates in possible_keys, MySQL chose not to use an index. This is often because the statistics suggest so many rows match that a full scan looks cheaper. If the statistics may be stale, run ANALYZE TABLE to refresh them.
The key_len column shows how many bytes of the chosen index are used. For composite indexes, this reveals how many leading columns are utilized. The rows column is MySQL’s estimate of the rows it must examine in that table. For a join, the product of the rows values across the joined tables approximates the total work, so aim to minimize it. The filtered column is an estimated percentage of the examined rows that satisfy the table’s conditions, so rows × filtered / 100 approximates how many rows are joined with the next table.
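A few lines of Python make the rows and filtered arithmetic concrete. This is a rough model of a nested-loop join, not MySQL's cost model, which weighs many more factors. It assumes the tables are listed in join order and that each table's rows value applies once per incoming row combination. The plan values are invented for illustration.

```python
def join_work(plan: list[tuple[str, float, float]]) -> tuple[float, float]:
    """plan: (table, rows, filtered_pct) in EXPLAIN (join) order.
    Returns (estimated row examinations, estimated rows produced by the join)."""
    prefix = 1.0    # row combinations produced by the tables joined so far
    examined = 0.0
    for _table, rows, filtered in plan:
        examined += prefix * rows          # each incoming combination probes `rows` rows
        prefix *= rows * filtered / 100.0  # rows × filtered / 100 pass on to the next table
    return examined, prefix

# Hypothetical EXPLAIN values for a customers -> orders -> order_items join.
plan = [("c", 84000, 10.0), ("o", 1, 50.0), ("oi", 3, 100.0)]
print(join_work(plan))  # (105000.0, 12600.0)
```

Raising filtered on the first table, for example with a better index, shrinks every later term. This is why the leading table of a join matters most.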
The Extra column contains the most actionable diagnostic signals:
- Using index: a covering index, which is ideal.
- Using temporary: an internal temporary table; investigate.
- Using filesort: a sort that no index satisfies; consider an index whose column order matches the ORDER BY.
- Using index condition: Index Condition Pushdown is active, which is good.
- Using MRR: Multi-Range Read is active, which is good for range scans.
-- Using index: Covering index (zero table row access)
ALTER TABLE orders ADD INDEX idx_cust_covering
(customer_id, order_id, order_date, total_amount, status);
EXPLAIN
SELECT order_id, order_date, total_amount, status
FROM orders WHERE customer_id = 1001;
-- Extra: Using index
-- Using temporary + Using filesort: Performance red flag
EXPLAIN
SELECT country_code, COUNT(*) AS cnt
FROM customers GROUP BY country_code ORDER BY cnt DESC;
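-- An index leading with country_code (idx_country_status already qualifies) lets GROUP BY
-- read rows in group order, but ORDER BY on an aggregate (cnt) still needs a sort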
-- Using filesort on non-indexed ORDER BY
EXPLAIN SELECT order_id, total_amount FROM orders
ORDER BY total_amount DESC LIMIT 20;
-- Fix: CREATE INDEX idx_total_amount ON orders(total_amount);
-- Using index condition: ICP optimization
EXPLAIN SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
AND status = 'shipped';
-- Extra: Using index condition
EXPLAIN ANALYZE, introduced in MySQL 8.0.18, executes the query and returns both estimated and actual metrics for each node in the execution plan tree. This is critical for identifying cardinality estimation errors — cases where the optimizer’s row estimates diverge wildly from reality, leading to suboptimal plan selection.
EXPLAIN ANALYZE
SELECT
c.country_code,
COUNT(DISTINCT o.order_id) AS total_orders,
SUM(oi.unit_price * oi.quantity) AS total_revenue
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
WHERE c.status = 1
AND o.order_date >= '2024-01-01'
AND o.status = 'delivered'
GROUP BY c.country_code
ORDER BY total_revenue DESC;
-> Sort: total_revenue DESC (actual time=142.5..142.7 rows=48 loops=1)
    -> Aggregate using temporary table (actual time=142.2..142.2 rows=48 loops=1)
        -> Nested loop inner join (cost=18540.23 rows=9820) (actual time=0.8..138.6 rows=87342 loops=1)
            -> Nested loop inner join (cost=5421.12 rows=3240) (actual time=0.5..22.4 rows=28918 loops=1)
                -> Filter: (c.status = 1) (cost=1240.80 rows=8400) (actual time=0.3..8.7 rows=71230 loops=1)
                    -> Index scan on c using idx_country_status (cost=1240.80 rows=84000) (actual time=0.2..6.9 rows=84000 loops=1)
                -> Filter: (o.order_date >= '2024-01-01') and (o.status='delivered') (cost=0.25 rows=1) (actual time=0.00019..0.00019 rows=0 loops=71230)
                    -> Index lookup on o using idx_customer_id (customer_id=c.customer_id) (cost=0.25 rows=1) (actual time=0.00017..0.00017 rows=1 loops=71230)
            -> Index lookup on oi using idx_order_id (order_id=o.order_id) (cost=1.12 rows=3) (actual time=0.003..0.004 rows=3 loops=28918)
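Key analysis points:
- Compare the estimated rows against the actual rows for each node. Actual rows are averaged per loop, the same basis as the estimate.
- When the two diverge sharply, consider running ANALYZE TABLE or increasing innodb_stats_persistent_sample_pages. Here, the Filter on c is estimated at 8,400 rows but returns 71,230.
- The actual time=first..last values are in milliseconds. They give the time to return the first row and all rows, averaged per loop.
- The loops value shows how many times each node executed. High loop counts on expensive inner operations are the primary target for optimization.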
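Comparing estimates with actuals by eye gets tedious on large plans. The Python sketch below scans EXPLAIN ANALYZE text and flags nodes whose estimated and actual per-loop row counts differ by at least a chosen ratio. The regular expressions assume the (cost=... rows=...) and (actual time=... rows=... loops=...) layout shown above. The 5× default threshold is an arbitrary choice, not a MySQL rule.

```python
import re

NUM = r"\d+(?:\.\d+)?"
# Estimate part of a node, e.g. "(cost=1240.80 rows=8400)".
EST_RE = re.compile(rf"\(cost={NUM} rows=({NUM})\)")
# Measured part, e.g. "(actual time=0.3..8.7 rows=71230 loops=1)".
ACT_RE = re.compile(rf"\(actual time={NUM}\.\.{NUM} rows=({NUM}) loops=(\d+)\)")

def flag_misestimates(plan_text: str, ratio: float = 5.0) -> list[tuple[str, float, float]]:
    """Return (node, estimated_rows, actual_rows) for nodes off by at least `ratio`."""
    flagged = []
    for line in plan_text.splitlines():
        est, act = EST_RE.search(line), ACT_RE.search(line)
        if not (est and act):
            continue  # e.g. Sort/Aggregate nodes that carry no cost estimate
        estimated = float(est.group(1))
        actual = float(act.group(1))  # per-loop average, same basis as the estimate
        high, low = max(estimated, actual), min(estimated, actual)
        if high / max(low, 1.0) >= ratio:  # treat <1 row as 1 to avoid division by zero
            node = line.split("(cost=")[0].strip().removeprefix("->").strip()
            flagged.append((node, estimated, actual))
    return flagged

PLAN = """-> Sort: total_revenue DESC (actual time=142.5..142.7 rows=48 loops=1)
    -> Filter: (c.status = 1) (cost=1240.80 rows=8400) (actual time=0.3..8.7 rows=71230 loops=1)
        -> Index scan on c using idx_country_status (cost=1240.80 rows=84000) (actual time=0.2..6.9 rows=84000 loops=1)"""
print(flag_misestimates(PLAN))  # [('Filter: (c.status = 1)', 8400.0, 71230.0)]
```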
Wrapping an indexed column inside a function prevents MySQL from using an ordinary index on that column, unless a functional index matches the expression exactly. This usually forces a full table scan. It is one of the most common and damaging anti-patterns in production SQL workloads, and the fix is almost always straightforward.
-- BAD: Function prevents index usage
EXPLAIN SELECT * FROM orders
WHERE YEAR(order_date) = 2024 AND MONTH(order_date) = 6;
-- type: ALL (full table scan on potentially millions of rows)
-- GOOD: Rewrite as range condition (uses index)
EXPLAIN SELECT * FROM orders
WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01';
-- type: range, Extra: Using index condition
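-- BAD: LIKE with leading wildcard (the B-tree index on sku cannot be used for a lookup)
EXPLAIN SELECT * FROM products WHERE sku LIKE '%ABC%';
-- A FULLTEXT index helps only when 'ABC' is a whole word (or a word prefix, 'ABC*',
-- in boolean mode); it does not index arbitrary substrings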
ALTER TABLE products ADD FULLTEXT INDEX ft_sku (sku);
SELECT * FROM products WHERE MATCH(sku) AGAINST('ABC' IN BOOLEAN MODE);
-- GOOD: LIKE with trailing wildcard (uses index prefix scan)
EXPLAIN SELECT * FROM products WHERE sku LIKE 'ABC%';
-- type: range
-- BAD: Function on indexed column breaks index usage
EXPLAIN SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';
-- GOOD: Functional index (MySQL 8.0.13+) preserves index access
ALTER TABLE customers ADD INDEX idx_email_lower ((LOWER(email)));
EXPLAIN SELECT * FROM customers WHERE LOWER(email) = 'user@example.com';
-- type: ref, key: idx_email_lower
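The sargability rule is not MySQL-specific, so Python's built-in sqlite3 module can demonstrate it. This is an analogy, not MySQL behavior, and several stand-ins are involved:
- SQLite has its own planner and plan wording: SCAN means a full scan, SEARCH means an index seek.
- strftime stands in for YEAR()/MONTH().
- An expression index stands in for a MySQL functional index.
- The tables are stripped-down versions of the schema above.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in rows)  # column 3 holds the readable detail

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT NOT NULL,
                         total_amount REAL NOT NULL);
    CREATE INDEX idx_order_date ON orders (order_date);
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    -- Expression index: the SQLite counterpart of a MySQL functional index
    CREATE INDEX idx_email_lower ON customers (lower(email));
""")

# Function wrapped around the indexed column: the planner cannot seek the index.
wrapped = plan(conn, "SELECT * FROM orders WHERE strftime('%Y-%m', order_date) = '2024-06'")
# Same filter rewritten as a half-open range on the bare column.
ranged = plan(conn, "SELECT * FROM orders "
                    "WHERE order_date >= '2024-06-01' AND order_date < '2024-07-01'")
# Query expression matches the indexed expression exactly.
functional = plan(conn, "SELECT * FROM customers WHERE lower(email) = 'user@example.com'")

for label, detail in [("wrapped", wrapped), ("ranged", ranged), ("functional", functional)]:
    print(f"{label:>10}: {detail}")
```

The exact wording varies between SQLite versions. Older releases print "SCAN TABLE orders", for example, but the SCAN/SEARCH distinction is the same.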
The N+1 problem occurs when an application executes one query to retrieve N records and then fires an additional query for each record — N+1 total round trips. This is catastrophic at scale and entirely preventable with proper JOIN usage or batch fetching.
-- BAD: N+1 pattern (500 pending orders = 501 queries!)
-- Query 1: SELECT order_id FROM orders WHERE status = 'pending';
-- Then for each order_id:
-- Queries 2..501: SELECT * FROM order_items WHERE order_id = ?;
-- GOOD: Single JOIN eliminates N+1 completely
EXPLAIN
SELECT
o.order_id, o.order_date, o.total_amount,
oi.item_id, oi.product_id, oi.quantity, oi.unit_price
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'pending'
ORDER BY o.order_id, oi.item_id;
-- type for orders: ref (idx_status)
-- type for order_items: ref (idx_order_id)
-- One query, complete result set
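The round-trip difference is easy to measure. This Python sketch uses an in-memory SQLite database as a stand-in for MySQL and counts executed statements with sqlite3's set_trace_callback. The tables are simplified, and the five sample orders with two items each are made up.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,
                              product_id INTEGER NOT NULL, quantity INTEGER NOT NULL);
    CREATE INDEX idx_order_id ON order_items (order_id);
""")
conn.executemany("INSERT INTO orders VALUES (?, 'pending')", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO order_items (order_id, product_id, quantity) VALUES (?, ?, 1)",
                 [(o, p) for o in range(1, 6) for p in (100, 200)])
conn.commit()

statements: list[str] = []
conn.set_trace_callback(statements.append)  # record every statement SQLite runs

def n_plus_one() -> dict[int, list[tuple]]:
    """One query for the orders, then one more query per order."""
    items = {}
    for (order_id,) in conn.execute("SELECT order_id FROM orders WHERE status = 'pending'").fetchall():
        items[order_id] = conn.execute(
            "SELECT item_id, product_id, quantity FROM order_items "
            "WHERE order_id = ? ORDER BY item_id", (order_id,)).fetchall()
    return items

def single_join() -> dict[int, list[tuple]]:
    """One JOIN, grouped into the same shape in application code."""
    items = defaultdict(list)
    rows = conn.execute(
        "SELECT o.order_id, oi.item_id, oi.product_id, oi.quantity "
        "FROM orders o JOIN order_items oi ON oi.order_id = o.order_id "
        "WHERE o.status = 'pending' ORDER BY o.order_id, oi.item_id")
    for order_id, *item in rows:
        items[order_id].append(tuple(item))
    return dict(items)

statements.clear()
loop_result = n_plus_one()
n_plus_one_count = len(statements)
statements.clear()
join_result = single_join()
join_count = len(statements)
print(n_plus_one_count, join_count, loop_result == join_result)
```

Over a network, each extra statement adds a full client-server round trip. The gap grows with latency as well as with N.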
Using SELECT * prevents covering index usage, transfers unnecessary data across the network, and makes execution plans less predictable as schemas evolve. Always project only the columns your application actually needs.
-- BAD: SELECT * forces table row access even when index could cover query
EXPLAIN SELECT * FROM orders WHERE customer_id = 1001;
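-- GOOD: Projecting only the needed columns enables a covering index
-- (same columns as idx_cust_covering above, so create only one of them; listing
-- order_id is optional because InnoDB secondary indexes already store the primary key)
ALTER TABLE orders ADD INDEX idx_cust_cover
(customer_id, order_id, order_date, total_amount, status);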
EXPLAIN
SELECT order_id, order_date, total_amount, status
FROM orders WHERE customer_id = 1001;
-- type: ref, Extra: Using index (all data from index - zero table access)
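Composite indexes follow the left-prefix rule: MySQL can use an index only from its leftmost column onward. A composite index on (A, B, C) supports lookups on A, A+B, or A+B+C, but not on B or C alone. MySQL 8.0.13+ can sometimes apply a skip scan, reported as Using index for skip scan, but only under narrow conditions. Design composite indexes with equality columns first and the range column second. An ORDER BY or GROUP BY can then avoid a filesort only if it follows the index order after the equality columns. In the example below, the range column is also the sort column, so the sort comes free.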
-- Query: WHERE status = 'shipped' AND order_date BETWEEN x AND y ORDER BY order_date
-- Optimal: equality first, range second, ORDER BY aligned with range column
ALTER TABLE orders ADD INDEX idx_status_date_opt (status, order_date);
EXPLAIN
SELECT order_id, customer_id, total_amount
FROM orders
WHERE status = 'shipped'
AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
ORDER BY order_date;
-- type: range, key: idx_status_date_opt
-- Extra: Using index condition (NO filesort! ORDER BY uses index)
-- Verify index columns being used via key_len
-- status ENUM NOT NULL = 1 byte
-- order_date DATE NOT NULL = 3 bytes
-- key_len = 4 means BOTH columns are utilized
-- Covering composite index for aggregate queries
ALTER TABLE orders ADD INDEX idx_grp_covering
(status, order_date, customer_id, total_amount);
EXPLAIN
SELECT status, order_date, COUNT(*) AS cnt, SUM(total_amount) AS revenue
FROM orders
WHERE status IN ('shipped', 'delivered')
AND order_date >= '2024-01-01'
GROUP BY status, order_date;
-- Extra: Using index (full covering index - no table access whatsoever)
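A few lines of Python can model the left-prefix rule and the key_len arithmetic from the comments above. The byte sizes rest on these assumptions:
- Columns are NOT NULL unless flagged nullable.
- Strings are utf8mb4: up to 4 bytes per character, plus 2 length bytes for VARCHAR.
- ENUMs have at most 255 members.
- DATETIME has no fractional seconds.

usable_prefix follows the documented rule that a range condition is used but ends the usable key parts. The ORDERS metadata dictionary is hypothetical.

```python
# Bytes each column type contributes to key_len (assumptions listed above).
FIXED_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8,
               "DATE": 3, "DATETIME": 5, "ENUM": 1}

def key_part_bytes(col_type: str, nullable: bool, length: int = 0) -> int:
    if col_type == "CHAR":
        size = 4 * length        # fixed width, up to 4 bytes per utf8mb4 character
    elif col_type == "VARCHAR":
        size = 4 * length + 2    # plus a 2-byte length prefix inside the key
    else:
        size = FIXED_BYTES[col_type]
    return size + (1 if nullable else 0)  # one extra byte for the NULL flag

def usable_prefix(index_cols: list[str], equality: set[str], ranged: set[str]) -> list[str]:
    """Equalities extend the usable prefix; the first range column is used and
    then stops it; a column with no condition stops it immediately."""
    used = []
    for col in index_cols:
        if col in equality:
            used.append(col)
        elif col in ranged:
            used.append(col)
            break
        else:
            break
    return used

# Hypothetical metadata for the orders table: column -> (type, nullable)
ORDERS = {"status": ("ENUM", False), "order_date": ("DATE", False), "customer_id": ("INT", False)}

def key_len(index_cols: list[str], equality: set[str], ranged: set[str]) -> int:
    return sum(key_part_bytes(*ORDERS[c]) for c in usable_prefix(index_cols, equality, ranged))

print(key_len(["status", "order_date"], {"status"}, {"order_date"}))  # 4: both columns
print(key_len(["order_date", "status"], {"status"}, {"order_date"}))  # 3: range stops the prefix
```

With the columns reversed, the status equality can still be checked through Index Condition Pushdown, but it no longer narrows the index range.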
MySQL 8.0 introduced invisible indexes, which the optimizer ignores while InnoDB continues maintaining them. This allows DBAs to safely validate the impact of removing an index before permanently dropping it — an indispensable tool for production index lifecycle management.
-- Make an index invisible to test the impact of removing it
ALTER TABLE orders ALTER INDEX idx_status INVISIBLE;
-- EXPLAIN now shows the optimizer ignoring this index
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
-- possible_keys: NULL (invisible index ignored, assuming no other index leads with status)
-- Allow this session to see invisible indexes for targeted testing
SET SESSION optimizer_switch = 'use_invisible_indexes=on';
EXPLAIN SELECT * FROM orders WHERE status = 'pending';
SET SESSION optimizer_switch = 'use_invisible_indexes=off';
-- Re-enable the index
ALTER TABLE orders ALTER INDEX idx_status VISIBLE;
-- Check visibility status of all indexes
SELECT index_name, is_visible
FROM information_schema.STATISTICS
WHERE table_schema = 'ecommerce' AND table_name = 'orders'
GROUP BY index_name, is_visible;
When the MySQL optimizer makes a poor index selection — often due to outdated statistics or unusual data distributions — index hints and optimizer hints allow targeted intervention. Use them sparingly and always validate with EXPLAIN, as they bypass the optimizer’s cost model.
-- FORCE INDEX: Like USE INDEX, but a table scan is treated as very expensive,
-- so it is chosen only if the named index cannot be used at all
EXPLAIN SELECT * FROM orders FORCE INDEX (idx_order_date_status)
WHERE order_date >= '2024-01-01' AND status = 'delivered';
-- USE INDEX: Restricts the optimizer to the listed index(es); it may still choose a table scan
EXPLAIN SELECT * FROM orders USE INDEX (idx_customer_id)
WHERE customer_id = 1001;
-- IGNORE INDEX: Prevents use of a specific index
EXPLAIN SELECT * FROM orders IGNORE INDEX (idx_status)
WHERE status = 'pending' AND order_date >= '2024-01-01';
-- Optimizer hints (MySQL 8.0+ preferred method)
-- NO_BNL also disables hash joins from MySQL 8.0.18 on; NO_HASH_JOIN has no effect after 8.0.18
SELECT /*+ NO_BNL(o, c) */
o.order_id, c.email, o.total_amount
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'pending';
-- SET_VAR hint: Set a session variable only for the duration of this statement
SELECT /*+ SET_VAR(sort_buffer_size=4194304) */
customer_id, SUM(total_amount) AS revenue
FROM orders
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 100;
Subqueries can be highly efficient or devastating for performance depending on how they are written. The most dangerous anti-pattern is the correlated subquery. EXPLAIN reports it with a DEPENDENT SUBQUERY select_type, and it is re-evaluated for each row of the outer query. MySQL 8.0’s Common Table Expressions (CTEs) dramatically improve readability for complex multi-step queries. The optimizer handles a non-recursive CTE much like the equivalent derived table, merging or materializing it, so the CTE form generally performs like the JOIN it replaces.
-- BAD: Correlated subquery (re-evaluated N times for N outer rows)
EXPLAIN
SELECT o.order_id, o.total_amount,
(SELECT SUM(oi.unit_price * oi.quantity)
FROM order_items oi
WHERE oi.order_id = o.order_id) AS calculated_total
FROM orders o
WHERE o.order_date >= '2024-01-01';
-- select_type: DEPENDENT SUBQUERY (executed once per outer row!)
-- GOOD: JOIN with aggregation (order_items is aggregated in one pass, not once per outer row;
-- LEFT JOIN keeps orders that have no items, which the correlated version returns with NULL)
EXPLAIN
SELECT o.order_id, o.total_amount, oi_agg.calculated_total
FROM orders o
LEFT JOIN (
SELECT order_id, SUM(unit_price * quantity) AS calculated_total
FROM order_items GROUP BY order_id
) oi_agg ON oi_agg.order_id = o.order_id
WHERE o.order_date >= '2024-01-01';
-- BEST: The same logic as a CTE, for readability (MySQL 8.0+); the optimizer
-- treats this non-recursive CTE much like the derived table above
WITH order_totals AS (
SELECT order_id, SUM(unit_price * quantity) AS calculated_total
FROM order_items GROUP BY order_id
)
SELECT o.order_id, o.total_amount, ot.calculated_total
FROM orders o
LEFT JOIN order_totals ot ON ot.order_id = o.order_id
WHERE o.order_date >= '2024-01-01';
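-- Recursive CTE: Hierarchical queries (category trees, org charts); assumes a categories
-- table with category_id, parent_id, and name columns, which the schema above does not define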
WITH RECURSIVE category_tree AS (
SELECT category_id, parent_id, name, 0 AS depth
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.category_id, c.parent_id, c.name, ct.depth + 1
FROM categories c
JOIN category_tree ct ON ct.category_id = c.parent_id
)
SELECT category_id, CONCAT(REPEAT(' ', depth), name) AS indented_name
FROM category_tree ORDER BY category_id;
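MySQL documents that each iteration of a recursive CTE's recursive part operates only on the rows produced by the previous iteration. The Python sketch below evaluates the category query that way by hand and checks the result against SQLite's WITH RECURSIVE. The categories rows are hypothetical and the CONCAT/REPEAT formatting is dropped. The loop assumes the data contains no cycles; MySQL guards against runaway recursion with cte_max_recursion_depth.

```python
import sqlite3

# Hypothetical categories: (category_id, parent_id, name)
CATEGORIES = [(1, None, "Electronics"), (2, 1, "Laptops"), (3, 1, "Phones"),
              (4, 2, "Gaming Laptops"), (5, None, "Books")]

def category_depths(rows: list[tuple]) -> list[tuple[int, int]]:
    """Anchor member: the roots at depth 0. Each recursive step joins only the
    previous iteration's rows back to the table, until a step yields nothing."""
    frontier = [(cid, 0) for cid, parent, _name in rows if parent is None]
    result = list(frontier)
    while frontier:
        depth_of = dict(frontier)  # rows produced by the previous iteration
        frontier = [(cid, depth_of[parent] + 1)
                    for cid, parent, _name in rows if parent in depth_of]
        result.extend(frontier)
    return sorted(result)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (category_id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", CATEGORIES)
sql_result = conn.execute("""
    WITH RECURSIVE category_tree AS (
        SELECT category_id, parent_id, name, 0 AS depth
        FROM categories WHERE parent_id IS NULL
        UNION ALL
        SELECT c.category_id, c.parent_id, c.name, ct.depth + 1
        FROM categories c
        JOIN category_tree ct ON ct.category_id = c.parent_id
    )
    SELECT category_id, depth FROM category_tree ORDER BY category_id
""").fetchall()
print(category_depths(CATEGORIES) == sql_result)  # True
```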
Naive pagination using high OFFSET values is a classic performance trap. As OFFSET grows, MySQL must read and discard more and more rows before returning the requested page, a problem known as deep pagination. For large datasets, use cursor-based (keyset) pagination on the last seen primary key. It keeps each page's cost proportional to the page size, however deep the page is.
-- BAD: High offset forces MySQL to read 1,000,100 rows in primary key order
EXPLAIN SELECT order_id, order_date, total_amount
FROM orders ORDER BY order_id
LIMIT 100 OFFSET 1000000;
-- rows: 1000100 (scans and discards 1,000,000 rows)
-- GOOD: Cursor-based (keyset) pagination - constant performance
-- First page:
SELECT order_id, order_date, total_amount
FROM orders WHERE order_id > 0
ORDER BY order_id LIMIT 100;
-- Next page (pass last_order_id from previous result set):
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_id > :last_order_id
ORDER BY order_id LIMIT 100;
-- type: range; reading stops after 100 rows (EXPLAIN's rows column may still estimate the whole range)
-- Alternative: Late row lookup for complex multi-column sort
SELECT o.*
FROM orders o
JOIN (
SELECT order_id FROM orders
ORDER BY total_amount DESC, order_id
LIMIT 100 OFFSET 50000
) ids ON ids.order_id = o.order_id
ORDER BY o.total_amount DESC, o.order_id;
-- Assumes an index on total_amount (e.g. idx_total_amount): the inner query then reads
-- only index entries, and the outer query fetches just 100 full rows
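Keyset pagination also works with the non-unique sort from the late-row-lookup example, as long as the cursor carries every sort column. Below is a Python sketch against an in-memory SQLite table, a stand-in for MySQL with made-up data and page size. The cursor is the last (order_id, total_amount) pair, and the WHERE clause selects rows strictly after it in total_amount DESC, order_id ASC order. In MySQL, confirm with EXPLAIN that this OR predicate can use an index. An index matching the sort directions, such as (total_amount DESC, order_id), may be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_amount INTEGER NOT NULL)")
# Hypothetical data with many duplicate amounts, so order_id must break ties.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, (i * 37) % 10) for i in range(1, 101)])

PAGE = 7

def keyset_pages(page_size: int) -> list[list[tuple[int, int]]]:
    """Page through orders by total_amount DESC, order_id ASC, using the last row
    of each page as the cursor instead of an OFFSET."""
    pages, cursor = [], None
    while True:
        if cursor is None:
            rows = conn.execute(
                "SELECT order_id, total_amount FROM orders "
                "ORDER BY total_amount DESC, order_id LIMIT ?", (page_size,)).fetchall()
        else:
            last_id, last_amount = cursor
            # Strictly after the cursor: smaller amount, or same amount with a larger id.
            rows = conn.execute(
                "SELECT order_id, total_amount FROM orders "
                "WHERE total_amount < ? OR (total_amount = ? AND order_id > ?) "
                "ORDER BY total_amount DESC, order_id LIMIT ?",
                (last_amount, last_amount, last_id, page_size)).fetchall()
        if not rows:
            return pages
        pages.append(rows)
        cursor = rows[-1]

all_rows = conn.execute(
    "SELECT order_id, total_amount FROM orders ORDER BY total_amount DESC, order_id").fetchall()
flattened = [row for page in keyset_pages(PAGE) for row in page]
print(flattened == all_rows, len(keyset_pages(PAGE)))  # True 15
```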
The MySQL optimizer’s decisions are only as good as the statistics it uses. Stale or inaccurate statistics lead to poor plan choices — wrong join orders, missed index usage, and cardinality estimation errors. As a MySQL DBA, proactively managing statistics is a core operational responsibility, especially after bulk data loads or large DELETE operations.
-- Refresh table statistics
ANALYZE TABLE orders, customers, order_items, products;
-- View table statistics and sizes
SELECT table_name,
table_rows, -- an estimate for InnoDB tables
ROUND(data_length / 1024 / 1024, 2) AS data_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_mb,
update_time
FROM information_schema.TABLES
WHERE table_schema = 'ecommerce'
ORDER BY data_length DESC;
-- Check index cardinality (higher = more selective = better)
SELECT index_name, column_name, seq_in_index, cardinality, nullable
FROM information_schema.STATISTICS
WHERE table_schema = 'ecommerce' AND table_name = 'orders'
ORDER BY index_name, seq_in_index;
-- Increase sample pages for better statistics on large tables
ALTER TABLE orders STATS_SAMPLE_PAGES = 50;
ANALYZE TABLE orders;
-- InnoDB persistent statistics settings
SHOW VARIABLES LIKE 'innodb_stats%';
-- innodb_stats_persistent = ON (recommended for production)
-- innodb_stats_persistent_sample_pages = 20 (increase for accuracy)
-- Check when InnoDB table statistics were last updated
SELECT * FROM mysql.innodb_table_stats
WHERE database_name = 'ecommerce';
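Cardinality becomes more intuitive when turned into rows per lookup. Table rows divided by distinct values is roughly how many rows one equality match returns. This is a Python sketch with hypothetical row counts; keep in mind that InnoDB's cardinality figures are themselves sampled estimates.

```python
def rows_per_lookup(table_rows: int, cardinality: int) -> float:
    """Rough rows matched by one equality lookup on an index prefix."""
    return table_rows / max(cardinality, 1)

def selectivity(table_rows: int, cardinality: int) -> float:
    """Distinct values as a fraction of rows: 1.0 is unique, near 0 is low selectivity."""
    return cardinality / max(table_rows, 1)

# Hypothetical figures for a 1,000,000-row orders table.
TABLE_ROWS = 1_000_000
for index_name, cardinality in [("PRIMARY", 1_000_000), ("idx_customer_id", 50_000), ("idx_status", 5)]:
    print(f"{index_name:16} selectivity={selectivity(TABLE_ROWS, cardinality):.6f} "
          f"rows/lookup={rows_per_lookup(TABLE_ROWS, cardinality):,.0f}")
```

An equality lookup on a five-value status column returns about a fifth of the table. This is why the optimizer may prefer a full scan even when possible_keys lists that index.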
MySQL’s Performance Schema provides comprehensive instrumentation tables for real-time query performance monitoring. For MySQL DBAs, mastering the Performance Schema is essential for identifying the highest-impact optimization targets in production — revealing far more than the slow query log alone.
-- Top 10 slowest queries by total execution time
SELECT
DIGEST_TEXT AS query_template,
COUNT_STAR AS exec_count,
ROUND(SUM_TIMER_WAIT / 1e12, 3) AS total_time_sec,
ROUND(AVG_TIMER_WAIT / 1e12, 6) AS avg_time_sec,
ROUND(MAX_TIMER_WAIT / 1e12, 6) AS max_time_sec,
SUM_ROWS_EXAMINED AS total_rows_examined,
ROUND(SUM_ROWS_EXAMINED / COUNT_STAR, 0) AS avg_rows_examined,
SUM_NO_INDEX_USED AS full_scans
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'ecommerce'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
-- Queries performing full table scans in production
SELECT
DIGEST_TEXT,
COUNT_STAR,
SUM_NO_INDEX_USED,
ROUND(AVG_TIMER_WAIT / 1e12, 6) AS avg_sec
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'ecommerce' AND SUM_NO_INDEX_USED > 0
ORDER BY SUM_NO_INDEX_USED DESC LIMIT 10;
-- sys schema: Simplified top-level performance view
-- (the x$ variant keeps raw picosecond latencies, so ORDER BY sorts numerically)
SELECT * FROM sys.x$statement_analysis
WHERE db = 'ecommerce'
ORDER BY total_latency DESC LIMIT 10;
-- sys schema: All queries doing full table scans
SELECT * FROM sys.statements_with_full_table_scans
WHERE db = 'ecommerce'
ORDER BY no_index_used_count DESC;
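The TIMER columns are in picoseconds, so dividing by 10^12 gives seconds. Ordering by SUM_TIMER_WAIT rather than AVG_TIMER_WAIT surfaces cheap queries that run constantly. Here is a small Python illustration with two invented digests: a rare slow report and a fast but very frequent lookup.

```python
PICOS_PER_SECOND = 10**12  # Performance Schema timer columns are in picoseconds

def rank_digests(digests: list[dict]) -> list[tuple[str, float, float]]:
    """Rank digests by SUM_TIMER_WAIT, as the top-10 query above does,
    returning (digest text, total seconds, average seconds)."""
    ranked = sorted(digests, key=lambda d: d["SUM_TIMER_WAIT"], reverse=True)
    return [(d["DIGEST_TEXT"],
             d["SUM_TIMER_WAIT"] / PICOS_PER_SECOND,
             d["SUM_TIMER_WAIT"] / d["COUNT_STAR"] / PICOS_PER_SECOND)
            for d in ranked]

digests = [
    {"DIGEST_TEXT": "SELECT ... FROM orders GROUP BY ...", "COUNT_STAR": 10,
     "SUM_TIMER_WAIT": 30 * PICOS_PER_SECOND},         # 3 s each, 30 s total
    {"DIGEST_TEXT": "SELECT ... FROM customers WHERE customer_id = ?", "COUNT_STAR": 2_000_000,
     "SUM_TIMER_WAIT": 2_000_000 * 50_000_000},        # 50 microseconds each, 100 s total
]
for text, total_s, avg_s in rank_digests(digests):
    print(f"{total_s:8.1f} s total  {avg_s:.6f} s avg  {text}")
```

The 50-microsecond lookup ranks first. Shaving even a little from it saves more total server time than speeding up the slow report.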
When EXPLAIN and EXPLAIN ANALYZE do not provide sufficient insight, the Optimizer Trace delivers a complete JSON log of every decision the optimizer made — including all alternative plans considered and their cost estimates. This is the ultimate diagnostic instrument for resolving the most difficult query optimization problems.
-- Enable optimizer trace
SET SESSION optimizer_trace = 'enabled=on';
SET SESSION optimizer_trace_max_mem_size = 1048576;
-- Run the query to analyze
SELECT order_id, customer_id, total_amount
FROM orders
WHERE status = 'shipped'
AND order_date BETWEEN '2024-01-01' AND '2024-06-30'
ORDER BY total_amount DESC
LIMIT 50;
-- Retrieve the trace (JSON format)
SELECT QUERY, TRACE FROM information_schema.OPTIMIZER_TRACE\G
-- Key JSON sections to examine:
-- "considered_execution_plans": All plans evaluated
-- "best_access_path": Index chosen and why
-- "rows_estimation": Cardinality estimates per table
-- "cost_info": read_cost, eval_cost, prefix_cost per plan
-- Disable optimizer trace
SET SESSION optimizer_trace = 'enabled=off';
Beyond index design, several MySQL server variables directly influence query execution performance. Understanding and tuning these variables is a critical complement to query-level optimization in production environments.
-- Sort buffer: used when ORDER BY/GROUP BY cannot use an index
SHOW VARIABLES LIKE 'sort_buffer_size'; -- Default: 256KB
SET SESSION sort_buffer_size = 4 * 1024 * 1024; -- 4MB for heavy sorts
-- Join buffer: used for joins that cannot use an index
-- (hash joins in current 8.0 releases, Block Nested Loop in older ones)
SHOW VARIABLES LIKE 'join_buffer_size'; -- Default: 256KB
SET SESSION join_buffer_size = 2 * 1024 * 1024; -- 2MB for large joins
-- Temporary table memory thresholds (exceeding them makes the temp table spill to disk)
SHOW VARIABLES LIKE 'tmp_table_size'; -- Default: 16MB
SHOW VARIABLES LIKE 'max_heap_table_size'; -- Default: 16MB
-- For MEMORY internal temp tables the limit is the smaller of the two, so raise them together;
-- the default TempTable engine in MySQL 8.0 is also bounded by temptable_max_ram
-- InnoDB buffer pool: the single most impactful performance variable
SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; -- Target: 70-80% of total RAM on a dedicated server
-- Enable slow query log for continuous production monitoring
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1; -- Capture queries > 1 second
SET GLOBAL log_queries_not_using_indexes = ON; -- Capture queries without indexes
SHOW VARIABLES LIKE 'slow_query_log_file'; -- Check log file location
-- Read buffers
SHOW VARIABLES LIKE 'read_buffer_size'; -- Default: 128KB (sequential scans)
SHOW VARIABLES LIKE 'read_rnd_buffer_size'; -- Default: 256KB (also used by Multi-Range Read)
The following checklist provides a systematic approach to diagnosing and resolving slow queries in MySQL production environments. Apply these steps in order for every optimization engagement.
1. … performance_schema.events_statements_summary_by_digest, or sys.statement_analysis to identify the highest-impact queries by total execution time and examination count.
2. … type (eliminate ALL and index scans), rows (minimize the cross-join product), and Extra (eliminate Using filesort and Using temporary where feasible).
3. … ANALYZE TABLE after bulk data changes to ensure the optimizer works with accurate cardinality estimates.
MySQL query optimization is both a science and an art. The science lies in understanding how the cost-based optimizer works, how indexes are structured and accessed internally by InnoDB, and how to interpret every field of the EXPLAIN and EXPLAIN ANALYZE output with precision. The art lies in applying this knowledge pragmatically — knowing when to add a composite index, when to rewrite a correlated subquery as a JOIN, when to refresh statistics, and when to override the optimizer with targeted hints.
Mastering the techniques in this guide — from dissecting EXPLAIN columns and eliminating full table scans, to designing optimal composite and covering indexes, avoiding deep pagination traps, leveraging invisible indexes for safe lifecycle management, and using the Performance Schema for continuous monitoring — equips you to build MySQL-backed systems that scale confidently to hundreds of millions of rows and thousands of concurrent connections.
The return on investment in MySQL query optimization skills is exceptional: reduced infrastructure costs, dramatically improved user experience, fewer on-call incidents, and a more resilient, predictable database tier. Every millisecond shaved from a high-frequency query executed millions of times daily translates directly into meaningful savings and competitive advantage. Start every optimization engagement with EXPLAIN, follow the evidence rigorously, and let the data guide every decision you make.
https://blog.holoviz.org/posts/panel_live_server/images/panel-live-server.png
Planet Python
https://media.notthebee.com/articles/6a43fd453533d6a43fd453533e.jpg
A prankster at East Brook Middle School in Paramus, New Jersey, just pulled off what might be the most insane yearbook prank you’ll ever see.