Some current MySQL Architecture writings

So, I’ve been looking around for a while (and a few times now) for any good resources that cover a bunch of MySQL architecture and technical details aimed towards the technically proficient but not MySQL literate audience. I haven’t really found anything. I mean, there’s the (huge and very detailed) MySQL manual, there’s the MySQL Internals manual (which is sometimes only 10 years out of date) and there’s various blog entries around the place. So I thought I’d write something explaining roughly how it all fits together and what it does to your system (processes, threads, IO etc).(Basically, I’ve found myself explaining this enough times in the past few years that I should really write it down and just point people to my blog). I’ve linked to things for more reading. You should probably read them at some point.
Years ago, there were many presentations on MySQL Architecture. I went to try and find some on YouTube and couldn’t. We were probably not cool enough for YouTube and the conferences mostly weren’t recorded. So, instead, I’ll just link to Brian on NoSQL – because it’s important to cover NoSQL as well.
So, here is a quick overview of executing a query inside a MySQL Server and all the things that can affect it. This isn’t meant to be complete… just a “brief” overview (of a few thousand words).
MySQL is an open source relational database server, the origins of which date back to 1979 with MySQL 1.0 coming into existence in 1995. It’s code that has some history and sometimes this really does show. For a more complete history, see my talk from linux.conf.au 2014: Past, Present and Future of MySQL (YouTube, Download).
At least of writing, everything here applies to MariaDB and Percona Server too.
The MySQL Server runs as a daemon (mysqld). Users typically interact with it over a TCP or UNIX domain socket through the MySQL network protocol (of which multiple implementations exist under various licenses). Each connection causes the MySQL Server (mysqld) to spawn a thread to handle the client connection.
There are now several different thread-pool plugins that instead of using one thread per connection, multiplex connections over a set of threads. However, these plugins are not commonly deployed and we can largely ignore them. For all intents and purposes, the MySQL Server spawns one thread per connection and that thread alone performs everything needed to service that connection. Thus, parallelism in the MySQL Server is gotten from executing many concurrent queries rather than one query concurrently.
The MySQL Server will cache threads (the amount is configurable) so that it doesn’t have to have the overhead of pthread_create() for each new connection. This is controlled by the thread_cache_size configuration option. It turns out that although creating threads may be a relatively cheap operation, it’s actually quite time consuming in the scope of many typical MySQL Server connections.
Because the MySQL Server is a collection of threads, there’s going to be thread local data (e.g. connection specific) and shared data (e.g. cache of on disk data). This means mutexes and atomic variables. Most of the more advanced ways of doing concurrency haven’t yet made it into MySQL (e.g. RCU hasn’t yet and is pretty much needed to get 1 million TPS), so you’re most likely going to see mutex contention and contention on cache lines for atomic operations and shared data structures.
There are also various worker threads inside the MySQL Server that perform various functions (e.g. replication).
Until sometime in the 2000s, more than one CPU core was really uncommon, so the fact that there were many global mutexes in MySQL wasn’t really an issue. These days, now that we have more reliable async networking and disk IO system calls but MySQL has a long history, there’s global mutexes still and there’s no hard and fast rule about how it does IO.
Over the past 10 years of MySQL development, it’s been a fight to remove the reliance on global mutexes and data structures controlled by them to attempt to increase the number of CPU cores a single mysqld could realistically use. The good news is that it’s no longer going to max out on the number of CPU cores you have in your phone.
So, you have a MySQL Client (e.g. the mysql client or something) connecting to the MySQL Server. Now, you want to enter a query. So you do that, say “SELECT 1;”. The query is sent to the server where it is parsed, optimized, executed and the result returns to the client.
Now, you’d expect this whole process to be incredibly clean and modular, like you were taught things happened back in university with several black boxes that take clean input and produce clean output that’s all independent data structures. At least in the case of MySQL, this isn’t really the case. For over a decade there’s been lovely architecture diagrams with clean boxes – the code is not like this at all. But this probably only worries you once you’re delving into the source.
The parser is a standard yacc one – there’s been attempts to replace it over the years, none of which have stuck – so we have the butchered yacc one still. With MySQL 5.0, it exploded in size due to the addition of things like SQL2003 stored procedures and it is of common opinion that it’s rather bloated and was better in 4.1 and before for the majority of queries that large scale web peeps execute.
There is also this thing called the Query Cache – protected by a single global mutex. It made sense in 2001 for a single benchmark. It is a simple hash of the SQL statement coming over the wire to the exact result to send(2) over the socket back to a client. On a single CPU system where you ask the exact same query again and again without modifying the data it’s the best thing ever. If this is your production setup, you probably want to think about where you went wrong in your life. On modern systems, enabling the query cache can drop server performance by an order of magnitude. A single global lock is a really bad idea. The query cache should be killed with fire – but at least in the mean time, it can be disabled.
Normally, you just have the SQL progress through the whole process of parse, optimize, execute, results but the MySQL Server also supports prepared statements. A prepared statement is simply this: “Hi server, please prepare this statement for execution leaving the following values blank: X, Y and Z” followed by “now execute that query with X=foo, Y=bar and Z=42″. You can call execute many times with different values. Due to the previously mentioned not-quite-well-separated parse, optimize, execute steps, prepared statements in MySQL aren’t as awesome as in other relational databases. You basically end up saving parsing the query again when you execute it with new parameters. More on prepared statements (from 2006) here. Unless you’re executing the same query many times in a single connection, server side prepared statements aren’t worth the network round trips.
The absolute worst thing in the entire world is MySQL server side prepared statements. It moves server memory allocation to be the responsibility of the clients. This is just brain dead stupid and a reason enough to disable prepared statements. In fact, just about every MySQL client library for every programming language ever actually fakes prepared statements in the client rather than trust every $language programmer to remember to close their prepared statements. Open many client connections to a MySQL Server and prepare a lot of statements and watch the OOM killer help you with your DoS attack.
So now that we’ve connected to the server, parsed the query (or done a prepared statement), we’re into the optimizer. The optimizer looks at a data structure describing the query and works out how to execute it. Remember: SQL is declarative, not procedural. The optimizer will access various table and index statistics in order to work out an execution plan. It may not be the best execution plan, but it’s one that can be found within reasonable time. You can find out the query plan for a SELECT statement by prepending it with EXPLAIN.
The MySQL optimizer is not the be all and end all of SQL optimizers  (far from it). A lot of MySQL performance problems are due to complex SQL queries that don’t play well with the optimizer, and there’s been various tricks over the years to work around deficiencies in it. If there’s one thing the MySQL optimizer does well it’s making quick, pretty good decisions about simple queries. This is why MySQL is so popular – fast execution of simple queries.
To get table and index statistics, the optimizer has to ask the Storage Engine(s). In MySQL, the actual storage of tables (and thus the rows in tables) is (mostly) abstracted away from the upper layers. Much like a VFS layer in an OS kernel, there is (for some definition) an API abstracting away the specifics of storage from the rest of the server. The API is not clean and there are a million and one layering violations and exceptions to every rule. Sorry, not my fault.
Table definitions are in FRM files on disk, entirely managed by MySQL (not the storage engines) and for your own sanity you should not ever look into the actual file format. Table definitions are also cached by MySQL to save having to open and parse a file.
Originally, there was MyISAM (well, and ISAM before it, but that’s irrelevant now). MyISAM was non-transactional but relatively fast, especially for read heavy workloads. It only allowed one writer although there could be many concurrent readers. MyISAM is still there and used for system tables. The current default storage engine is called InnoDB. It’s all the buzzwords like ACID and MVCC. Just about every production environment is going to be using InnoDB. MyISAM is effectively deprecated.
InnoDB originally was its own independent thing and has (to some degree) been maintained as if it kind of was. It is, however, not buildable outside a MySQL Server anymore. It also has its own scalability issues. A recent victory was splitting the kernel_mutex, which was a mutex that protected far too much internal InnoDB state and could be a real bottleneck where NRCPUs > 4.
So, back to query execution. Once the optimizer has worked out how to execute the query, MySQL will start executing it. This probably involves accessing some database tables. These are probably going to be InnoDB tables. So, MySQL (server side) will open the tables, looking up the MySQL Server table definition cache and creating a MySQL Server side table share object which is shared amongst the open table instances for that table. See here for scalability hints on these (from 2009). The opened table objects are also cached – table_open_cache. In MySQL 5.6, there is table_open_cache_instances, which splits the table_open_cache mutex into table_open_cache_instances mutexes to help reduce lock contention on machines with many CPU cores (> 8 or >16 cores, depending on workload).
Once tables are opened, there are various access methods that can be employed. Table scans are the worst (start at the start and examine every row). There’s also index scans (often seeking to part of the index first) and key lookups. If your query involves multiple tables, the server (not the storage engine) will have to do a join. Typically, in MySQL, this is a nested loop join. In an ideal world, this would all be really easy to spot when profiling the MySQL server, but in reality, everything has funny names like rnd_next.
As an aside, any memory allocated during query execution is likely done as part of a MEM_ROOT – essentially a pool allocator, likely optimized for some ancient libc on some ancient linux/Solaris and it just so happens to still kinda work okay. There’s some odd server configuration options for (some of the) MEM_ROOTs that get exactly no airtime on what they mean or what changing them will do.
InnoDB has its own data dictionary (separate to FRM files) which can also be limited in current MySQL (important when you have tens of thousands of tables) – which is separate to the MySQL Server table definitions and table definition cache.
But anyway, you have a number of shared data structures about tables and then a data structure for each open table. To actually read/write things to/from tables, you’re going to have to get some data to/from disk.
InnoDB tables can be stored either in one giant table space or file-per-table. (Even though it’s now configurable), InnoDB database pages are 16kb. Database pages are cached in the InnoDB Buffer Pool, and the buffer-pool-size should typically be about 80% of system memory. InnoDB will use a (configurable) method to flush. Typically, it will all be O_DIRECT (it’s configurable) – which is why “just use XFS” is step 1 in IO optimization – the per inode mutex in ext3/ext4 just doesn’t make IO scale.
InnoDB will do some of its IO in the thread that is performing the query and some of it in helper threads using native linux async IO (again, that’s configurable). With luck, all of the data you need to access is in the InnoDB buffer pool – where database pages are cached. There exists innodb_buffer_pool_instances configuration option which will split the buffer pool into several instances to help reduce lock contention on the InnoDB buffer pool mutex.
All InnoDB tables have a clustered index. This is the index by which the rows are physically sorted by. If you have an INT PRIMARY KEY on your  InnoDB table, then a row with that primary key value of 1 will be physically close to the row with primary key value 2 (and so on). Due to the intricacies of InnoDB page allocation, there may still be disk seeks involved in scanning a table in primary key order.
Every page in InnoDB has a checksum. There was an original algorithm, then there was a “fast” algorithm in some forks and now we’re converging on crc32, mainly because Intel implemented CPU instructions to make that fast. In write heavy workloads, this used to show up pretty heavily in profiles.
InnoDB has both REDO and UNDO logging to keep both crash consistency and provide consistent read views to transactions. These are also stored on disk, the redo logs being in their own files (size and number are configurable). The larger the redo logs, the longer it may take to run recovery after a crash. The smaller the redo logs, the more trouble you’re likely to run into with large or many concurrent transactions.
If your query performs writes to database tables, those changes are written to the REDO log and then, in the background, written back into the table space files. There exists configuration parameters for how much of the InnoDB buffer pool can be filled with dirty pages before they have to be flushed out to the table space files.
In order to maintain Isolation (I in ACID), InnoDB needs to assign a consistent read view to a new transaction. Transactions are either started explicitly (e.g. with BEGIN) or implicitly (e.g. just running a SELECT statement). There has been a lot of work recently in improving the scalability of creating read views inside InnoDB. A bit further in the past there was a lot of work in scaling InnoDB for greater than 1024 concurrent transactions (limitations in UNDO logging).
Fancy things that make InnoDB generally faster than you’d expect are the Adaptive Hash Index and change buffering. There are, of course, scalability challenges with these too. It’s good to understand the basics of them however and (of course), they are configurable.
If you end up reading or writing rows (quite likely) there will also be a translation between the InnoDB row format(s) and the MySQL Server row format(s). The details of which are not particularly interesting unless you’re delving deep into code or wish to buy me beer to hear about them.
Query execution may need to get many rows from many tables, join them together, sum things together or even sort things. If there’s an index with the sort order, it’s better to use that. MySQL may also need to do a filesort (sort rows, possibly using files on disk) or construct a temporary table in order to execute the query. Temporary tables are either using the MEMORY (formerly HEAP) storage engine or the MyISAM storage engine. Generally, you want to avoid having to use temporary tables – doing IO is never good.
Once you have the results of a query coming through, you may think that’s it. However, you may also be part of a replication hierarchy. If so, any changes made as part of that transaction will be written to the binary log. This is a log file maintained by the MySQL Server (not the storage engines) of all the changes to tables that have occured. This log can then be pulled by other MySQL servers and applied, making them replication slaves of the master MySQL Server.
We’ll ignore the differences between statement based replication and row based replication as they’re better discussed elsewhere. Being part of replication means you get a few extra locks and an additional file or two being written. The binary log (binlog for short) is a file on disk that is appended to until it reaches a certain size and is then rotated. Writes to this file vary in size (along with the size of transactions being executed). The writes to the binlog occur as part of committing the transaction (and the crash safety between writing to the binlog and writing to the storage engine are covered elsewhere – basically: you don’t want to know).
If your MySQL Server is a replication slave, then you have a thread reading the binary log files from another MySQL Server and then another thread (or, in newer versions, threads) applying the changes.
If the slow query log or general query log is enabled, they’ll also be written to at various points – and the current code for this is not optimal, there be (yes, you guess it) global mutexes.
Once the results of a query have been sent back to the client, the MySQL Server cleans things up (frees some memory) and prepares itself for the next query. You probably have many queries being executed simultaneously, and this is (naturally) a good thing.
There… I think that’s a mostly complete overview of all the things that can go on during query execution inside MySQL.
via Planet MySQL
Some current MySQL Architecture writings

iOS 8 Has Widgets! Here’s How to Use Them

iOS 8 Has Widgets! Here's How to Use Them

Yesterday, Apple finally pushed out iOS 8. Unlike last year’s refresh, iOS 8 is more about functional tweaks and additions than it is about looks. One of the most useful, and long-awaited features is finally adopting third-party widgets.

Before iOS 8, iPhone owners had a very restricted—read: Apple-only—widget ecosystem. Sure, you could jailbreak, but only apps like Calendar, Reminders, and Stock, pushed information to your iDevice’s Notification Center by default. Now third parties are welcome. So how do you get in on the fun?

Once you’ve slogged through the long iOS 8 update process, make sure all your apps are up to date and then pop into Notification Center. At first, everything will look as it did on iOS 7, but scroll down to the bottom and click on "Edit." As you download more apps, Apple will inform you of how many have widget support right below this button.

iOS 8 Has Widgets! Here's How to Use Them

When you click on "Edit," you’ll see that Apple sets by default all third-party widgets under a "Do Not Include" section, so you’ll need to go in a select which ones you want to use. This list shows all apps you currently own that support widgets.

Android faithful should note that these aren’t widgets in the sense you might be used to; in iOS 8, they’re simply present to give more power to the Notification Center. Once added to the list above, you can move the position of each widget by grabbing the small handle on the right.

iOS 8 Has Widgets! Here's How to Use Them

However, you can’t move apps above Today Summary or below Tomorrow Summary. Those are fixed in those positions. Of course, you could always just delete them.

That’s all there is to setup, but what apps are even offering widgets worth using? As you can expect, news pubs such as NYT Now, WSJ, VICE News and a few others have widgets that push breaking stories to Notification Center. A collection of calendars and task managers help you keep track of your day. eBay’s bidding widget makes it easy to one up that asshole who keeps trying to outbid you. pCalc packs in a simplified calculator into a widget body, perfect for the arithmetic deficient. OpenTable will track your reservations, and fitness apps like Pedometer++ and Runtime keep track of your step count while Lose It tracks your calories all from just a swipe away. Duolingo, the popular language learning app, also embeds its owl mascot to make you feel even worse about your foreign tongue ambitions. You can even keep track celestial mechanics and other astronomical goings on with Star Guide and Luminos.

Interestingly, Craig Federighi demoed an ESPN SportsCenter app showing updates your favorite teams, but the app doesn’t seem to be updated yet. We can expect an update soon, though, as well as similar functions from its competitors.

iOS 8 widgets are in their nascent phase. For one, they don’t support any kind of keyboard input, only existing for quick, glanceable information, access to certain apps, and tap-friendly interactions that are built into the widget, such as tapping a buy button or different keys on a digital calculator.

Also every widget needs a parent app. For instance, so if you really just like visual design of Yahoo’s weather widget, its parent app, which you may never even open, needs to live somewhere on your home screen. But now that Apple has embraced them, they’ll only grow in utility with iOS 9 and beyond. Any favorites that you’re already using that we missed? Let us know below.

via Gizmodo
iOS 8 Has Widgets! Here’s How to Use Them

The biggest comparison of sci-fi spaceships ever is complete at last

The biggest comparison of sci-fi spaceships ever is complete at last

At last, it’s done. The biggest spaceship size chart ever created is now complete and fully operational. 4,268 x 5,690 pixels of technological terror that includes everything from the smaller Star War ships to EVE. According to its author, Dirk Loechel, this is the last update. It’s epic.

The last update

For real this time: This is the final major content update, though if there are issues I’ll still fix them. I also haven’t forgotten I wanted to vectorize the writing. It’s still on the radar. But content-wise, I think that is about all I can put in.

Also, I added the ISS. For scale. It’s on top, with a yellow frame so it’s relatively easy to find.

Lots of errors fixed, lots of new ships too. Well, off for now, but I’ll be replying in the comments more or less regularly.

Also added a few more ships and switched the Eternal Crusader for the Legate Class battle barge, which is a class ship and not a unique ship.

Click on expand below and get lost in it, my friends. It’s awesome.

The biggest comparison of sci-fi spaceships ever is complete at last


SPLOID is a new blog about awesome stuff. Join us on Facebook

via Gizmodo
The biggest comparison of sci-fi spaceships ever is complete at last

Tracking Point: “The Real Deal”, Says WeaponsMan

xs3-right-side-silo-with-ammoWeaponsMan, a must read blog for those interested in military small arms, blogs about an early long-range consumer test of the TrackingPoint rifle and optic system: Quick take-aways: Best packaged gun any of them had ever seen. In the gunsmith’s experience, that’s out of thousands of new guns. Favorably impressed with the quality of the […]

Read More …

The post Tracking Point: “The Real Deal”, Says WeaponsMan appeared first on The Firearm Blog.


via The Firearm Blog
Tracking Point: “The Real Deal”, Says WeaponsMan

Here Are Clips From The First SiriusXM TechCrunch Radio Show

techcrunch-radio1 On Tuesday we launched our SiriusXM radio show, TechCrunch Radio, from Disrupt SF. Jordan Crook and John Biggs discussed the Apple Watch and did a startup pitch-off. Here are a few audio clips. Catch next week’s show on the Indie 103 channel, airing Tuesday September 16 at 6 p.m. ET. The show also replays Tuesdays at 9 p.m. ET, Fridays at 6 p.m. and Sundays at 9 p.m. ET. Read More


via TechCrunch
Here Are Clips From The First SiriusXM TechCrunch Radio Show

Star Wars last scene without the music is impossibly hilarious

Star Wars last scene without the music is impossibly hilarious

I always thought the final scene of the original Star Wars was one of the most ridiculous things ever* but, without the music soundtrack and with added ambient sound, it turns into one of the most cringeworthy scenes in the history of cinema. It’s just ridiculous.

*By the way, can anyone explain why the hell Chewbacca doesn’t get a medal too? He did the same freaking things as Han Solo!


SPLOID is a new blog about awesome stuff. Join us on Facebook

via Gizmodo
Star Wars last scene without the music is impossibly hilarious