“A really big deal”—Dolly is a free, open source, ChatGPT-style AI model


The Databricks Dolly logo

Databricks

On Wednesday, Databricks released Dolly 2.0, reportedly the first open source, instruction-following large language model (LLM) for commercial use that’s been fine-tuned on a human-generated data set. It could serve as a compelling starting point for homebrew ChatGPT competitors.

Databricks is an American enterprise software company founded in 2013 by the creators of Apache Spark. They provide a web-based platform for working with Spark for big data and machine learning. By releasing Dolly, Databricks hopes to allow organizations to create and customize LLMs “without paying for API access or sharing data with third parties,” according to the Dolly launch blog post.

Dolly 2.0, its new 12-billion-parameter model, is based on EleutherAI’s Pythia model family and was fine-tuned exclusively on training data (called “databricks-dolly-15k”) crowdsourced from Databricks employees. That fine-tuning gives it abilities more in line with OpenAI’s ChatGPT, which is better at answering questions and engaging in dialogue as a chatbot than a raw LLM that has not been fine-tuned.

Dolly 1.0, released in March, faced limitations regarding commercial use due to the training data, which contained output from ChatGPT (thanks to Alpaca) and was subject to OpenAI’s terms of service. To address this issue, the team at Databricks sought to create a new data set that would allow commercial use.

To do so, Databricks crowdsourced 13,000 demonstrations of instruction-following behavior from more than 5,000 of its employees between March and April 2023. To incentivize participation, they set up a contest and outlined seven specific tasks for data generation: open Q&A, closed Q&A, extracting information from Wikipedia, summarizing information from Wikipedia, brainstorming, classification, and creative writing.

The resulting data set, along with Dolly’s model weights and training code, has been released fully open source under a Creative Commons license, enabling anyone to use, modify, or extend it for any purpose, including commercial applications.

In contrast, OpenAI’s ChatGPT is a proprietary model that requires users to pay for API access and adhere to specific terms of service, potentially limiting the flexibility and customization options for businesses and organizations. Meta’s LLaMA, a partially open source model (with restricted weights) that recently spawned a wave of derivatives after its weights leaked on BitTorrent, does not allow commercial use.

On Mastodon, AI researcher Simon Willison called Dolly 2.0 “a really big deal.” Willison often experiments with open source language models, including Dolly. “One of the most exciting things about Dolly 2.0 is the fine-tuning instruction set, which was hand-built by 5,000 Databricks employees and released under a CC license,” Willison wrote in a Mastodon toot.

If the enthusiastic reaction to Meta’s only partially open LLaMA model is any indication, Dolly 2.0 could potentially spark a new wave of open source language models that aren’t hampered by proprietary limitations or restrictions on commercial use. While the jury is still out on Dolly’s actual performance, further refinements might allow running reasonably powerful LLMs on local consumer-class machines.

“Even if Dolly 2 isn’t good, I expect we’ll see a bunch of new projects using that training data soon,” Willison told Ars. “And some of those might produce something really useful.”

Currently, the Dolly weights are available at Hugging Face, and the databricks-dolly-15k data set can be found on GitHub.

Ars Technica – All content

Laravel Package Ocean

Discover new & useful Laravel packages. A place where you can find any Laravel package that you may need for your next project.

Laravel News Links

Take This Unique Quiz About Duplicate Indexes In MySQL | pt-duplicate-key-checker


Indexes are crucial for optimizing query execution times in databases, but having an excessive number of indexes, or redundant ones, can negatively impact performance. While pt-duplicate-key-checker is the go-to tool for identifying duplicate or redundant indexes in MySQL, it may not catch all duplicates.

In this blog post, we’ll put ourselves to the test and see if we can identify all the duplicate and redundant indexes in a MySQL table. Toward the end, we will identify what pt-duplicate-key-checker misses.

The unique quiz

Consider the following MySQL table definition. Let’s put our brains to work and note down any duplicate or redundant indexes (play fair, don’t cheat):

CREATE TABLE `table_with_lot_of_trouble` (
`id` int NOT NULL,
`col1` varchar(1) DEFAULT NULL,
`col2` varchar(2) DEFAULT NULL,
`col3` varchar(3) DEFAULT NULL,
`col4` varchar(4) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id` (`id`),
UNIQUE KEY `col1` (`col1`,`col2`),
UNIQUE KEY `col2` (`col2`,`col1`),
UNIQUE KEY `col1_2` (`col1`,`col2`),
UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
UNIQUE KEY `col1_4` (`col1`),
UNIQUE KEY `col1_5` (`col1`),
KEY `idx1` (`col1`,`id`),
KEY `idx2` (`col1`,`col2`),
KEY `idx3` (`col2`,`col1`),
KEY `idx4` (`col1`,`col2`,`col3`),
KEY `idx5` (`col1`,`col2`)
) ENGINE=InnoDB;

While you work on noting down the duplicate indexes in that MySQL table, let me also add some descriptions for duplicate and redundant indexes.

Duplicate index

Duplicate indexes occur when two or more indexes have the same set of columns in the same order. These can occur accidentally due to poor database design or through the use of database management tools that automatically create indexes without checking for duplicates.

Redundant index

Redundant indexes occur when two or more indexes have some overlapping columns. While these may not be exact duplicates, they can still negatively impact database performance.
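As a minimal illustration with a hypothetical table (not the quiz table above), the single-column key below is redundant because the composite key already serves any query that would use it through its left-most prefix:

CREATE TABLE redundancy_example (
    a INT,
    b INT,
    KEY idx_a   (a),      -- redundant: covered by the left-most prefix of idx_a_b
    KEY idx_a_b (a, b)    -- serves queries filtering on (a) as well as on (a, b)
) ENGINE=InnoDB;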

Both duplicate and redundant indexes can waste disk space and slow down write operations: each additional index requires additional disk space, and every insert, update, and delete has to maintain multiple indexes. Additionally, such indexes can make it harder for the query optimizer to choose the most efficient index, as it has more options to consider.

Test results

Now, I believe you have your list of duplicate keys ready. Let us see what our favorite pt-duplicate-key-checker tells us about the indexes of the table, along with the reasons why they are considered duplicate or redundant.

[root@ip-172-31-82-182 ~]# pt-duplicate-key-checker --databases test --tables table_with_lot_of_trouble
# ########################################################################
# test.table_with_lot_of_trouble
# ########################################################################

# Uniqueness of id ignored because PRIMARY is a duplicate constraint
# id is a duplicate of PRIMARY
# Key definitions:
# UNIQUE KEY `id` (`id`),
# PRIMARY KEY (`id`),
# Column types:
# `id` int not null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `id`;

# Uniqueness of col1_4 ignored because col1_5 is a duplicate constraint
# col1_4 is a duplicate of col1_5
# Key definitions:
# UNIQUE KEY `col1_4` (`col1`),
# UNIQUE KEY `col1_5` (`col1`),
# Column types:
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_4`;

# idx3 is a duplicate of col2
# Key definitions:
# KEY `idx3` (`col2`,`col1`),
# UNIQUE KEY `col2` (`col2`,`col1`),
# Column types:
# `col2` varchar(2) default null
# `col1` varchar(1) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx3`;

# idx4 is a duplicate of col1_3
# Key definitions:
# KEY `idx4` (`col1`,`col2`,`col3`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx4`;

# Uniqueness of col1 ignored because col1_5 is a stronger constraint
# col1 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1`;

# Uniqueness of col1_2 ignored because col1_5 is a stronger constraint
# col1_2 is a left-prefix of col1_3
# Key definitions:
# UNIQUE KEY `col1_2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `col1_2`;

# idx2 is a left-prefix of col1_3
# Key definitions:
# KEY `idx2` (`col1`,`col2`),
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx2`;

# idx5 is a left-prefix of col1_3
# Key definitions:
# KEY `idx5` (`col1`,`col2`)
# UNIQUE KEY `col1_3` (`col1`,`col2`,`col3`),
# Column types:
# `col1` varchar(1) default null
# `col2` varchar(2) default null
# `col3` varchar(3) default null
# To remove this duplicate index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx5`;

# Key idx1 ends with a prefix of the clustered index
# Key definitions:
# KEY `idx1` (`col1`,`id`),
# PRIMARY KEY (`id`),
# Column types:
# `col1` varchar(1) default null
# `id` int not null
# To shorten this duplicate clustered index, execute:
ALTER TABLE `test`.`table_with_lot_of_trouble` DROP INDEX `idx1`, ADD INDEX `idx1` (`col1`);

# ########################################################################
# Summary of indexes
# ########################################################################

# Size Duplicate Indexes 145
# Total Duplicate Indexes 9
# Total Indexes 13

pt-duplicate-key-checker reports nine duplicate indexes. Could you identify all nine of them? If so, you surely have a good command of database schema design. But I wouldn’t write a blog post just to test whether you can match pt-duplicate-key-checker.

There is one more duplicate key that pt-duplicate-key-checker is missing; could you identify it? If so, I encourage you to apply at Percona and give me an opportunity to work with smarter brains.

The duplicate unique keys

For those who couldn’t identify the duplicate index, the unidentified duplicate keys are… (drum roll)…

UNIQUE KEY (col1, col2)
UNIQUE KEY (col2, col1)

It follows logically that if a tuple {a, b} is unique, then {b, a} will also be unique. Similar to how Peter Parker is to Spiderman and Gangadhar is to Shaktiman, the set {a, b} is equivalent to the set {b, a}. Having both keys simply makes MySQL enforce the same uniqueness check twice.

Therefore, having an additional duplicate constraint defined on the same set of columns becomes unnecessary regardless of order. This is specifically true for two-column unique keys only. To optimize your database, you should consider dropping the second unique key or converting it to a secondary index if it is required.
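Applied to the quiz table, a minimal cleanup sketch could look like this, assuming you have already applied pt-duplicate-key-checker’s suggested drops and your application doesn’t rely on the reversed key:

-- Drop the reversed duplicate unique key; uniqueness of (col1, col2)
-- is already enforced by the `col1` unique key.
ALTER TABLE `test`.`table_with_lot_of_trouble`
    DROP INDEX `col2`;

-- Alternative, if reads still need an index leading on col2 and no such
-- secondary index exists: keep the column order, stop enforcing uniqueness twice.
-- ALTER TABLE `test`.`table_with_lot_of_trouble`
--     DROP INDEX `col2`,
--     ADD INDEX `idx_col2_col1` (`col2`, `col1`);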

Since you cannot go through every table definition by hand, I wrote a query for you to identify duplicate unique indexes:

mysql> SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME,
              GROUP_CONCAT(INDEX_NAME) AS duplic8_UK, COLUMN_NAMES
       FROM (SELECT DISTINCT TABLE_SCHEMA, TABLE_NAME, INDEX_NAME,
                    GROUP_CONCAT(COLUMN_NAME ORDER BY COLUMN_NAME SEPARATOR ',') AS COLUMN_NAMES
             FROM information_schema.STATISTICS
             WHERE NON_UNIQUE = 0 AND INDEX_NAME != 'PRIMARY' AND INDEX_TYPE = 'BTREE'
             GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME) X
       GROUP BY TABLE_SCHEMA, TABLE_NAME, COLUMN_NAMES
       HAVING COUNT(*) > 1;

+--------------+---------------------------+---------------+--------------+
| TABLE_SCHEMA | TABLE_NAME                | duplic8_UK    | COLUMN_NAMES |
+--------------+---------------------------+---------------+--------------+
| test         | table_with_lot_of_trouble | col1_4,col1_5 | col1         |
| test         | table_with_lot_of_trouble | col1,col2     | col1,col2    |
+--------------+---------------------------+---------------+--------------+

Also, don’t forget to provide your opinion in the comments section: Should the non-identification issue with pt-duplicate-key-checker be considered a bug report or a feature request?

Conclusion

Percona’s pt-duplicate-key-checker is an amazing tool, but like every other tool, it is not “fool-proof.” As you create your indexes, evaluate them for duplicates.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Planet MySQL

Over Half a Million of These Hondas Are Being Recalled


Over half a million Honda vehicles have been recalled after multiple reports of a rear part detaching due to corrosion, according to the National Highway Traffic Safety Administration. The recall notes that the problem is occurring in salt-belt states, which use de-icing salt to clear snow and ice from the roads during winter: when vehicles are driven through puddles at high speed, salt can enter the rear frame. Over time, the salt can cause corrosion, leading to the rear trailing arm falling off.

According to the recall notice, Honda has received 61 complaints of road salt accumulating and causing frame corrosion, potentially causing drivers to lose control and increasing the chances of an accident.

Which Hondas are being recalled?

The recalled SUVs are 2007-2011 CR-Vs that were sold or registered in Connecticut, Delaware, Illinois, Indiana, Iowa, Kentucky, Maine, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Hampshire, New Jersey, New York, Ohio, Pennsylvania, Rhode Island, Vermont, Virginia, West Virginia, Wisconsin, and Washington D.C. According to the recall, there have been no reports of issues with vehicles sold outside the salt-belt region.

What to do if your Honda was recalled

According to Honda, you should have your dealer inspect the affected rear frame and, if needed, install a support brace or repair it at no cost. In some cases, Honda will offer to buy the vehicle if the damage is too serious or the damaged part can’t be removed. Owners should expect to be notified by mail starting on May 8. If you have any questions about the recall, call Honda customer service at 1-888-234-2138. Honda’s reference number for this recall is XDZ.

Lifehacker

The Most Important MySQL Setting


If we were given a freshly installed MySQL or Percona Server for MySQL and could only tune a single variable, which would be the most important MySQL setting to change?


It has always bothered me that “out-of-the-box” MySQL performance is subpar: if you install MySQL or Percona Server for MySQL in a new server and do not “tune it” (as in change default values for configuration settings), it just won’t be able to make the best use of the server’s available resources – particularly memory.

To illustrate this, I ran the Sysbench-TPCC synthetic benchmark against two different GCP instances running a freshly installed Percona Server for MySQL version 8.0.31 on CentOS 7, both of them spec’d with four vCPUs, but with the second one (server B) having a tad over twice as much memory as the reference one (server A).

Sysbench ran on a third server, which I’ll refer to as the application server (APP). I’ve used a fourth instance to host a PMM server to monitor servers A and B and used the data collected by the PMM agents installed on the database servers to compare performance. The table below summarizes the GCP instances used for these tests:

Server identifier    Machine type     vCPU    Memory (GB)
A                    n1-standard-4    4       15
B                    n2-highmem-4     4       32
APP                  n1-standard-8    8       64
PMM                  e2-medium        2       4

 

Sysbench-TPCC has been executed with the following main options:

  • --threads=256
  • --tables=10
  • --scale=100
  • --time=3600

It generated a dataset with the following characteristics:

mysql> SELECT
-> ROUND(SUM(data_length+index_length)/1024/1024/1024, 2) as Total_Size_GB,
-> ROUND(SUM(data_length)/1024/1024/1024,2) as Data_Size_GB,
-> ROUND(SUM(index_length)/1024/1024/1024,2) as Index_Size_GB
-> FROM information_schema.tables
-> WHERE table_schema='sbtest';
+---------------+--------------+---------------+
| Total_Size_GB | Data_Size_GB | Index_Size_GB |
+---------------+--------------+---------------+
|     92.83     |     77.56    |     15.26     |
+---------------+--------------+---------------+

 

One of the metrics measured by Sysbench is the number of queries per second (QPS), which is nicely represented by the MySQL Questions (roughly, “the number of statements executed by the server”) graph in PMM (given these servers are not processing anything other than the Sysbench benchmark and PMM monitoring queries):

MySQL Questions graphs: Server A (4 vCPU, 15G RAM) vs. Server B (4 vCPU, 32G RAM).

Server A produced an average of 964 QPS for the one-hour period the test was run, while Server B produced an average of 1520 QPS. The throughput didn’t double but increased by 57%. Are these results good enough?

I’ll risk “adding insult to injury” and do the unthinkable of comparing apples to oranges. Here’s how the same test performed when running Percona Distribution for PostgreSQL 14 on these same servers:

                 Queries: reads    Queries: writes    Queries: other    Queries: total    Transactions    Latency (95th)
MySQL (A)        1584986           1645000            245322            3475308           122277          20137.61
MySQL (B)        2517529           2610323            389048            5516900           194140          11523.48
PostgreSQL (A)   2194763           2275999            344528            4815290           169235          14302.94
PostgreSQL (B)   2826024           2929591            442158            6197773           216966           9799.46

 

                 QPS (avg)
MySQL (A)        965      100%
MySQL (B)        1532     159%
PostgreSQL (A)   1338     139%
PostgreSQL (B)   1722     178%


For a user who does not understand how important it is to tune a database server, or who doesn’t know how to do it and just experiments with these two RDBMS offerings, PostgreSQL seems to have the edge when it comes to out-of-the-box performance. Why is that?

MySQL comes pre-configured to be conservative instead of making the most of the resources available in the server. That’s a legacy of the LAMP model, when the same server would host both the database and the web server.

To be fair, that is also true with PostgreSQL; it hasn’t been tuned either, and it, too, can also perform much better. But, by default, PostgreSQL “squeezes” the juice out of the server harder than MySQL does, as the following table with server resource usage indicates:

CPU, memory, and IO usage graphs for MySQL, M(A) and M(B), and PostgreSQL, P(A) and P(B).

Data caching

To ensure durability, the fourth and last property of ACID-compliant databases such as MySQL and PostgreSQL, data must be persisted to “disk” so it remains available once the server is restarted. But since retrieving data from disk is slow, databases tend to work with a caching mechanism to keep as much hot data, the bits and pieces that are most often accessed, in memory.

In MySQL, considering the standard storage engine, InnoDB, the data cache is called Buffer Pool. In PostgreSQL, it is called shared buffers. A curious similarity is that both the Buffer Pool and the shared buffers are configured with 128M by default.
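A quick way to confirm what your own server is running with, as a sketch (MySQL reports the value in bytes; on PostgreSQL the equivalent check is SHOW shared_buffers;):

-- Default installs report 128 for this query:
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;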

On the other hand, one of the big differences in their implementations stems from the fact that MySQL (InnoDB) can load data (pages) from disk straight into the Buffer Pool’s memory area. PostgreSQL’s architecture uses a different approach: as is the case for the majority of applications, it relies on the file system (FS) cache to load a page from disk into memory and then makes a copy of that page in the shared buffers’ memory area.

I have no intention of discussing the pros and cons of each of these RDBMS’ caching implementations; the only reason I’m explaining this is to highlight how, in practice, they are configured in opposite ways: when we tune MySQL, we tend to allocate most of the memory to the Buffer Pool (let’s simplify and say 80% of it), whereas, on PostgreSQL, we tend to do the inverse and allocate just a small portion of it (say, 20%). The reasoning here is that since PostgreSQL relies on the FS cache, it pays off to allow free memory to be naturally used for FS cache as it ends up working as a sort of 2nd-level caching for PostgreSQL: there’s a good chance that a page that has been evicted from the shared buffers can still be found in the FS cache – and copying a page from one memory area to another is super fast. This explains, in part, how PostgreSQL performed better out of the box for this test workload.

Now that I got your attention, I’ll return the focus to the main subject of this post. I’ll make sure to do a follow-up one for PostgreSQL.

Just increase the Buffer Pool size

I wrote above that “we tend to allocate most of the memory to the Buffer Pool (let’s simplify and say 80% of it)”. I didn’t make up that number; it’s in the MySQL manual. It’s also probably the most well-known MySQL rule-of-thumb. If you want to learn more about it, Jay Janssen wrote a nice blog post (innodb_buffer_pool_size – Is 80% of RAM the right amount?) dissecting it a few years ago. He started that post with the following sentence:

It seems these days if anyone knows anything about tuning InnoDB, it’s that you MUST tune your innodb_buffer_pool_size to 80% of your physical memory.

There you have it: if one could only tune a single MySQL variable, that must be innodb_buffer_pool_size. In fact, I once worked with a customer that had added a slider button to their product’s GUI to set the size of the Buffer Pool on the adjacent MySQL server and nothing else.

Realistically, this has been the number one parameter to tune on MySQL because increasing the data cache size makes a big difference for most workloads, including the one I’ve used for my tests here.

But the 80% rule just doesn’t fit all cases. On Server A, 80% of 14.52G was roughly 12G, and allocating that much memory to the Buffer Pool proved to be too much, with Linux’s Out-Of-Memory (OOM) killer terminating the mysqld process:

[Fri Mar 10 16:24:49 2023] Killed process 950 (mysqld), UID 27, total-vm:16970700kB, anon-rss:14226528kB, file-rss:0kB, shmem-rss:0kB

That’s the blacked-out mark in the graphs below. I had to settle for a Buffer Pool size of 10G (69% of memory), which left about 4.5G for the OS as well as other memory-consuming parts of MySQL (such as connections and temporary tables). That’s a good reminder that we don’t simply tune MySQL for the server it is running on; we need to take the workload being executed into (high) consideration too.
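For reference, and not necessarily how the author applied the change, innodb_buffer_pool_size is dynamic in MySQL 8.0, so a value like the 10G used here can be set online and persisted:

-- InnoDB rounds the value to a multiple of
-- innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances.
SET PERSIST innodb_buffer_pool_size = 10 * 1024 * 1024 * 1024;

-- The resize runs in the background; check its progress with:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';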

For Server B, I tried to go with a Buffer Pool size of 27G (84% of memory), but that also proved too much. I settled on 81%, which was good enough for the task at hand. The results are summarized in the graphs below.

MySQL Questions graphs for MySQL (A) and MySQL (B): default Buffer Pool size vs. tuned.

As we can see above, throwing more memory (as in increasing the data cache size) just does not cut it beyond a certain point. For example, if the hot data can fit in 12G, then increasing the Buffer Pool to 26G won’t make much of a difference. Or, if we are hitting a limit in writes, we need to look at other areas of MySQL to tune.

Dedicated server

MySQL finally realized that almost no one keeps a full LAMP stack running on a single server anymore. We have long been surfing the virtualization wave (to keep it broad). Most production environments have MySQL running on its own dedicated server/VM/container, so it makes no sense to limit the Buffer Pool to only 128M by default anymore.

MySQL 8.0 introduced the variable innodb_dedicated_server, which configures not only the Buffer Pool size (innodb_buffer_pool_size) according to the server’s available memory but also the redo log space (now configured through innodb_redo_log_capacity), which is InnoDB’s transaction log and plays an important role in data durability and in the checkpointing process, which in turn influences… write throughput. Oh, and the InnoDB flush method (innodb_flush_method) as well.

This option for Enabling Automatic Configuration for a Dedicated MySQL Server is a bit more sophisticated than a rule of thumb: it employs a simple algorithm to define the values for the Buffer Pool size and the redo log space, and it configured my test servers as follows:

                            Server A             Server B
innodb_buffer_pool_size     11G                  24G
innodb_redo_log_capacity    9G                   18G
innodb_flush_method         O_DIRECT_NO_FSYNC    O_DIRECT_NO_FSYNC
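Note that innodb_dedicated_server is not a dynamic variable; it goes in my.cnf and takes effect at startup. Once the server is up, a quick sanity check of what it picked could look like this (a sketch; FORMAT_BYTES() requires MySQL 8.0.16+, and innodb_redo_log_capacity exists as of 8.0.30):

SELECT @@innodb_dedicated_server                 AS dedicated,
       FORMAT_BYTES(@@innodb_buffer_pool_size)   AS buffer_pool,
       FORMAT_BYTES(@@innodb_redo_log_capacity)  AS redo_log,
       @@innodb_flush_method                     AS flush_method;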

 

The default values for innodb_redo_log_capacity and innodb_flush_method being used so far were, respectively, 100M and FSYNC. Without further ado, here are the results for the three test rounds for each server side-by-side for easier comparison:

Server A: PMM graphs of resource usage (CPU, memory) for the three test rounds.

Server B: PMM graphs of resource usage (disk and swap activity) for the three test rounds.

Note how CPU usage is now close to maximum usage (despite a lot of the time being spent in iowait due to the slow disk, particularly for the smaller server).

With the dedicated server (third “peak” in the graphs below) using very similar Buffer Pool values to my Buffer Pool-tuned test (second “peak”), the much larger redo log space coupled with the O_DIRECT flush method (with “no fsync”) allowed for much-improved write performance:

InnoDB row reads graphs for Server A and Server B.

It’s probably time to change the default configuration and consider every new MySQL server a dedicated one.


NOTE – I hit a “limitation” from my very first Sysbench run:

Running the test with following options:
Number of threads: 256
(...)
FATAL: error 1040: Too many connections

This is MySQL’s cap on the number of concurrent connections, which defaults to 151. I could have run my tests with 128 Sysbench threads, but that would not have driven as much load into the database as I wanted, so I raised max_connections to 300.
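For the record, that change is a one-liner; this is a sketch rather than the exact command the author ran:

-- max_connections is dynamic; PERSIST also writes it to mysqld-auto.cnf
-- so it survives a restart.
SET PERSIST max_connections = 300;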

Technically, this means I cheated since I have modified two MySQL settings instead of one. In my defense, max_connections doesn’t influence the performance of MySQL; it just controls how many clients can connect at the same time, with the intent of limiting database activity somewhat. And if your application attempts to surpass that limit, you get a blatant error message like the one above.

BTW, I also had to increase the exact same setting (max_connections) on PostgreSQL to run my initial provocative test.


The goal of this post was to encourage you to tune MySQL, even if just one setting. But you shouldn’t stop there. If you need to get the most out of your database server, consider using Percona Monitoring and Management (PMM) to observe its performance and find ways to improve it.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

 

Download Percona Monitoring and Management Today

Percona Database Performance Blog

Laravel: 9 Typical Mistakes Juniors Make


Some time ago I made a YouTube series called Code Reviews. From that series and other reviews, I’ve collected the 9 most common repeating mistakes Laravel beginners make.

Not all of those are really serious flaws; most of them are just not the most effective way to code. But then it’s a fair question: why use a framework like Laravel and not actually use its core features in full?

So, in no particular order…


Mistake 1. Not Using Route Groups

Where possible combine routes into groups. For example, you have routes like this:

Route::get('dashboard', [HomeController::class, 'index'])->name('dashboard')->middleware(['auth']);
Route::resource('donation', DonationController::class)->middleware(['auth']);
Route::resource('requisition', RequisitionController::class)->middleware(['auth']);

Route::name('admin.')->prefix('admin')->group(function () {
    Route::view('/', 'admin.welcome')->middleware(['auth', 'admincheck']);
    Route::resource('donor', DonorController::class)->middleware(['auth', 'admincheck']);
    Route::resource('details', OrganisationDetailController::class)->middleware(['auth', 'admincheck']);
});

Here, we have routes that all use the auth middleware, and three routes that also check if the user is an admin. But those middleware are repeated on every line.

It would be better to put all routes into a group with the auth middleware and then, inside it, have another group for the admin routes.

This way, when a developer opens the routes file, they will immediately know which routes are only for authenticated users.

Route::middleware('auth')->group(function () {
    Route::get('dashboard', [HomeController::class, 'index'])->name('dashboard');
    Route::resource('donation', DonationController::class);
    Route::resource('requisition', RequisitionController::class);

    Route::name('admin.')->prefix('admin')->middleware('admincheck')->group(function () {
        Route::view('/', 'admin.welcome');
        Route::resource('donor', DonorController::class);
        Route::resource('details', OrganisationDetailController::class);
    });
});

Read more


Mistake 2. Not Using Route Model Binding

Often I see beginner coders manually searching for the data in the controller, even when Route Model Binding is specified correctly in the routes. For example, in your routes you have:

Route::resource('student', StudentController::class);

So here it’s even a resource route. But I see some beginners still write Controller code like this:

public function show($id)
{
    $student = Student::findOrFail($id);

    return view('dashboard/student/show', compact(['student']));
}

Instead, use Route Model Binding and Laravel will find the Model for you:

public function show(Student $student)
{
    return view('dashboard/student/show', compact(['student']));
}

Read more


Mistake 3. Too Long Eloquent Create/Update Code

When saving data into the DB, I have seen people write code similar to this:

public function update(Request $request)
{
    $request->validate(['name' => 'required']);

    $user = Auth::user();
    $user->name = $request->name;
    $user->username = $request->username;
    $user->mobile = $request->mobile;
    // Some other fields...
    $user->save();

    return redirect()->route('profile.index');
}

Instead, it can be written more concisely, in at least two ways.

First, in this example, you don’t need to assign Auth::user() to a $user variable. The first option could be:

public function update(Request $request)
{
    $request->validate(['name' => 'required']);

    auth()->user()->update($request->only([
        'name',
        'username',
        'mobile',
        // Some other fields...
    ]));

    return redirect()->route('profile.index');
}

The second option is to put the validation into a Form Request class. Then, in the update() method, you just need to pass $request->validated().

public function update(ProfileRequest $request)
{
    auth()->user()->update($request->validated());

    return redirect()->route('profile.index');
}

See how much shorter the code is?

If you want to dive deeper into Eloquent, I have a full course Eloquent: The Expert Level


Mistake 4. Not Naming Things Properly

Many times, beginners name things however they want, not thinking about the other developers who will read their code in the future.

For example, they shorten variable names: instead of $data they call it $d. Always use proper naming. For example:

Route::get('/', [IndexController::class, 'show'])
    ->middleware(['dq'])
    ->name('index');

Route::get('/about', [IndexController::class, 'about'])
    ->middleware(['dq'])
    ->name('about');

Route::get('/dq', [IndexController::class, 'dq'])
    ->middleware(['auth'])
    ->name('dq');

What are this dq middleware and the dq method in IndexController supposed to do? Well, in this example, if we go into app/Http/Kernel.php to find the middleware, we find something like this:

class Kernel extends HttpKernel
{
    // ...

    protected $routeMiddleware = [
        'auth' => \App\Http\Middleware\Authenticate::class,
        'admin' => \App\Http\Middleware\EnsureAdmin::class,
        'dq' => \App\Http\Middleware\Disqualified::class,
        'inprogress' => \App\Http\Middleware\InProgress::class,
        // ...
    ];
}

It doesn’t matter what’s inside this middleware: from the name of the middleware class, it clearly means disqualified. So it should be called disqualified everywhere instead of dq. This way, if other developers join the project, they will have a much easier time understanding it.


Mistake 5. Too Big Controllers

Quite often I see juniors writing huge Controllers with all possible actions in one method:

  • Validation
  • Checking data
  • Transforming data
  • Saving data
  • Saving more data in other tables
  • Sending emails/notifications
  • …and more

That could all be in one store() method, for example:

public function store(Request $request)
{
    $this->authorize('user_create');

    $userData = $request->validate([
        'name' => 'required',
        'email' => 'required|unique:users',
        'password' => 'required',
    ]);

    $userData['start_at'] = Carbon::createFromFormat('m/d/Y', $request->start_at)->format('Y-m-d');
    $userData['password'] = bcrypt($request->password);

    $user = User::create($userData);
    $user->roles()->sync($request->input('roles', []));

    Project::create([
        'user_id' => $user->id,
        'name' => 'Demo project 1',
    ]);

    Category::create([
        'user_id' => $user->id,
        'name' => 'Demo category 1',
    ]);

    Category::create([
        'user_id' => $user->id,
        'name' => 'Demo category 2',
    ]);

    MonthlyRepost::where('month', now()->format('Y-m'))->increment('users_count');

    $user->sendEmailVerificationNotification();

    $admins = User::where('is_admin', 1)->get();
    Notification::send($admins, new AdminNewUserNotification($user));

    return response()->json([
        'result' => 'success',
        'data' => $user,
    ], 200);
}

It’s not necessarily wrong, but it becomes very hard for other developers to read quickly in the future. And what’s hard to read becomes hard to change and to fix when bugs appear.

Instead, Controllers should be shorter: just take the data from the routes, call some methods, and return the result. All the logic for manipulating data should live in classes specifically suited for that:

  • Validation in Form Request classes
  • Transforming data in Models and/or Observers
  • Sending emails in events/listeners put into the queue
  • etc.

I have an example of such transformation of a typical Controller method in this article: Laravel Structure: Move Code From Controller to… Where?

There are various approaches to structuring the code, but what should be avoided is one huge method responsible for everything.


Mistake 6. N+1 Eloquent Query Problem

By far the no. 1 reason for poor performance in a Laravel project is the structure of its Eloquent queries. Specifically, the N+1 query problem is the most common: running hundreds of SQL queries on one page definitely takes a lot of server resources.

And it’s relatively easy to spot this problem in simple examples like this:

// Controller not eager loading Users:
$projects = Project::all();

// Blade:
@foreach ($projects as $project)
    <li>{{ $project->name }} ({{ $project->user->name }})</li>
@endforeach
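Under the hood, that loop issues one query for the projects plus one extra query per project for its user, roughly like this (a sketch, assuming a standard belongs-to user relation):

SELECT * FROM projects;
-- ...then, once per project inside the loop:
SELECT * FROM users WHERE users.id = 1 LIMIT 1;
SELECT * FROM users WHERE users.id = 2 LIMIT 1;
SELECT * FROM users WHERE users.id = 3 LIMIT 1;
-- ...N extra queries for N projects, instead of the single
-- SELECT * FROM users WHERE users.id IN (...) that eager loading would run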

But real-life examples get more complicated, and the number of queries may be “hidden” in accessors, package queries, and other unpredictable places.

Also, typical junior developers don’t spend enough time testing their application with a lot of data. It works for them with a few database records, so they don’t go the extra mile to simulate the future scenarios where their code would actually cause performance issues.

A few of my best resources about it:


Mistake 7. Breaking MVC Pattern: Logic in Blade

Whenever I see a @php directive in a Blade file, my heart starts beating faster.

See this example:

@php
    $x = 5;
@endphp

Except for very (VERY) rare scenarios, all the PHP code for getting the data should be executed before it ever reaches Blade to be shown.

MVC architecture was created for a reason: that separation of concerns between Model, View and Controller makes it much more predictable where to search for certain code pieces.

And while, for the M and C parts, it can be debated whether to store the logic in the Model, in the Controller, or in separate Service/Action classes, the V layer of Views is kinda sacred. The golden rule: views should not contain logic. In other words, views are only for presenting the data, not for transforming or calculating it.

The origin of this is the idea that Views could be handed to a front-end HTML/CSS developer, who could make the necessary styling changes without needing to understand any PHP code. Of course, in real life that separation rarely happens in teams, but it’s a noble goal from a pattern that comes from outside of Laravel, or even PHP.

For most mistakes in this list I have a “read more” list of links, but here I have nothing much to add. Just don’t store logic in Views, that’s it.


Mistake 8. Relationships: Not Creating Foreign Keys

Relationships between tables are created on two levels: you need to create the related field and then a foreign key. I see many juniors forget the second part.

Have you ever seen something like this in migration?

$table->unsignedBigInteger('user_id');

On the surface, it looks ok. And it actually does the job, with no bugs. At first.

Let me show you what happens if you don’t put constrained() or references() in that migration file.

A foreign key is the mechanism for restricting related operations at the database level: when you delete the parent record, you may choose what happens to the children, either deleting them too or restricting the parent deletion in the first place.

So, if you create just an unsignedBigInteger() field without the foreign key, you’re allowing your users to delete the parent without any consequences. The children then stay in the database, pointing to a parent that no longer exists.
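To make the database-level part concrete, here is roughly the DDL that a migration with a proper foreign key produces; the posts table is hypothetical, and the constraint below is approximately what foreignId('user_id')->constrained() would generate for you:

CREATE TABLE posts (
    id      BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    user_id BIGINT UNSIGNED NOT NULL,
    title   VARCHAR(255) NOT NULL,
    CONSTRAINT posts_user_id_foreign
        FOREIGN KEY (user_id) REFERENCES users (id)
        -- the default blocks deleting a user who still has posts;
        -- add ON DELETE CASCADE to remove the children with the parent
) ENGINE=InnoDB;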

You can also watch my video about it: Laravel Foreign Keys: How to Deal with Errors


Mistake 9. Not Reading The Documentation

I want to end this list with the elephant in the room. For many years, the most popular of my articles and tweets have been the ones with information literally taken from the documentation, almost word for word.

Over the years I’ve realized that people don’t actually read the documentation in full, only the parts that are most relevant to them.

So many times, developers surprised me by not knowing the obvious features that were in the docs.

Junior developers learn mostly from ready-made tutorials or courses, which is fine, but reading the official docs should be a regular activity.

Laravel News Links

PlanetScale Database Migrations for Laravel


This community PlanetScale package for Laravel adds an artisan pscale:migrate command to your Laravel applications. This command helps you manage database migrations using the PlanetScale API, a process which varies slightly from using the built-in migrate command.

During a deployment, you’d run the following command instead of migrate; it does everything necessary to update your database’s schema:

php artisan pscale:migrate

Why is this needed?

You might wonder why this command is needed instead of directly using the migrate command.

According to the package’s readme, PlanetScale handles migrations in a different way than you’d typically see with databases:

PlanetScale has a lot of advantages when using it as your application’s production database. However, it handles your database and schema migrations in a somewhat unusual way.

It uses branches for your database. A branch can be production or development…

This package uses PlanetScale’s Public API to automate the process of creating a new development branch, connecting your app to the development branch, running your Laravel migrations on the development branch, merging that back into your production branch, and deleting the development branch.

To get started with this package, check out the package setup instructions on GitHub at x7media/laravel-planetscale.

Related:
Speaking of PlanetScale and databases, Aaron Francis published MySQL for Developers. We’d highly recommend you check that out to improve your database skills.

Laravel News

A Collection of Fun Databases For Programming Exploration

Longtime Slashdot reader Esther Schindler writes: When you learn a new tool/technology, you need to create a sample application, which cannot use real in-house data. Why not use something fun for the sample application’s data, such as a Star Wars API or a data collection about World Cup contests? Esther Schindler, Slashdot user #16185, assembled a groovy collection of datasets that may be useful but also may be a source of fascinating internet rabbit holes. For those interested in datasets, Esther also recommends the Data is Plural newsletter and the website ResearchBuzz, which shares dataset descriptions as well as archive-related news and tools.
"Google Research maintains a search site for test datasets, too, if you know what you’re looking for," adds Esther. There’s also, of course, Kaggle.com.


Read more of this story at Slashdot.

Slashdot

Everything You Can Test in Your Laravel Application


Christoph Rumpel has an excellent guide, Everything You Can Test in Your Laravel Application, covering the scenarios you’ll likely need to test in real applications.

The post Everything You Can Test in Your Laravel Application appeared first on Laravel News.


Join the Laravel Newsletter to get Laravel articles like this directly in your inbox.

Laravel News