How to update large data in Laravel using Commands, Chunking, and Database Transactions

Max Hutschenreiter

Sometimes you need to update data in your database. The easiest option is to run an UPDATE directly on your MySQL database, but that does not always work, especially when you use model events or also want to update relations.

Commands

In that case, I recommend creating an Artisan command, even for one-time changes.

php artisan make:command YourCommandName
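The generator drops a new class into app/Console/Commands. Roughly, it looks like the sketch below; the signature and description values are placeholders you would adjust, and the exact stub varies by Laravel version:

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class YourCommandName extends Command
{
    // How the command is invoked: php artisan users:update-example
    protected $signature = 'users:update-example';

    protected $description = 'One-off update of user data';

    public function handle()
    {
        // The update logic from the examples below goes here.
    }
}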

Progress Bar

The first tip is to use a progress bar. In long-running commands, it is helpful to see that there is actual progress.

To show you how, I will just copy the example from the Laravel documentation:

$users = App\Models\User::all();

// Size the bar to the number of records that will be processed.
$bar = $this->output->createProgressBar(count($users));

$bar->start();

foreach ($users as $user) {
    $this->performTask($user);

    $bar->advance();
}

$bar->finish();
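On Laravel 8 and newer, the same start/advance/finish sequence is also available as a one-call helper on the command itself; a minimal sketch:

$this->withProgressBar(App\Models\User::all(), function ($user) {
    $this->performTask($user);
});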

Chunking

This works fine for up to a few hundred entries with simple changes. If you want to change more entries with more complex logic, you should chunk the results.
The problem is that if you load everything into one Eloquent collection, your RAM becomes the limitation. To avoid that, you can use Laravel's built-in chunk method to iterate through the table in batches.

App\Models\User::chunk(200, function ($users){
    foreach($users as $user){
        $user->name .= ' :)';
        $user->save();
    }
});

One important thing to understand about the chunk method is how its queries run. In this example, after the first 200 users have been processed, the base query is executed against the table again with an increased OFFSET.
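Under the hood, chunk pages through the table with LIMIT/OFFSET queries. For the example above they look roughly like this (simplified):

select * from users order by users.id asc limit 200 offset 0
select * from users order by users.id asc limit 200 offset 200
select * from users order by users.id asc limit 200 offset 400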
Now imagine this case:

App\Models\User::where('active', true)
    ->chunk(200, function ($users){
        foreach($users as $user){
            $user->active = false;
            $user->save();
        }
    });

In this code, the first run would go over 200 users, changing the active value to false. In the second run, it would ask the database again for users where active is true. The problem: since we just changed the active status of 200 users, they no longer appear in that result, yet the OFFSET still skips the first 200 rows of the new, smaller result. That means we skip 200 users we actually wanted to change.
Laravel has a method to overcome this problem; it is just important to understand when to use it. The solution in this situation is chunkById:

App\Models\User::where('active', true)
    ->chunkById(200, function ($users){
        foreach($users as $user){
            $user->active = false;
            $user->save();
        }
    });
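chunkById avoids the OFFSET entirely: it remembers the highest id of the previous chunk and keys the next query off it. Roughly (simplified):

select * from users where active = 1 order by id asc limit 200
select * from users where active = 1 and id > [last id of previous chunk] order by id asc limit 200

Rows that were already set to active = false simply drop out of the WHERE clause without shifting the remaining rows, so nothing gets skipped.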

Database Transactions

Now we are able to execute a lot of changes to our models while avoiding Eloquent collections that grow too big.
But in the last example, every single save() still commits its own UPDATE statement to the database. To reduce that overhead, I found it a good tactic to use transactions.
This allows us to reuse our chunks and commit the updates per chunk.

App\Models\User::where('active', true)
    ->chunkById(200, function ($users) {
        try {
            DB::beginTransaction();

            foreach ($users as $user) {
                $user->active = false;
                $user->save();
            }

            // Commit all updates of this chunk at once.
            DB::commit();
        } catch (\Exception $e) {
            // handle your error (log ...)
            DB::rollBack();
        }
    });

In this code example, we combine chunkById with database transactions. This can save a lot of time when updating the DB. You can read more about database transactions in the Laravel documentation.

Transactions can cause trouble if not used correctly. If you forget to commit or roll back, you will create nested transactions. You can read more about that in the blog post.
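If you are worried about exactly that, Laravel also ships a closure-based helper, DB::transaction(), which commits when the closure succeeds and rolls back automatically when it throws. A sketch of the chunk above using it:

App\Models\User::where('active', true)
    ->chunkById(200, function ($users) {
        // Commits on success, rolls back automatically on an exception.
        DB::transaction(function () use ($users) {
            foreach ($users as $user) {
                $user->active = false;
                $user->save();
            }
        });
    });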

Putting it all together

To finalize the example, we can bring the progress bar back in. Note that the closure has to capture $bar explicitly:

$count = App\Models\User::where('active', true)->count();

$bar = $this->output->createProgressBar($count);
$bar->start();

App\Models\User::where('active', true)
    ->chunkById(200, function ($users) use ($bar) {
        try {
            DB::beginTransaction();

            foreach ($users as $user) {
                $user->active = false;
                $user->save();
                $bar->advance();
            }

            DB::commit();
        } catch (\Exception $e) {
            // handle your error (log ...)
            DB::rollBack();
            $bar->finish();

            return false; // stop chunking after a failed chunk
        }
    });

$bar->finish();

So this is my strategy for handling updates on bigger data sets. You can adjust the chunk size to your needs and experiment to find what gets good results; in my experience, anything from 200 to 1000 is fine.
Sometimes, especially when the calculation for a single entry is more complicated, I see the whole process getting slower over time: it starts at around 2 seconds per bar advance and climbs to 30 or 40 seconds. Since I have experienced this across different commands, I am not sure whether it is a general issue. If anyone has any info on it, let me know.

Hope this article helps you.
