Laravel – Eager loading can be bad!
January 28, 2024
Hello ????
Yes, you read it right. Eager loading can be bad, really bad. However, we often resort to it when dealing with an N+1 scenario, thinking that we’ve resolved the issue, when in fact, we might have made it worse. How? Let’s see.
How bad it gets
For this demo, we are building Laravel Forge. Like (almost) every Laravel application, we will have a One To Many relationship.
We aim to log every activity for a server. A log can include the activity type, the user who initiated it, and other useful information for later analysis.
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\HasMany;
class Server extends Model
{
public function logs(): HasMany
{
return $this->hasMany(Log::class);
}
}
Now, in the application, we want to list all the servers. So, we might do something like
<table>
<tr>
<th>Name</th>
</tr>
@foreach ($servers as $server)
<tr>
<td></td>
</tr>
@endforeach
</table>
Moving forward, we have 10 servers, and each of them has 1000 logs.
So far, so good. Now, we want to display when the last activity on a server occurred
<table>
<tr>
<th>Name</th>
<th>Last Activity</th>
</tr>
@foreach ($servers as $server)
<tr>
<td></td>
<td>
</td>
</tr>
@endforeach
</table>
Basic things, we access the logs()
relation, ordering it to retrieve the latest record, getting the created_at
column, and formatting it for better readability using diffForHumans()
. The latter yields something like "1 week ago".
But this is bad, we’ve introduced an N+1 problem.
If you don’t know what a N+1 is, we are running the following queries
select * from `servers`
select * from `logs` where `logs`.`server_id` = 1 and `logs`.`server_id` is not null order by `created_at` desc limit 1
select * from `logs` where `logs`.`server_id` = 2 and `logs`.`server_id` is not null order by `created_at` desc limit 1
select * from `logs` where `logs`.`server_id` = 10 and `logs`.`server_id` is not null order by `created_at` desc limit 1
To resolve this issue, we typically reach out to Eager Loading (I know you did).
$servers = Server::query()
->with('logs')
->get();
<table>
<tr>
<th>Name</th>
<th>Last Activity</th>
</tr>
@foreach ($servers as $server)
<tr>
<td>{{ $server->name }}</td>
<td>
{{ $server->logs->sortByDesc('created_at')->first()->created_at->diffForHumans() }}
</td>
</tr>
@endforeach
</table>
With this update, we manage to reduce it to only 2 queries
select * from `servers`
select * from `logs` where `logs`.`server_id` in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
And it looks like we addressed the problem, right?
Wrong! We’re only considering the number of queries. Let’s examine the memory usage and the count of loaded models; these factors are equally important.
- Before eager loading
- 11 queries: 1 to retrieve all servers and 10 queries for each server.
- A total of 20 models loaded.
- Memory usage: 2MB.
- Execution time: 38.19 ms.
- After eager loading
- 2 queries: 1 to get all servers and 1 to get all logs.
- A total of 10010 models loaded ????.
- Memory usage: 13MB (6.5x increase).
- Execution time: 66.5 ms (1.7x increase).
- Slower computational time due to loading all the models ????.
The tool in the screenshot is Debugbar.
Looks like we didn’t fix anything; in fact, we made it worse.. And keep in mind, this is a very simplified example. In a real world scenario, you can easily end up with hundreds or thousands of records, leading to the loading of millions of models.. The title makes sense now?
How do we truly solve this?
In our case, eager loading is a NO NO. Instead, we can use sub-queries and leverage the database to perform tasks it is built and optimized for.
$servers = Server::query()
->addSelect([
'last_activity' => Log::select('created_at')
->whereColumn('server_id', 'servers.id')
->latest()
->take(1)
])
->get();
This will result in a single query
select `servers`.*, (
select `created_at`
from `logs`
where
`server_id` = `servers`.`id`
order by `created_at` desc
limit 1
) as `last_activity`
from `servers`
Since the column we need from the relationship is now computed in a subquery, we have the best of both worlds: only 10 models loaded and minimal memory usage.
You might be thinking that with this approach comes a drawback: the last_activity
column is now a regular string. So, if you want to use the diffForHumans()
method, you’ll encounter the Call to a member function diffForHumans() on string
error. But no worries, you haven’t lost the casting; it’s as simple as adding a single line.
$servers = Server::query()
->addSelect([
'last_activity' => Log::select('created_at')
->whereColumn('server_id', 'servers.id')
->latest()
->take(1)
])
->withCasts(['last_activity' => 'datetime'])
->get();
By chaining the withCasts()
method, you can now treat the last_activity
as if it were a date.
How about the Laravel way?
The reddit community never disappoints! They have pointed out another alternative solution, a Laravel-ish approach; One Of Many.
Let’s define a new relationship to always retrieve the latest log
public function latestLog(): HasOne
{
return $this->hasOne(Log::class)->latestOfMany();
}
Now we can use the relationship like this
$servers = Server::query()
->with('latestLog')
->get();
This will result in the following queries
select * from `servers`
select `logs`.*
from
`logs`
inner join (
select MAX(`logs`.`id`) as `id_aggregate`, `logs`.`server_id`
from `logs`
where
`logs`.`server_id` in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
group by
`logs`.`server_id`
) as `latestOfMany`
on `latestOfMany`.`id_aggregate` = `logs`.`id`
and `latestOfMany`.`server_id` = `logs`.`server_id`
And it can be used in the Blade like this
@foreach ($servers as $server)
{{$server->latestLog }}
@endforeach
For a comparison between the two methods:
- Using subqueries
- 1 query.
- A total of 10 models loaded.
- Memory usage: 2MB.
- Execution time: 21.55 ms.
- Using the
latestOfMany()
- 2 queries
- A total of 20 models loaded.
- Memory usage: 2MB.
- Execution time: 20.63 ms
Both methods are really good; which one to use will depend on your case. If you absolutely need the child model hydrated and will make use of all its fields, go with the latestOfMany()
. However, if you only need a few fields, then the subquery will perform better. This is because, in the subquery, you select exactly what you need. Regardless of the number of records you have, the memory usage will be almost the same. Now, for the second method, memory usage is heavily dependent on the number of columns your table has. In reality, a table can easily have 50 columns, so hydrating the model will be expensive, even if it is only one per parent, that is to keep in mind when choosing!
Conclusion
I have seen some developers, by design, choose to force eager loading for all the models. You can’t just use it for everything, as much as it seems like you’ve solved the issue, you might have actually created a worse one. Not everything is a nail; the hammer might not work ????
Laravel News Links