Which is faster: whereHas vs join
Sep 28, 2022
I have heard multiple times that whereHas
queries are slower than join
queries, so I decided to test it to figure out if I should stop writing whereHas
queries once and for all.
So let me show you what I learned from my "experiment" and when you should consider changing your whereHas
queries to join
.
The setup
First I needed a relationship between two models to test the speed of the queries. I decided to go with products and variants (think color, size, etc). So where are the models:
class Product extends Model
{
public function variants()
{
return $this->hasMany(Variant::class);
}
}
class Variant extends Model
{
public function product()
{
return $this->belongsTo(Product::class);
}
}
And here are the migrations:
return new class extends Migration
{
public function up()
{
Schema::create('products', function (Blueprint $table) {
$table->id();
$table->string('name');
$table->text('description');
$table->integer('price');
$table->timestamps();
});
}
}
return new class extends Migration
{
public function up()
{
Schema::create('variants', function (Blueprint $table) {
$table->id();
$table->foreignId('product_id');
$table->string('name');
$table->timestamps();
});
}
}
As you can see, we have Product
model and products have many Variants
. Simple.
Seeding the data
Since I had a small suspicion that the way the data is distributed in the database would affect the results (more on that later) I decided to add 5000 products with one variant and 5000 without variants. So 10,000 products in total, half of them have variants and half of them don't.
class DatabaseSeeder extends Seeder
{
public function run()
{
// with variant
Product::factory(5000)->has(Variant::factory())->create();
// without variant
Product::factory(5000)->create();
}
}
Comparing speeds
The first query I tested was the simplest one, a whereHas
without condition and the equivalent join
. After each query you can see the time they took to run in milliseconds:
// whereHas
Product::whereHas('variants')->get(); // 12.75ms
// join
Product::select('products.*')
->join('variants', 'variants.product_id', '=', 'products.id')
->get(); // 18.88ms
For this simple use case, the whereHas
was a little faster than the join
, which actually surprised me, as I said before I have heard that joins are faster than whereHas
, and even though it wasn't by much, the whereHas
was faster here, so that means that whereHas
is not the slowest option in all cases.
For the second test, I decided to test filtering the products that have a specific variant, using the variant id. Here are the results:
// whereHas
Product::whereHas('variants', function ($query) {
$query->where('id', 1);
})->get(); // 0.48ms
// join
Product::select('products.*')
->join('variants', function ($query) {
$query->on('variants.product_id', '=', 'products.id')
->where('variants.id', 1);
})->get(); // 0.23ms
Both were pretty fast in this case, but the join
was faster. This made me realize something, if you know how whereHas
works under the hood, it creates a exists
query to fetch the results:
select * from `products` where exists (select * from `variants` where `products`.`id` = `variants`.`product_id` and `id` = 1)
Inside this exists
query there is another select query. After seeing the two first tests I realized that the whereHas
performance is heavily affected by how fast that second select can find a row that matches the conditions.
With this theory in mind, I created another test, similar to the previous one, but instead of using the id
I decided to use the name
of the variant, since look ups by id are faster than look ups using a non-indexed column. Here are the queries with the results:
$variant = Variant::first(); // Use this variant to search by name
// whereHas
Product::whereHas('variants', function ($query) use ($variant) {
$query->where('name', $variant->name);
})->get(); // 11.46ms
// join
Product::select('products.*')
->join('variants', function ($query) use ($variant){
$query->on('variants.product_id', '=', 'products.id')
->where('variants.name', $variant->name);
})->get(); // 1.89ms
The whereHas
took 11.46ms, compared to the 1.89ms from the join
query, which is a ton faster. Using a non-indexed column made the whereHas
a lot slower but the join
was only slightly affected in comparison.
What would happen if we add an index to the name
column in the variants table?
public function up()
{
Schema::table('variants', function (Blueprint $table) {
$table->index('name');
});
}
Here are the results after adding the index:
$variant = Variant::first(); // Use this variant to search by name
// whereHas
Product::whereHas('variants', function ($query) use ($variant) {
$query->where('name', $variant->name);
})->get(); // 1.02ms
// join
Product::select('products.*')
->join('variants', function ($query) use ($variant){
$query->on('variants.product_id', '=', 'products.id')
->where('variants.name', $variant->name);
})->get(); // 0.76ms
The whereHas
got a lot faster, while the join
remained almost at the same speed. Which is interesting.
Changing the data distribution
So, we figured out that the whereHas
performance is heavenly reliant on the select
inside the exists
clause.
So I wanted to test, what would be the speeds if it's very uncommon for a Product
to have variants, so I cleaned my database and ran this seeder:
class DatabaseSeeder extends Seeder
{
public function run()
{
// with variant
Product::factory(1)->has(Variant::factory())->create();
// without variant
Product::factory(9999)->create();
}
}
Only 1 out of 10,000 products has a variant. Here are the speeds for the join
and the whereHas
queries using this dataset:
// whereHas
Product::whereHas('variants')->get(); // 0.30ms
// join
Product::select('products.*')
->join('variants', 'variants.product_id', '=', 'products.id')
->get(); // 0.14ms
This time, the join
is slightly faster than the whereHas
. So the other way around than when half of the products had variants. Interesting.
So what about the other way around, what if out of 10,000 products only 1 didn't have a variant?
Again I cleaned my database and ran this seeder:
class DatabaseSeeder extends Seeder
{
public function run()
{
// with variant
Product::factory(9999)->has(Variant::factory())->create();
// without variant
Product::factory(1)->create();
}
}
And here are the results for the queries using that dataset:
// whereHas
Product::whereHas('variants')->get(); // 26.79ms
// join
Product::select('products.*')
->join('variants', 'variants.product_id', '=', 'products.id')
->get(); // 36.57ms
We are back to where we started, the whereHas
is faster than the join
. This leads me to believe that the whereHas
is faster if the condition is more common than not.
That also explains why in the other queries using the 'id' and 'name' the whereHas
was slower, because most products didn't have that specific variant.
So which is better?
As our tests indicate, it depends, if the conditions you are using in your whereHas
are more common than not, then it will probably perform better than the equivalent join
.
If that is not the case (ie. You have conditions that make the result set very small) then the join
will probably perform better, in some cases a lot better as we saw with our non-indexed name
test.
But performance is probably not the only thing we need to take into account when writing queries. In my opinion join
queries are a lot less readable than using whereHas
because their intended use is to select columns from another table, not to filter results, even if they can be used for that end.
So I would keep using whereHas
for the readability if I'm working with a dataset where the results will fetch most of the rows on my target model or if the amount of records in the database is small enough that the impact of changing that query to a join
would be minimal.
On the other hand, I would use a join
if I'm working with a big database and the conditions would retrieve only a small subset of the records in my database.
Conclusion
So should you change all your whereHas
queries for join
queries? The answer is no, both have advantages and disadvantages and should be used depending on the specifics of the situation.
Let me know if you liked this article or have any comments or questions by reaching out on twitter, I'm pretty active there :).
You might also like