Buy

Joining Across a ManyToMany + EXTRA_LAZY Fetch

On the genus list page, I want to add a new column that prints the number of scientists each Genus has. That should be simple!

Open the genus/list.html.twig template. Add the new th for number of scientists:

29 lines app/Resources/views/genus/list.html.twig
... lines 1 - 2
{% block body %}
<table class="table table-striped">
<thead>
<tr>
... lines 7 - 8
<th># of scientists</th>
... line 10
</tr>
</thead>
... lines 13 - 26
</table>
{% endblock %}

Then down below, add the td, then say {{ genus.genusScientists|length }}:

29 lines app/Resources/views/genus/list.html.twig
... lines 1 - 2
{% block body %}
<table class="table table-striped">
... lines 5 - 12
<tbody>
{% for genus in genuses %}
<tr>
... lines 16 - 21
<td>{{ genus.genusScientists|length }}</td>
... line 23
</tr>
{% endfor %}
</tbody>
</table>
{% endblock %}

In other words:

Go out and get my array of genus scientists and count them!

And, it even works! Each genus has three scientists. Until we delete one, then only two scientists! Yes!

The Lazy Collection Queries

But now click the Doctrine icon down in the web debug toolbar to see how the queries look on this page. This is really interesting: we have one query that's repeated many times: it selects all of the fields from user and then INNER JOINs over to genus_scientist WHERE genus_id equals 29, then, 25, 26 and 27.

When we query for the Genus, it does not automatically also go fetch all the related Users. Instead, at the moment that we access the genusScientists property, Doctrine queries all of the User data for that Genus. We're seeing that query for each row in the table.

Fetching EXTRA_LAZY

Technically, that's a lot of extra queries... which could impact performance. But please, don't hunt down potential performance problems too early - there are far too many good tools - like NewRelic and Blackfire - that are far better at identifying real performance issues later.

But, for the sake of learning... I want to do better, and there are a few possibilities! First, instead of querying for all the user data just so we can count the users, wouldn't it be better to make a super-fast COUNT query?

Yep! And there's an awesome way to do this. Open Genus and find the $genusScientists property. At the end of the ManyToMany, add fetch="EXTRA_LAZY":

197 lines src/AppBundle/Entity/Genus.php
... lines 1 - 14
class Genus
{
... lines 17 - 71
/**
* @ORM\ManyToMany(targetEntity="User", inversedBy="studiedGenuses", fetch="EXTRA_LAZY")
* @ORM\JoinTable(name="genus_scientist")
*/
private $genusScientists;
... lines 77 - 195
}

That's it. Now go back, refresh, and click to check out the queries. We still have the same number of queries, but each row's query is now just a simple count.

That's freaking awesome! Doctrine knows to do this because it realizes that all we're doing is counting the scientists. But, if we were to actually loop over the scientists and start accessing data on each User - like we do on the genus show page - then it would make a full query for all the User data. Doctrine is really smart.

Joining for Less Queries

Another way to optimize this would be to try to minimize the number of queries. Instead of running a query for every row, couldn't we grab all of this data at once? When we originally query for the genuses, what if we joined over to the user table then, and fetched all of the users immediately?

That's totally possible, and while it might actually be slower in this case, let's find out how to do join across a ManyToMany relationship. Open GenusController and find listAction(). Right now, this controller calls a findAllPublishOrderedByRecentlyActive() method on GenusRepository to make the query:

148 lines src/AppBundle/Controller/GenusController.php
... lines 1 - 13
class GenusController extends Controller
{
... lines 16 - 57
public function listAction()
{
... lines 60 - 61
$genuses = $em->getRepository('AppBundle:Genus')
->findAllPublishedOrderedByRecentlyActive();
... lines 64 - 67
}
... lines 69 - 146
}

Go find that method! Here's the goal: modify this query to join to the middle genus_scientist table and then join again to the user table so we can select all of the user data. But wait! What's the number one rule about ManyToMany relationships? That's right: you need to pretend like the middle join table doesn't exist.

Instead, leftJoin() directly to genus.genusScientists. Alias that to genusScientist:

26 lines src/AppBundle/Repository/GenusRepository.php
... lines 1 - 7
class GenusRepository extends EntityRepository
{
... lines 10 - 12
public function findAllPublishedOrderedByRecentlyActive()
{
return $this->createQueryBuilder('genus')
... lines 16 - 19
->leftJoin('genus.genusScientists', 'genusScientist')
... lines 21 - 23
}
}

When you JOIN in Doctrine, you always join on a relation property, like $genusScientists. Doctrine will automatically take care of joining across the middle table and then over to the user table.

To select the user data: addSelect('genusScientist'):

26 lines src/AppBundle/Repository/GenusRepository.php
... lines 1 - 7
class GenusRepository extends EntityRepository
{
... lines 10 - 12
public function findAllPublishedOrderedByRecentlyActive()
{
return $this->createQueryBuilder('genus')
... lines 16 - 19
->leftJoin('genus.genusScientists', 'genusScientist')
->addSelect('genusScientist')
... lines 22 - 23
}
}

Ok, go back and refresh again! Woh, one query! And that query contains a LEFT JOIN to genus_scientist and another to user. Because we're fetching all the user data in this query, Doctrine avoids making the COUNT queries later.

If Doctrine JOINS are still a bit new to you, give yourself a head start with our Doctrine Queries Tutorial.

Leave a comment!

  • 2017-05-08 jian su

    Hi Victor: Thank you. it makes sense now :)

  • 2017-05-08 Victor Bocharsky

    Hey Jian,

    Yes! JOINs always slows your queries, sometimes it will be enough to add indexes for columns by which you are joining, but sometimes it makes sense do not use JOINs at all and make another 2nd query. So the simple answer is yes, you pay for JOINs :)

    Cheers!

  • 2017-05-07 jian su

    Hi guys:

    When you join query
    ->leftJoin('genus.genusScientists', 'genusScientist')
    ->addSelect('genusScientist')
    is slower than no join. it took 66.68ms for a join query, no join with 3.56ms. I guess joining do pay some price?