Buy

Joining Across a ManyToMany + EXTRA_LAZY Fetch

On the genus list page, I want to add a new column that prints the number of scientists each Genus has. That should be simple!

Open the genus/list.html.twig template. Add the new th for number of scientists:

29 lines app/Resources/views/genus/list.html.twig
... lines 1 - 2
{% block body %}
<table class="table table-striped">
<thead>
<tr>
... lines 7 - 8
<th># of scientists</th>
... line 10
</tr>
</thead>
... lines 13 - 26
</table>
{% endblock %}

Then down below, add the td, then say {{ genus.genusScientists|length }}:

29 lines app/Resources/views/genus/list.html.twig
... lines 1 - 2
{% block body %}
<table class="table table-striped">
... lines 5 - 12
<tbody>
{% for genus in genuses %}
<tr>
... lines 16 - 21
<td>{{ genus.genusScientists|length }}</td>
... line 23
</tr>
{% endfor %}
</tbody>
</table>
{% endblock %}

In other words:

Go out and get my array of genus scientists and count them!

And, it even works! Each genus has three scientists. Until we delete one, then only two scientists! Yes!

The Lazy Collection Queries

But now click the Doctrine icon down in the web debug toolbar to see how the queries look on this page. This is really interesting: we have one query that's repeated many times: it selects all of the fields from user and then INNER JOINs over to genus_scientist WHERE genus_id equals 29, then, 25, 26 and 27.

When we query for the Genus, it does not automatically also go fetch all the related Users. Instead, at the moment that we access the genusScientists property, Doctrine queries all of the User data for that Genus. We're seeing that query for each row in the table.

Fetching EXTRA_LAZY

Technically, that's a lot of extra queries... which could impact performance. But please, don't hunt down potential performance problems too early - there are far too many good tools - like NewRelic and Blackfire - that are far better at identifying real performance issues later.

But, for the sake of learning... I want to do better, and there are a few possibilities! First, instead of querying for all the user data just so we can count the users, wouldn't it be better to make a super-fast COUNT query?

Yep! And there's an awesome way to do this. Open Genus and find the $genusScientists property. At the end of the ManyToMany, add fetch="EXTRA_LAZY":

197 lines src/AppBundle/Entity/Genus.php
... lines 1 - 14
class Genus
{
... lines 17 - 71
/**
* @ORM\ManyToMany(targetEntity="User", inversedBy="studiedGenuses", fetch="EXTRA_LAZY")
* @ORM\JoinTable(name="genus_scientist")
*/
private $genusScientists;
... lines 77 - 195
}

That's it. Now go back, refresh, and click to check out the queries. We still have the same number of queries, but each row's query is now just a simple count.

That's freaking awesome! Doctrine knows to do this because it realizes that all we're doing is counting the scientists. But, if we were to actually loop over the scientists and start accessing data on each User - like we do on the genus show page - then it would make a full query for all the User data. Doctrine is really smart.

Joining for Less Queries

Another way to optimize this would be to try to minimize the number of queries. Instead of running a query for every row, couldn't we grab all of this data at once? When we originally query for the genuses, what if we joined over to the user table then, and fetched all of the users immediately?

That's totally possible, and while it might actually be slower in this case, let's find out how to do join across a ManyToMany relationship. Open GenusController and find listAction(). Right now, this controller calls a findAllPublishOrderedByRecentlyActive() method on GenusRepository to make the query:

148 lines src/AppBundle/Controller/GenusController.php
... lines 1 - 13
class GenusController extends Controller
{
... lines 16 - 57
public function listAction()
{
... lines 60 - 61
$genuses = $em->getRepository('AppBundle:Genus')
->findAllPublishedOrderedByRecentlyActive();
... lines 64 - 67
}
... lines 69 - 146
}

Go find that method! Here's the goal: modify this query to join to the middle genus_scientist table and then join again to the user table so we can select all of the user data. But wait! What's the number one rule about ManyToMany relationships? That's right: you need to pretend like the middle join table doesn't exist.

Instead, leftJoin() directly to genus.genusScientists. Alias that to genusScientist:

26 lines src/AppBundle/Repository/GenusRepository.php
... lines 1 - 7
class GenusRepository extends EntityRepository
{
... lines 10 - 12
public function findAllPublishedOrderedByRecentlyActive()
{
return $this->createQueryBuilder('genus')
... lines 16 - 19
->leftJoin('genus.genusScientists', 'genusScientist')
... lines 21 - 23
}
}

When you JOIN in Doctrine, you always join on a relation property, like $genusScientists. Doctrine will automatically take care of joining across the middle table and then over to the user table.

To select the user data: addSelect('genusScientist'):

26 lines src/AppBundle/Repository/GenusRepository.php
... lines 1 - 7
class GenusRepository extends EntityRepository
{
... lines 10 - 12
public function findAllPublishedOrderedByRecentlyActive()
{
return $this->createQueryBuilder('genus')
... lines 16 - 19
->leftJoin('genus.genusScientists', 'genusScientist')
->addSelect('genusScientist')
... lines 22 - 23
}
}

Ok, go back and refresh again! Woh, one query! And that query contains a LEFT JOIN to genus_scientist and another to user. Because we're fetching all the user data in this query, Doctrine avoids making the COUNT queries later.

If Doctrine JOINS are still a bit new to you, give yourself a head start with our Doctrine Queries Tutorial.

Leave a comment!

  • 2017-10-06 axa

    Thanks Ryan for explanation.

  • 2017-10-04 weaverryan

    Yo axa!

    Yea, this is a really interesting question. Because, to me, fetch="EXTRA_LAZY" basically seems like the *best* value... in pretty much all cases! There is one case where it is not ideal: if you count the number of items in a collection relationship and then loop over it, having EXTRA_LAZY will cause you to have 2 queries instead of 1 (a count query, and then another query to loop over them). Because of that, I would use this rule: if you know that you want to count the items in a relationship sometimes, but NOT actually loop over them, use EXTRA_LAZY.

    And when to use joins? Well, that solves a different problem. EXTRA_LAZY gives us a *really* simple way of counting the items in a collection in the most efficient way (a simple COUNT query). If you use a fetch join, then you will grab all the data at once. If you then simply ONLY count the collection, well, then EXTRA_LAZY doesn't help you: you already fetched all the data... even though you didn't really need it. But if you *will* loop over the collection, then of course doing a fetch join is always a little bit faster. But honestly, I usually write my queries a bit sloppy, and then optimize later with Blackfire.io :). I do often find that a page is slow because I'm querying for a lot of data, but I much less often find that the fix is a join.

    Cheers!

  • 2017-10-04 axa

    Hello,
    About < fetch="EXTRA_LAZY" >, something is not clear for me.
    Should we use this always (when we need joins)? When we should avoid this? For example, if you put on a property, you will use this property many times in different parts of app. Is this fine? So, when to use it (or better to add joins instead of it) ?

  • 2017-05-08 jian su

    Hi Victor: Thank you. it makes sense now :)

  • 2017-05-08 Victor Bocharsky

    Hey Jian,

    Yes! JOINs always slows your queries, sometimes it will be enough to add indexes for columns by which you are joining, but sometimes it makes sense do not use JOINs at all and make another 2nd query. So the simple answer is yes, you pay for JOINs :)

    Cheers!

  • 2017-05-07 jian su

    Hi guys:

    When you join query
    ->leftJoin('genus.genusScientists', 'genusScientist')
    ->addSelect('genusScientist')
    is slower than no join. it took 66.68ms for a join query, no join with 3.56ms. I guess joining do pay some price?