Hi,
We’re trying to understand the performance impact of groupObjects on bulk operations at scale: think 500k products and millions of variants.
Shopify’s docs mention that grouped output slows down bulk operations and increases the likelihood of timeouts, but they don’t quantify it.
A few questions:
- How significant is the performance difference between groupObjects: true and groupObjects: false in practice, on large datasets?
- Does splitting the same query into two separate flat queries (products only + productVariants at root level) perform similarly to a single query with groupObjects: false, purely in terms of total processing speed? I mean regardless of whether they run in parallel: since 2026-01, multiple bulk operations can run in parallel, but that isn’t relevant to my question.
Case 1 — single query, groupObjects: true
mutation {
  bulkOperationRunQuery(groupObjects: true, query: """
  {
    products {
      edges {
        node {
          id
          variants {
            edges {
              node { id }
            }
          }
        }
      }
    }
  }
  """) { bulkOperation { id status } }
}
Case 2 — single query, groupObjects: false (default)
mutation {
  bulkOperationRunQuery(query: """
  {
    products {
      edges {
        node {
          id
          variants {
            edges {
              node { id }
            }
          }
        }
      }
    }
  }
  """) { bulkOperation { id status } }
}
Case 3 — two separate flat queries
# Query 1
{ products { edges { node { id } } } }
# Query 2
{ productVariants { edges { node { id product { id } } } } }
Each query has a single connection at the root level (products or productVariants), with no nesting.
Thanks
Hey @Soufiane_Ghzal! The biggest performance lever by far is groupObjects. When it’s true, two things happen that hurt you at scale. First, the query execution phase has additional retry logic with progressively smaller page sizes that can burn time. Second, and more importantly, the file assembly step has to download, parse, and re-sort the JSONL output line-by-line so child objects land directly after their parent. At 500k products with millions of variants, that sorting step is where timeouts happen. When groupObjects is false (the default on 2026-01+), file assembly is a straightforward concatenation without any content-level parsing.
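To make the assembly difference concrete, here’s a rough Python sketch. It is only an illustration of the general idea, not Shopify’s actual implementation: ungrouped assembly can pass lines through untouched, while grouped assembly has to parse every line, buffer children by __parentId, and re-emit them after their parent. The function names are hypothetical, and the sketch only handles one level of nesting.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def concatenate(chunks: Iterable[list[str]]) -> Iterator[str]:
    """Ungrouped assembly (illustrative): emit lines as-is, no parsing at all."""
    for chunk in chunks:
        yield from chunk


def group_lines(lines: Iterable[str]) -> list[str]:
    """Grouped assembly (illustrative): parse each line, then put children after their parent.

    Assumes a single nesting level (products -> variants) and holds everything
    in memory, which is exactly what gets expensive with millions of lines.
    """
    parents: list[tuple[str, str]] = []  # (id, raw line), in original order
    children: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        obj = json.loads(line)  # every single line has to be parsed
        parent_id = obj.get("__parentId")
        if parent_id is None:
            parents.append((obj["id"], line))
        else:
            children[parent_id].append(line)

    out: list[str] = []
    for pid, line in parents:
        out.append(line)
        out.extend(children[pid])  # children land directly after their parent
    return out
```

The contrast is the point: concatenate never looks inside a line, whereas group_lines does a JSON parse per line and keeps the whole dataset buffered before it can emit anything.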
On your Case 2 vs Case 3 question, the difference is smaller but does still exist. Nested queries use reduced pagination limits and require additional pagination work for child connections on each page of parent objects. A flat root-level query like productVariants avoids that entirely, getting the full page size with no extra overhead. At your scale that adds up, though it’s secondary to the groupObjects impact.
One thing to keep in mind with Case 3: since each query is a standalone root-level connection, there’s no automatic __parentId linking in the JSONL. You’d rely on the product { id } field you’ve already included in your variant query to associate variants back to products, which is straightforward. In Case 2 with groupObjects: false, the JSONL does include __parentId automatically because it’s a nested connection. Either way, the data is there; it’s just a different shape to parse.
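As a rough sketch of the two shapes, here’s some Python that builds the same product-to-variants map from either output. It assumes Case 2 variant lines look like {"id": ..., "__parentId": <product id>} and Case 3 variant lines look like {"id": ..., "product": {"id": ...}}, matching the queries above; the function names are hypothetical. Neither version relies on line order.

```python
import json
from collections import defaultdict
from typing import Iterable


def variants_by_product_nested(lines: Iterable[str]) -> dict[str, list[str]]:
    """Case 2 (hypothetical helper): one JSONL file, variant lines carry __parentId."""
    result: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        parent = obj.get("__parentId")
        if parent is None:
            # Product line: register it so products with no variants still appear.
            result.setdefault(obj["id"], [])
        else:
            result[parent].append(obj["id"])
    return dict(result)


def variants_by_product_flat(
    product_lines: Iterable[str], variant_lines: Iterable[str]
) -> dict[str, list[str]]:
    """Case 3 (hypothetical helper): two JSONL files, variants link via product { id }."""
    result: dict[str, list[str]] = {}
    for line in product_lines:
        if line.strip():
            result[json.loads(line)["id"]] = []
    for line in variant_lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        result.setdefault(obj["product"]["id"], []).append(obj["id"])
    return result
```

Both helpers produce the same dict for equivalent data; the only difference is whether the link comes from __parentId or from the explicit product { id } field.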
So in practice, your best option is Case 3 with groupObjects omitted (defaults to false on 2026-01). Each flat query gets optimal pagination and the simplest file assembly path. You also get the concurrent execution benefit on 2026-01, even though I know you said that’s not the focus. The bulk operations guide covers the JSONL format details, and the 2026-01 changelog entry has more context on the default change.
Hi @Donal-Shopify, thanks a lot for the very detailed response.
That will help a lot in choosing the right path going forward.
From what you’re saying, it’s still worth it to split queries and avoid nested connections where possible, so we’ll consider that one too.
Thanks!
@Donal-Shopify I’d like to follow up on that one.
We’re trying to stop relying on object grouping for our product bulk operations.
One problem we have is that we need to know which variants belong to a product at the time we fetch the product. Object grouping is a safe way to do that, as it colocates the variants with the product. However, as discussed earlier, it scales poorly.
I couldn’t find a way to pull just the variant IDs attached to a product without using a nested connection to variants.
Is there some way to do this?
Thank you.