Webhook volume and updated fields

Hi,

I wanted to share a challenge we’re running into with product update webhooks and get some input from the team.

Context:

We have an app that reacts to product changes. We need to track product data, variant data, inventory, and metafields on both products and variants. Because of that, we can't use the built-in de-duplication: we have to listen to every update by subscribing to the product's update date.

The problem

The problem is that the webhook doesn't tell us what actually changed. So every time we receive one, we have to pull the full product data from the GraphQL API to figure out whether anything relevant happened. For merchants with complex catalogs (many metafields, or 2000+ variants on a single product), this gets slow and expensive. Even for simpler catalogs, the volume is difficult to sustain. We end up fetching a lot of data just to find out that nothing meaningful changed.

The thing is, everyone ends up paying for those no-op cycles:

  • Us as an app provider: extra server cost fetching data we didn’t need
  • Shopify: unnecessary API requests that could have been avoided
  • The merchant: slower updates, even when nothing actually changed for them

Our estimate is that if we knew what triggered the update (even just which fields changed), we could skip more than 90% of those API calls.

This also connects to pricing. Webhook volume ends up being our biggest infrastructure cost, but it’s not something merchants control directly. Other apps they install can trigger a constant stream of updates. So we can’t (and don’t want to) bill based on that, and we also can’t predict it upfront for a new merchant.

Is there any ongoing work or discussion around including change context in the webhook payload (e.g. which fields were updated)? Or is there any recommended pattern for filtering no-op webhooks without calling the API every time, one that still works when we need nested data from variants, metafields, inventory, etc.?

Thanks in advance


Hey @Soufiane_Ghzal , this is a known limitation and something we are aware of. I don't have a specific timeline to offer at this time (keep an eye on the developer changelog for any changes here), but I will be sure to pass on your feedback as well.

To reduce unnecessary API calls, do you currently do local state caching and hashing? This won't eliminate all redundant fetches, but it could help batch and deduplicate rapid-fire updates. You could also consider implementing a short debounce window (e.g., 5-10 seconds) per product ID: if multiple updates come in for the same product within that window, you only fetch once.
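For what it's worth, the two ideas above (per-product debounce plus payload hashing) could be combined into something like this minimal sketch. The class name and the division into `should_fetch` / `is_changed` are illustrative, not any official Shopify pattern:

```python
import hashlib
import json
import time

class WebhookDeduper:
    """Debounce product-update webhooks per product ID, then skip
    no-op updates by hashing the fetched payload."""

    def __init__(self, debounce_seconds=10):
        self.debounce_seconds = debounce_seconds
        self.last_seen = {}  # product_id -> timestamp of last accepted webhook
        self.last_hash = {}  # product_id -> hash of last fetched payload

    def should_fetch(self, product_id, now=None):
        """True if the debounce window for this product has elapsed."""
        now = time.time() if now is None else now
        last = self.last_seen.get(product_id)
        if last is not None and now - last < self.debounce_seconds:
            return False
        self.last_seen[product_id] = now
        return True

    def is_changed(self, product_id, payload):
        """Hash the full fetched payload (product + variants + metafields
        + inventory) and compare with the previous hash, so downstream
        processing only runs when something actually differs."""
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()
        if self.last_hash.get(product_id) == digest:
            return False
        self.last_hash[product_id] = digest
        return True
```

Note the caveat still applies: `is_changed` runs *after* the GraphQL fetch, so this saves downstream processing but not the API call itself, and the state would need to live in something like Redis rather than in-process memory for a multi-worker deployment.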

@KyleG-Shopify @HookdeckGareth

Thanks for your answers.

We already have some debouncing, otherwise our servers wouldn't survive. But some clients with 100k+ products will suddenly update their whole catalog once a day from external synchronization tools. Even debouncing does not help there, as we still have to fetch those 100k+ products to make sure we haven't missed an update, effectively hammering Shopify's servers with that volume in return.

Hey - we’re in a similar boat with product update webhooks. They’re really noisy without providing an easy way to see what updated.

We’ve thought about moving to a less-than-realtime approach: running bulk queries every half hour to hour to pull from the brand’s Shopify instance, then a bulk mutation to write to the destination. Just throwing that out as a possible alternative to webhooks.
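For anyone unfamiliar with that approach: Shopify bulk exports are started with the `bulkOperationRunQuery` mutation, which wraps a nested query and runs it asynchronously. A rough sketch of building that mutation, where the nested product query is purely illustrative (you'd list whatever fields your app actually tracks, and check the current Admin API docs for the exact shape):

```python
# Illustrative nested query; real apps would include metafields,
# inventory, and whatever else they need to sync.
PRODUCTS_QUERY = """
{
  products {
    edges {
      node {
        id
        title
        variants { edges { node { id sku } } }
      }
    }
  }
}
""".strip()

def build_bulk_operation_mutation(nested_query):
    """Wrap a nested query in Shopify's bulkOperationRunQuery mutation.
    The result would be POSTed to the Admin GraphQL endpoint; Shopify
    then runs it in the background and delivers a JSONL file."""
    return f'''
mutation {{
  bulkOperationRunQuery(
    query: """
{nested_query}
"""
  ) {{
    bulkOperation {{ id status }}
    userErrors {{ field message }}
  }}
}}
'''.strip()
```

You would then poll the operation's status (or subscribe to the bulk-operation completion webhook) and download the resulting JSONL when it finishes.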

Hi @APIConsumer

Thanks for sharing. That’s a good compromise in some use cases, but for ours we need to catch every single live update and react to it quickly. Also, running a full bulk fetch on the store would work for small stores, but we have large stores with complex catalogs for which a bulk operation over the whole catalog takes hours at best; at worst, we’ve seen it crash after a few minutes due to issues Shopify is facing with complex catalogs.