Improve cost calculation for metaobject fields

Hello,

I am creating a small ORM on top of metaobjects and metafields (GitHub - maestrooo/metaobject-repository: A simple ORM for manipulating Shopify metaobjects and metafields) to make it easier to interact with them. It works really well and I’m happy with the overall architecture; however, I’ve found a problem with the Shopify query cost calculation.

Right now, the library can populate one or multiple resources (even recursively), and it generates an optimized request. For instance, this code:

const events = await eventRepository.findAll({ populate: ['author.image', 'products'] })

This will generate a GraphQL query that fetches the events together with their products, the author, and the author’s image. Internally, the library generates a query that looks like this:

{
   metaobjects(first: 50, type: "foo") {
      nodes {
         fields {
            name
            jsonValue
         }

         _author: field(key: "author") {
            reference {
               ...on Metaobject {
                  fields {
                    name
                    jsonValue
                  }

                  _image: field(key: "image") {
                     reference {
                         ...on MediaImage {
                            # properties
                         }
                     }
                  }
               }
            }
         }

         _products: field(key: "products") {
             references(first: 10) {
                nodes {
                    id

                    ...on Product {
                      # other properties
                    }
                }
             }
         }
      }
   }
}

This works really well and maps naturally to a recursive approach. There is however one issue: the calculated cost is far higher than that of a naive approach, even though the naive approach can actually fetch far more data.

This optimized query only fetches what we need: if the event object has other references, they are not fetched.

However, I’ve found that the following query produces a much cheaper cost, while it potentially retrieves much more data:

{
   metaobjects(first: 50, type: "foo") {
      nodes {
         fields {
            name
            jsonValue

            reference {
               ...on Metaobject {
                  fields {
                    name
                    jsonValue

                    reference {
                       ...on MediaImage {
                          # properties
                       }
                    }
                  }
               }
            }

             references(first: 10) {
                nodes {
                   ...on Product {
                      # fields
                   }
                }
             }
         }
      }
   }
}

The problem is that if the metaobject of type foo has several fields that reference metaobjects or lists of products, far more data will be retrieved (while we only want to populate the author). This becomes a real problem when I serialize the data to our internal structure, as I might get references to objects the user did not ask for in the initial query.

I think that internally, for the query cost, if the query already includes a broad fields selection, then individual field(key: "...") selections should be “merged” into it, so that the resulting cost is the same.


Hey @bakura10 - thanks for flagging this!

I tried to replicate this on my end, but found that the more targeted query actually returned a lower cost than the naive one on my test shop.

I can’t speak to third party libraries, but if you’re able to share the exact queries you tested and the cost numbers you’re seeing, I can see if I’m able to replicate on our end and narrow down what might be happening here.

If you’re able to share the specific x-request-id values from the response headers for both of the examples above, that would be the best thing I can use to dig into this further. Also helpful would be knowing the approximate number of fields your metaobject type has, as that might affect the cost calculation.

Hope to hear from you soon!

Hi Alan. Thanks for the reply. Sure, you can just try the following query:

query GetMetaobjects { 
  metaobjects(type: "$app:foo", first: 50, reverse: false) { 
    nodes { 
      id 
      fields {
        jsonValue
        
        reference {
          __typename
        }
        
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
    }
  } 
}

This gives you a requestedQueryCost of 86. Even if the metaobject type has 40 fields that are all list_reference, this is still the cost you get.

Now, run the following “optimized” query that explicitly fetches only two reference fields:

query GetMetaobjects { 
  metaobjects(type: "$app:foo", first: 50, reverse: false) { 
    nodes { 
      id
      
      foo: field(key: "foo") {
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
      
      bar: field(key: "bar") {
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
    }
  } 
}

And you get a requestedQueryCost of 149.

I’m aware that requestedQueryCost is very different from actualQueryCost. But, as explained in the documentation, the requestedQueryCost is also used to decide whether the request can be executed at all: Shopify API rate limits
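To make the comparison concrete, the Shopify Admin GraphQL API returns both numbers in the `extensions.cost` object of each response. The sketch below shows one way to read them; the sample payload and its values are illustrative, not captured from a real shop.

```typescript
// Minimal sketch: extract the cost numbers from the `extensions.cost`
// object that the Shopify Admin GraphQL API attaches to each response.
interface QueryCost {
  requestedQueryCost: number;
  actualQueryCost: number;
}

function readCost(response: { extensions?: { cost?: QueryCost } }): QueryCost | null {
  return response.extensions?.cost ?? null;
}

// Example payload shaped like an Admin API response (values are illustrative).
const sample = {
  data: { metaobjects: { nodes: [] as unknown[] } },
  extensions: { cost: { requestedQueryCost: 86, actualQueryCost: 12 } },
};

console.log(readCost(sample)); // { requestedQueryCost: 86, actualQueryCost: 12 }
```

Logging both values for the two query shapes side by side is an easy way to reproduce the 86-vs-149 gap described above.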

Hi @bakura10 - thanks for sharing this, and apologies for the wait on my reply here!

I was able to replicate the issue on my end. I do think what you’re saying makes sense about the targeted query. I’ll do some further digging internally and loop back with you here when I have more info - appreciate your follow-up!

Hey @bakura10 - I was able to look a bit more deeply into this on my end and can confirm this is actually expected behaviour. It does seem a little unintuitive, but essentially, our GraphQL query costs are calculated based on the number of underlying database queries needed, not the amount of data returned.

The first query using fields is cheaper (cost 86) because it executes as a single batched query, while the “optimized” query using multiple field calls is more expensive (cost 149) because each field(key: "...") requires a separate database query.

This cost difference compounds when fetching further objects down the line (like references), as the batched approach can fetch all references together while the targeted approach needs separate queries for each field’s references.

It’s not the most intuitive at first glance, but it’s rooted in the graphql-ruby implementation’s bias toward “fair” query costing. There’s a bit more info on this here: GraphQL - Complexity & Depth

Hope this helps/makes sense - let me know if I can clarify anything further on my end here!

Hello,

Thanks a lot!


@Alan_G I gave it a second thought and I still find this somewhat strange. If you look at the documentation here: Standard product review metaobject definition, the Shopify example shows an “optimized” way where you query only the fields you need. However, if I understand this correctly, that will actually be a much more expensive query than just doing fields { value }? What I find strange is that a fields { value } query that potentially retrieves 40 fields will, cost-wise, be less expensive than a query that only gets two values by key.

@Alan_G I can confirm this is problematic. Even if you only fetch scalar values, a query like this already has a cost of 30:

query GetMetaobjects { 
  metaobjects(type: "$app:foo", first: 50, reverse: false) { 
    nodes { 
      id
      foo: field(key: "bar") {
        value
      }
      bar: field(key: "baz") {
        value
      }
      bam: field(key: "bam") {
        value
      }
    }
  } 
}

The Shopify example, where all fields are retrieved by key (although they are all scalar), easily exceeds a cost of 100.

Hey @bakura10 - hopefully I can clarify things here!

Each explicit field(key: "...") is its own resolver call, so a few of them (plus their nested references) add up to more internal queries on our end and a higher requestedQueryCost, even if the actual data response is technically smaller.

My understanding is that this is expected behaviour under the current model. If you need to stay inside the cheaper cost bucket, you may want to keep the broad fields call and filter client-side, or split the fetch into two passes. Happy to dig deeper if needed or if I’m misunderstanding - just let me know!
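The “broad fields + client-side filter” approach could be sketched as below: fetch everything with the cheap `fields { key jsonValue }` query, then keep only the keys the caller asked to populate. The names here (`FieldNode`, `pickFields`) are illustrative, not part of any Shopify API or of the library discussed above.

```typescript
// Sketch of client-side filtering: the server returns every field of the
// metaobject, and we discard the ones that were not requested.
interface FieldNode {
  key: string;
  jsonValue: unknown;
}

function pickFields(fields: FieldNode[], wanted: string[]): Record<string, unknown> {
  const wantedSet = new Set(wanted);
  const out: Record<string, unknown> = {};
  for (const f of fields) {
    if (wantedSet.has(f.key)) out[f.key] = f.jsonValue;
  }
  return out;
}

// Illustrative field values as returned by a fields { key jsonValue } query.
const allFields: FieldNode[] = [
  { key: "author", jsonValue: "gid://shopify/Metaobject/1" },
  { key: "products", jsonValue: ["gid://shopify/Product/2"] },
  { key: "internal_notes", jsonValue: "not requested by the caller" },
];

console.log(pickFields(allFields, ["author", "products"]));
```

This keeps the request inside the cheap cost bucket at the price of transferring (and then discarding) the extra fields, which is exactly the trade-off the thread is debating.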

Hello again @Alan_G ,

This is quite odd to me, because field(key: "...") can refer to scalar values. I suppose (and hope) that internally you don’t run 5 separate requests if I ask for 5 fields that are just strings, do you? To me this is equivalent to asking for scalar fields on a product (say, title and description).

The thing that is a bit annoying with fields is that it is hard to get typing from GraphQL codegen.

Imagine this request:

const GetMetaobjectQuery = `#graphql
  query GetMetaobject($id: ID!) {
    metaobject(id: $id) {
      id
      displayName
      fields {
        key
        jsonValue
      }
    }
  }
`

GraphQL codegen will generate a type for this query, but the jsonValue entries in the fields array can’t be typed any more precisely, so the result becomes a bit hard to use.

Hey @bakura10 - thanks for touching base again, you raise a valid point for sure.

My understanding is that with metafields/objects specifically, we built them to be dynamic and flexible (so that they can be added without modifying the underlying schema), but it does come with some limitations.

At the moment, I can confirm this is more of a limitation than a bug, but I definitely do get where you’re coming from. In the meantime, one approach some folks have used is creating wrapper functions that transform the fields array into typed objects on the client side, although this wouldn’t impact the query cost and I realize that’s not super ideal.
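A minimal sketch of that wrapper-function idea: collapse the untyped fields array into a plain object, then cast it to a shape you declare yourself. The `EventFields` interface is an assumption about your metaobject definition, not something codegen produces, and the cast trusts that the definition actually matches it.

```typescript
// Sketch: turn the untyped `fields` array from a metaobject query into a
// typed object. The caller supplies the expected shape as a type parameter.
interface MetaobjectField {
  key: string;
  jsonValue: unknown;
}

// Hypothetical shape matching a metaobject definition you control.
interface EventFields {
  title: string;
  author: string; // a metaobject reference GID
}

function toTyped<T>(fields: MetaobjectField[]): T {
  const obj: Record<string, unknown> = {};
  for (const f of fields) obj[f.key] = f.jsonValue;
  return obj as T; // unchecked cast: trusts the metaobject definition
}

const event = toTyped<EventFields>([
  { key: "title", jsonValue: "Launch party" },
  { key: "author", jsonValue: "gid://shopify/Metaobject/1" },
]);

console.log(event.title); // "Launch party"
```

A runtime validator (e.g. a schema library) instead of the bare cast would make this safer, at the cost of some boilerplate per metaobject type.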

For next steps on our end, I can definitely put through a feature request for you - did you have any particular use cases in mind where a more robust/performant approach to these queries would help? Just trying to advocate as best I can here - speak soon!