Improve cost calculation for metaobject fields

Hello,

I am creating a small ORM on top of metaobjects and metafields (GitHub - maestrooo/metaobject-repository: A simple ORM for manipulating Shopify metaobjects and metafields) to make it easier to interact with them. It works really well and I’m happy with the overall architecture, but I found a problem with the Shopify query cost calculation.

Right now, this library lets you populate one or multiple resources (even recursively), and it generates an optimized request. For instance, this code:

const events = await eventRepository.findAll({ populate: ['author.image', 'products'] })

Will generate a GraphQL query that gets the events with the products, the author and the author’s image. Internally, the library generates a query that looks like this:

{
   metaobjects(first: 50, type: "foo") {
      nodes {
         fields {
            key
            jsonValue
         }

         _author: field(key: "author") {
            reference {
               ...on Metaobject {
                  fields {
                    key
                    jsonValue
                  }

                  _image: field(key: "image") {
                     reference {
                         ...on MediaImage {
                            # properties
                         }
                     }
                  }
               }
            }
         }

         _products: field(key: "products") {
             references(first: 10) {
                nodes {
                    id

                    ...on Product {
                      # other properties
                    }
                }
             }
         }
      }
   }
}

This works really well and maps cleanly to a recursive approach. There is one issue, however: the calculated cost is much higher than that of a naive approach that can fetch far more data.

This optimized query only fetches what we need: if the event object has other references, they are not fetched.
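
For readers following along, the recursive mapping from populate paths to a nested selection could be sketched roughly as below. This is not the library's actual code: `buildPopulateTree` and `renderSelection` are hypothetical names, the sketch assumes each field is identified by `key`, and it treats every populated key as a single `reference` to a metaobject (a list field such as `products` would need `references(first: N) { nodes { ... } }` instead).

```typescript
// Hypothetical sketch, not the library's implementation.
type PopulateTree = { [key: string]: PopulateTree };

// Turn ['author.image', 'products'] into { author: { image: {} }, products: {} }.
function buildPopulateTree(paths: string[]): PopulateTree {
  const root: PopulateTree = {};
  for (const path of paths) {
    let node = root;
    for (const key of path.split(".")) {
      const next = node[key] ?? {};
      node[key] = next;
      node = next;
    }
  }
  return root;
}

// Render the selection for one metaobject level, recursing into each populated key.
// Assumption: every populated key is a single metaobject reference.
function renderSelection(tree: PopulateTree, indent = ""): string {
  const lines = [`${indent}fields { key jsonValue }`];
  for (const key of Object.keys(tree)) {
    lines.push(`${indent}_${key}: field(key: "${key}") {`);
    lines.push(`${indent}  reference {`);
    lines.push(`${indent}    ... on Metaobject {`);
    lines.push(renderSelection(tree[key] ?? {}, indent + "      "));
    lines.push(`${indent}    }`);
    lines.push(`${indent}  }`);
    lines.push(`${indent}}`);
  }
  return lines.join("\n");
}
```

Each extra populate path adds one aliased `field(key: ...)` selection, which is the part the cost calculation appears to charge for separately.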

However, I’ve found that generating the following query produces a much lower cost, even though it potentially retrieves much more data:

{
   metaobjects(first: 50, type: "foo") {
      nodes {
         fields {
            key
            jsonValue

            reference {
               ...on Metaobject {
                  fields {
                    key
                    jsonValue

                    reference {
                       ...on MediaImage {
                         # properties
                       }
                    }
                  }
               }
            }

            references(first: 10) {
               nodes {
                  ...on Product {
                     # fields
                  }
               }
            }
         }
      }
   }
}

The problem is that if the metaobject of type foo has several fields that reference metaobjects or lists of products, far more data will be retrieved (while we only want to populate the author). This is actually a problem when I serialize the data to our internal structure, as I might get references to objects the user did not ask for in the initial query.
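
To illustrate the serialization side of this, here is a minimal sketch of dropping unrequested references after the broad query returns. `RawField` and `stripUnrequestedReferences` are hypothetical names, and the sketch assumes each field exposes its `key`. It only hides the extra references; it does not reduce what is fetched.

```typescript
// Hypothetical shape of one entry returned by the broad `fields { ... }` selection.
interface RawField {
  key: string;
  jsonValue: unknown;
  reference?: unknown;
  references?: { nodes: unknown[] } | null;
}

// Keep every field's scalar value, but null out references the caller did not populate.
function stripUnrequestedReferences(fields: RawField[], populate: string[]): RawField[] {
  return fields.map((field) =>
    populate.indexOf(field.key) !== -1
      ? field
      : { ...field, reference: null, references: null }
  );
}
```

This keeps the internal structure limited to what the user asked for, at the price of still transferring the unwanted data.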

I think that internally, for the query cost, if the query already includes a “fields” selection, then individual “field” selections should be “merged” with it, so that the resulting cost is the same.

Hey @bakura10 - thanks for flagging this!

I tried to replicate this on my end, but found that the more targeted query did return a lower cost than the naive one on my test shop.

I can’t speak to third-party libraries, but if you’re able to share the exact queries you tested and the cost numbers you’re seeing, I can see if I’m able to replicate on our end and narrow down what might be happening here.

If you’re able to share the specific x-request-id values from the response headers for both of the examples above, that would be the best thing I can use to dig into this further. Also helpful would be knowing the approximate number of fields your metaobject type has, as that might affect the cost calculation.
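
In case it helps with collecting those values, a minimal sketch of pulling the request ID and cost numbers out of a response follows. `summarizeCost` is a hypothetical helper; it assumes the cost data is returned under `extensions.cost` in the response body and that the headers object has a `get` method, as `fetch` response headers do.

```typescript
// Assumed shape of the cost data in an Admin GraphQL response body.
interface GraphQLBody {
  extensions?: {
    cost?: {
      requestedQueryCost: number;
      actualQueryCost: number | null;
    };
  };
}

interface CostSummary {
  requestId: string | null;
  requestedQueryCost: number | null;
  actualQueryCost: number | null;
}

// Collect the values worth sharing when reporting a cost discrepancy.
function summarizeCost(
  headers: { get(name: string): string | null },
  body: GraphQLBody
): CostSummary {
  return {
    requestId: headers.get("x-request-id"),
    requestedQueryCost: body.extensions?.cost?.requestedQueryCost ?? null,
    actualQueryCost: body.extensions?.cost?.actualQueryCost ?? null,
  };
}
```

With `fetch`, this would be called as `summarizeCost(response.headers, await response.json())` once per query being compared.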

Hope to hear from you soon!

Hi Alan. Thanks for the reply. Sure, you can just try the following random query:

query GetMetaobjects { 
  metaobjects(type: "$app:foo", first: 50, reverse: false) { 
    nodes { 
      id 
      fields {
        jsonValue
        
        reference {
          __typename
        }
        
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
    }
  } 
}

This gives you a requestedQueryCost of 86. Even if the metaobjects contain 40 fields that are all list_reference, this is the cost you get.

Now, run the following optimized query that explicitly gets only two references:

query GetMetaobjects { 
  metaobjects(type: "$app:foo", first: 50, reverse: false) { 
    nodes { 
      id
      
      foo: field(key: "foo") {
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
      
      bar: field(key: "bar") {
        references(first: 50) {
          nodes {
            __typename
          }
        }
      }
    }
  } 
}

And you get a requestedQueryCost of 149.

I’m aware that requestedQueryCost is very different from actualQueryCost. But, as explained in the documentation, the requestedQueryCost is also used to decide whether the request can be executed or not: Shopify API rate limits
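
To make the impact concrete, here is a rough sketch of how a client might check whether a query's requested cost fits in the remaining bucket. `secondsUntilAffordable` is a hypothetical helper, and the sketch assumes the `throttleStatus` values (`maximumAvailable`, `currentlyAvailable`, `restoreRate`) returned alongside the cost, with points restored linearly per second.

```typescript
// Assumed shape of `extensions.cost.throttleStatus` in the response.
interface ThrottleStatus {
  maximumAvailable: number;
  currentlyAvailable: number;
  restoreRate: number; // points restored per second
}

// Seconds to wait before a query with this requested cost fits in the bucket.
// Returns 0 if it can run now, Infinity if it exceeds the bucket size.
function secondsUntilAffordable(requestedQueryCost: number, status: ThrottleStatus): number {
  if (requestedQueryCost > status.maximumAvailable) return Infinity;
  const missing = requestedQueryCost - status.currentlyAvailable;
  return missing <= 0 ? 0 : Math.ceil(missing / status.restoreRate);
}
```

With illustrative numbers (100 points currently available, restoring 50 per second), the 86-point query could run immediately, while the 149-point one would have to wait about a second, even though it asks for less data.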