Help with Using Shopify Bulk Operations API to Validate Product SKUs

Title: Help with Using Shopify Bulk Operations API to Validate Product SKUs

Body:

Hi everyone,

I’m working on a project where I need to validate a list of SKUs (stored in a CSV file) against the SKUs present in my Shopify store. My goal is to determine which SKUs from the CSV file are present in the store and which ones are missing. Here’s what I’ve done so far and the issues I’m encountering:


What I’m Trying to Achieve

  1. Input: I have a CSV file (skus.csv) containing a list of SKUs to validate.
  2. Goal:
    • Fetch all SKUs from my Shopify store using the Bulk Operations API.
    • Compare the store SKUs with the ones in the CSV file.
    • Generate a report (sku_comparison_results.csv) showing which SKUs are present or missing.

What I’ve Done So Far

  1. I set up a script to use Shopify’s Bulk Operations API to fetch all products and their variants, including SKUs.
  2. The bulk operation completes successfully, and I get a downloadable URL for a JSONL file containing product and variant data.
  3. I process the JSONL file to extract SKUs and save them into a CSV file (store_skus.csv).
  4. I wrote another script to compare the SKUs in store_skus.csv with those in my skus.csv.

The Issue

Despite the bulk operation completing successfully, all SKUs from my input CSV file are marked as “Missing,” even though I manually verified that some of them exist in the store.

I suspect one of the following could be causing the issue:

  • Data Fetching: My query might not be correctly fetching all SKUs from the store. For example, unpublished products, empty SKUs, or paginated results might be missed.
  • Comparison Logic: There could be issues with whitespace, case sensitivity, or formatting when comparing SKUs from the two files.

What I Need Help With

  1. Ensuring the Bulk Operations API fetches all SKUs correctly.
    • Is my GraphQL query missing any important fields?
    • Could unpublished or hidden products/variants be excluded?
  2. Handling data comparison more reliably:
    • Should I preprocess SKUs (e.g., trim spaces, standardize case) before comparing?
  3. Suggestions for optimizing this process:
    • Is there a better way to validate thousands of SKUs efficiently?

My Setup

  • Using Python to interact with Shopify’s API.

  • The bulk operation uses this GraphQL query:

    mutation {
      bulkOperationRunQuery(
        query: "{ products(first: 100000) { edges { node { variants(first: 100) { edges { node { sku } } } } } } }"
      ) {
        bulkOperation {
          id
          status
        }
        userErrors {
          field
          message
        }
      }
    }
    
  • After the operation completes, I download the JSONL file and parse it to extract SKUs.


Any Advice or Guidance?

If you’ve worked with the Bulk Operations API or have experience validating large lists of SKUs, I’d greatly appreciate your insights. How can I improve my process or troubleshoot the issue more effectively?

Thanks in advance for your help!

You need to remove the first parameters in your query, you can then remove the () as well.

In terms of comparison you should always trim the data of whitespace either side of the data, given user inputs.
If you’ve manually verified the SKUs, you should be able to manually compare them and validate if your code is correct

Hey, thanks for the help!

I updated the script as suggested:
:white_check_mark: Fixed the GraphQL query (removed unnecessary first: 100000 and () from products).
:white_check_mark: Trimmed SKUs for whitespace before comparison.
:white_check_mark: Fetching all SKUs using the Bulk Operations API and saving them to store_skus.csv.

However, the issue still persists:

  • The bulk operation completes successfully, and Shopify provides a JSONL file download link.
  • But the JSONL file is empty—no SKUs are retrieved.
  • Since the JSONL file contains no data, store_skus.csv (which is generated from the JSONL file) is also empty.
  • I manually verified that these SKUs exist when searching for the products in the Shopify Admin.

What I Need Help With

  • Why does the JSONL file contain no SKUs despite the bulk operation succeeding?
  • Is my GraphQL query structured correctly, or is there something missing?

Here’s my latest full script:


Latest Full Script (Fetching and Comparing SKUs)

import requests
import time
import pandas as pd

SHOPIFY_STORE_DOMAIN = "STORE.myshopify.com"
ACCESS_TOKEN = "access-token"
API_URL = f"https://{SHOPIFY_STORE_DOMAIN}/admin/api/2023-01/graphql.json"

### --- Step 1: Fetch SKUs from Shopify --- ###
def start_bulk_operation():
    headers = {
        "Content-Type": "application/json",
        "X-Shopify-Access-Token": ACCESS_TOKEN,
    }
    query = {
        "query": """
        mutation {
          bulkOperationRunQuery(
            query: "{ products { edges { node { id title variants { edges { node { id sku } } } } } } }"
          ) {
            bulkOperation {
              id
              status
            }
            userErrors {
              field
              message
            }
          }
        }
        """
    }
    response = requests.post(API_URL, json=query, headers=headers)
    return response.json()

def get_bulk_operation_status():
    headers = {
        "Content-Type": "application/json",
        "X-Shopify-Access-Token": ACCESS_TOKEN,
    }
    query = {
        "query": """
        {
          currentBulkOperation {
            id
            status
            url
          }
        }
        """
    }
    while True:
        response = requests.post(API_URL, json=query, headers=headers)
        data = response.json()
        status = data["data"]["currentBulkOperation"]["status"]
        print(f"Bulk operation status: {status}")
        if status == "COMPLETED":
            return data["data"]["currentBulkOperation"]["url"]
        elif status == "FAILED":
            raise Exception("Bulk operation failed.")
        time.sleep(10)

def download_skus(url):
    response = requests.get(url)
    skus = []
    for line in response.text.splitlines():
        try:
            product = eval(line)  # Convert JSONL string to dictionary
            for variant in product.get("variants", {}).get("edges", []):
                sku = variant["node"].get("sku", "").strip().upper()
                if sku:
                    skus.append(sku)
        except Exception as e:
            print(f"Error processing line: {e}")
    return skus

def fetch_store_skus():
    print("Starting bulk operation to fetch all SKUs...")
    response = start_bulk_operation()
    print("Bulk operation started:", response)

    print("Polling for bulk operation status...")
    results_url = get_bulk_operation_status()

    print("Downloading SKUs from JSONL file...")
    skus = download_skus(results_url)

    if not skus:
        print("⚠️ No SKUs were retrieved. The JSONL file appears to be empty.")

    output_file = "store_skus.csv"
    pd.DataFrame({"SKU": skus}).to_csv(output_file, index=False)
    print(f"All store SKUs saved to {output_file}")

### --- Step 2: Compare SKUs --- ###
def compare_skus():
    store_skus_file = "store_skus.csv"
    input_skus_file = "skus.csv"
    output_file = "sku_comparison_results.csv"

    try:
        store_skus = pd.read_csv(store_skus_file)["SKU"].astype(str).str.strip().str.upper().tolist()
    except Exception as e:
        print(f"⚠️ Error reading store SKUs file: {e}")
        store_skus = []

    try:
        input_skus = pd.read_csv(input_skus_file)["SKU"].astype(str).str.strip().str.upper().tolist()
    except Exception as e:
        print(f"⚠️ Error reading input SKUs file: {e}")
        input_skus = []

    results = [{"SKU": sku, "Status": "Present" if sku in store_skus else "Missing"} for sku in input_skus]

    pd.DataFrame(results).to_csv(output_file, index=False)
    print(f"Comparison results saved to {output_file}")

### --- Run Process --- ###
if __name__ == "__main__":
    fetch_store_skus()
    compare_skus()

Summary of the Issue

:white_check_mark: The bulk operation completes successfully and provides a CSV.
:x: But the CSV file is empty—no SKUs are retrieved.
:warning: Since the CSV file has no data, store_skus.csv is empty, which makes all SKUs appear as “Missing” in the final report.
:question: Why is this happening, and how can I fix it?

Would really appreciate any insights! Thanks in advance. :pray:

If you download the JSONL file yourself is it empty?

From the look of your code, your parsing of the JSONL file isn’t quite right.
You’ll get one line for the product data and then one line for each variant, you will then need to build up the structure you want.

There is an example of nearly your exact scenario here and what the data will look like and how to map it together.

I would recommend having a read and understanding that whole guide :slight_smile:

Thanks so much for all the help and recommendations! I really appreciate the guidance.

After updating the script to properly parse the JSONL file line by line and correctly extract SKUs from variants, everything is working as expected now.

Appreciate the support! :rocket: