orders/create webhook failing 77% of the time [500 errors]

Hi team, I followed the best practices in the webhooks guide but am still seeing a 77% failure rate. Can someone please help me with the configuration?

Webhook Architecture: Comprehensive Overview

Our webhook system implements a robust, production-grade architecture for processing Shopify order events, following best practices throughout:

1. Entry Point: API Route Handler

import { NextApiRequest, NextApiResponse } from "next";
import { PrismaClient, Prisma } from "@prisma/client";
import { buffer } from "micro";
import { createHmac, timingSafeEqual } from "crypto";
import * as fs from "fs";
import * as path from "path";
import { 
  detectRecommendationType, 
  extractRecommendationSource, 
  formatLineItemProperties,
  extractPlatterProperties 
} from "../../../../utils/recommendation-utils";
import {
  isEventProcessed,
  markEventProcessed,
  processOrderAsync,
  logWebhookMetrics,
  ShopifyOrder
} from "../../../../utils/webhook-utils";
  • Uses Next.js API routes for seamless serverless architecture
  • Imports specialized utility modules for core functionality

2. Raw Body Processing & Middleware Configuration

// Disable the default body parser to get the raw body
export const config = {
  api: {
    bodyParser: false,
  },
};
  • Disables automatic body parsing to access raw request data
  • Essential for cryptographic signature verification
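The imports include buffer from micro, which is the usual way to read the unparsed body once bodyParser is disabled. For reference, a dependency-free equivalent might look like the sketch below. readRawBody is a hypothetical helper name, and the sketch assumes a Node version that supports async iteration over streams.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper: collects every chunk of the request stream into one
// Buffer. This only works because bodyParser is disabled, so the stream has
// not already been consumed by Next.js.
export async function readRawBody(req: Readable): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of req) {
    // Chunks arrive as strings if an encoding was set on the stream
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}
```

A NextApiRequest is a Node IncomingMessage, which is a Readable, so it can be passed in directly; the resulting Buffer (or its utf8 string) is what the signature check should hash, before any JSON.parse.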

3. HMAC Verification for Security

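function verifyWebhookSignature(
  body: string,
  hmacHeader: string | string[] | undefined
): boolean {
  // SHOPIFY_API_SECRET is assumed to be defined elsewhere in this file.
  // Reject when it is missing: createHmac throws a TypeError on an
  // undefined key, which would surface as a 500
  if (!SHOPIFY_API_SECRET || !hmacHeader || Array.isArray(hmacHeader)) {
    return false;
  }

  // body must be the raw request body, not re-serialized JSON
  const generatedHash = createHmac("sha256", SHOPIFY_API_SECRET)
    .update(body, "utf8")
    .digest();
  const receivedHash = Buffer.from(hmacHeader, "base64");

  // timingSafeEqual throws a RangeError if the lengths differ, so check first
  return (
    generatedHash.length === receivedHash.length &&
    timingSafeEqual(generatedHash, receivedHash)
  );
}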
  • Implements Shopify’s recommended HMAC-SHA256 verification
  • Uses timing-safe comparison to prevent timing attacks
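To rule verification in or out as the source of the failures, the comparison logic can be exercised offline by signing a sample payload with a known secret. This is a self-contained sketch rather than the code above: signPayload and isValidSignature are hypothetical names, the secret is passed in instead of being read from SHOPIFY_API_SECRET, and the payload is made-up test data. It checks buffer lengths before calling timingSafeEqual, which throws a RangeError on a length mismatch (an exception that would otherwise become a 500 instead of a rejected request).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the base64 HMAC-SHA256 of a payload, the same form Shopify
// sends in the X-Shopify-Hmac-Sha256 header.
export function signPayload(body: string, secret: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("base64");
}

// Length-safe comparison that never throws on malformed headers.
export function isValidSignature(
  body: string,
  header: string | undefined,
  secret: string
): boolean {
  if (!header || !secret) return false;
  const expected = Buffer.from(signPayload(body, secret), "base64");
  const received = Buffer.from(header, "base64");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Example with made-up values
const secret = "test-secret";
const payload = JSON.stringify({ id: 1001, line_items: [] });
const header = signPayload(payload, secret);
console.log(isValidSignature(payload, header, secret)); // true
console.log(isValidSignature(payload + " ", header, secret)); // false: body changed
console.log(isValidSignature(payload, "short", secret)); // false, no exception
```

If a real delivery's raw body and header fail this check with the app's actual secret, the problem is in verification; if they pass, the 500s are coming from later in the handler.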

4. Main Handler with Error Boundary

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  const startTime = Date.now();
  const topic = "orders/create";
  let shopDomain = "";
  let eventId = "";
  let success = false;
  
  // Only allow POST requests
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }

  try {
    // Processing logic...
  } catch (error: unknown) {
    console.error("Error handling order created webhook:", error instanceof Error ? error.message : error);
    
    // Log metrics
    logWebhookMetrics(topic, shopDomain, eventId, false, startTime);
    
    // Every exception thrown above ends up here and returns 500, so Shopify
    // treats the delivery as failed and retries it
    return res.status(500).json({
      error: "Internal server error",
      message: error instanceof Error ? error.message : "Unknown error occurred",
    });
  }
}
  • Complete error boundary with detailed logging
  • Metrics tracking for all request outcomes

5. Idempotency through Deduplication

// Check if we've already processed this event
if (isEventProcessed(eventId)) {
  console.log(`Event ${eventId} already processed, acknowledging receipt`);
  logWebhookMetrics(topic, shopDomain, eventId, true, startTime);
  return res.status(200).json({ success: true, status: "already_processed" });
}
  • Prevents duplicate processing of the same event
  • Essential for reliability with Shopify’s retry mechanism
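One caveat, offered as an assumption because the utilities are not shown: if isEventProcessed and markEventProcessed are backed by in-process memory (their synchronous use above suggests they might be), each serverless instance keeps its own copy, so a retry that lands on a different instance will not be recognised as a duplicate. The sketch below hides the store behind a hypothetical DedupStore interface with a single claim call that checks and marks in one step. A production version could implement claim with the Redis command SET key value NX PX ttl; the in-memory class is only for local testing.

```typescript
// Hypothetical interface: claim() checks and marks in one step, so two
// concurrent deliveries of the same event cannot both pass the check.
export interface DedupStore {
  /** Returns true if the key was newly claimed, false if already seen. */
  claim(key: string, ttlMs: number): Promise<boolean>;
}

// In-memory implementation for local testing only: state is per process,
// so it does not deduplicate across separate serverless instances.
export class MemoryDedupStore implements DedupStore {
  private readonly expiries = new Map<string, number>(); // key -> expiry (ms)

  constructor(private readonly now: () => number = Date.now) {}

  async claim(key: string, ttlMs: number): Promise<boolean> {
    const t = this.now();
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > t) {
      return false; // seen within the TTL window: duplicate
    }
    this.expiries.set(key, t + ttlMs);
    return true;
  }
}
```

In the handler, a false result from await store.claim(eventId, ttlMs) would take the existing "already_processed" 200 path. The TTL should comfortably exceed Shopify's retry window; 24 hours is one arbitrary choice.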

6. Diagnostic Logging

// Debug: Write the webhook payload to a file for inspection
try {
  const timestamp = new Date().toISOString().replace(/:/g, "-");
  const logDir = "./logs";
  
  // Create logs directory if it doesn't exist
  if (!fs.existsSync(logDir)) {
    fs.mkdirSync(logDir, { recursive: true });
  }
  
  // Write webhook details to file, but don't block the response
  fs.promises.writeFile(
    `${logDir}/specific-webhook-${timestamp}.json`, 
    JSON.stringify({
      headers: req.headers,
      body: order,
      hmacVerified: true,
      eventId
    }, null, 2)
  ).catch(err => console.error("Error writing webhook log:", err));
} catch (logError) {
  console.error("Error setting up webhook logging:", logError);
}
  • Non-blocking payload logging for debugging
  • Structured JSON format for easy analysis
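On serverless hosts such as Vercel the filesystem is generally read-only apart from a temporary directory, and files do not persist between invocations, so writes to ./logs are likely to fail quietly inside the try/catch above. A minimal alternative, sketched with a hypothetical buildWebhookLogLine helper, is to emit one JSON line per webhook to stdout, where the platform's log drain can collect it. The set of redacted headers is an assumption to adjust as needed.

```typescript
// Headers to hide before logging (assumption: adjust to your needs)
const REDACTED_HEADERS = new Set(["x-shopify-hmac-sha256", "authorization", "cookie"]);

type HeaderMap = Record<string, string | string[] | undefined>;

// Hypothetical helper: builds a single-line JSON log entry for a webhook.
export function buildWebhookLogLine(
  eventId: string,
  headers: HeaderMap,
  body: unknown
): string {
  const safeHeaders: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    safeHeaders[name] = REDACTED_HEADERS.has(name.toLowerCase()) ? "[redacted]" : value;
  }
  // One JSON object per line is easy for log drains to parse and search
  return JSON.stringify({ type: "webhook_payload", eventId, headers: safeHeaders, body });
}
```

In the handler this would replace the fs block with console.log(buildWebhookLogLine(eventId, req.headers, order)).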

7. Non-Blocking Processing Pattern

// IMPORTANT: Acknowledge receipt immediately with 200 status
// This tells Shopify we've received the webhook and prevents retries
success = true;
logWebhookMetrics(topic, shopDomain, eventId, success, startTime);

// Start processing the order asynchronously
setImmediate(() => {
  processOrderAsync(order, shopDomain, eventId, prisma)
    .catch(error => {
      console.error(`Async processing error for order ${order.id}:`, error);
    });
});

// Return success response immediately
return res.status(200).json({ success: true });
  • Implements the crucial “acknowledge-then-process” pattern
  • Uses Node.js setImmediate to defer processing until after the response is sent
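On serverless platforms, work scheduled after the response has been sent may be frozen or dropped, so errors in processOrderAsync may never reach Shopify as a failed delivery. One alternative, sketched under the assumption that processing usually fits inside Shopify's 5-second response window, is to await the work with a time budget and return 500 on failure or timeout so Shopify retries. withTimeout and processWithBudget are hypothetical names, and the 4000 ms budget is an arbitrary choice that leaves some headroom.

```typescript
// Rejects if `work` does not settle within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying work.
export function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}

// Returns the status to send Shopify: 200 on success, 500 on error or
// timeout, so a failed order is retried instead of silently dropped.
export async function processWithBudget(
  work: () => Promise<unknown>,
  budgetMs = 4000 // assumption: headroom under Shopify's 5-second limit
): Promise<200 | 500> {
  try {
    await withTimeout(work(), budgetMs);
    return 200;
  } catch (err) {
    console.error("Order processing failed:", err);
    return 500;
  }
}
```

In the handler this would replace the setImmediate block, for example const status = await processWithBudget(() => processOrderAsync(order, shopDomain, eventId, prisma)) followed by res.status(status). If processing regularly needs longer than the budget, a queue is the better fit.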

8. Intelligent Recommendation Analysis

export function detectRecommendationType(properties: Record<string, any>): string | null {
  // Multi-pattern detection logic for recommendation types
}

export function extractRecommendationSource(properties: Record<string, any>): string | null {
  // Advanced source extraction with fallback strategies
}

export function formatLineItemProperties(properties: any): Record<string, any> {
  // Robust normalization of Shopify's property formats
}
  • Sophisticated pattern matching for various recommendation formats
  • Flexible property normalization for Shopify’s evolving data structures
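The function bodies are not shown, so the following is not the author's implementation. As a hypothetical sketch of the normalization step: in REST-format order payloads each line item's properties field is an array of { name, value } objects. A defensive normalizer that accepts either that shape or a plain object, and returns an empty record for null or missing properties, avoids an unexpected TypeError during processing.

```typescript
type PropertyEntry = { name: string; value: unknown };

// Hypothetical helper: converts line-item properties into a flat record.
// Accepts the array-of-{name, value} shape or an already-flat object.
export function normalizeProperties(properties: unknown): Record<string, string> {
  const result: Record<string, string> = {};
  if (Array.isArray(properties)) {
    for (const entry of properties as PropertyEntry[]) {
      // Skip malformed entries instead of throwing
      if (entry && typeof entry.name === "string") {
        result[entry.name] = String(entry.value ?? "");
      }
    }
  } else if (properties && typeof properties === "object") {
    for (const [key, value] of Object.entries(properties)) {
      result[key] = String(value ?? "");
    }
  }
  // null, undefined, or any other shape yields an empty record
  return result;
}
```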

9. Performance Monitoring

export function logWebhookMetrics(
  topic: string,
  shopDomain: string,
  eventId: string,
  success: boolean,
  startTime: number
): void {
  const processingTime = Date.now() - startTime;
  console.log(`WEBHOOK_METRIC: ${topic}, shop: ${shopDomain}, eventId: ${eventId}, success: ${success}, time: ${processingTime}ms`);
}
  • Detailed performance metrics with timing information
  • Structured logging format for easy analysis and alerting
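If the metric is emitted as a JSON object rather than a comma-separated string, log tools can filter and aggregate it directly, for example to compute the failure rate per shop. The sketch below is a hypothetical variant (buildWebhookMetric and its error field are not part of the code above) that also records a short reason on failure; the now parameter exists only so the duration can be tested deterministically.

```typescript
export interface WebhookMetric {
  metric: "webhook";
  topic: string;
  shopDomain: string;
  eventId: string;
  success: boolean;
  durationMs: number;
  error?: string; // hypothetical extra field: short reason on failure
}

// Hypothetical structured variant of logWebhookMetrics.
export function buildWebhookMetric(
  topic: string,
  shopDomain: string,
  eventId: string,
  success: boolean,
  startTime: number,
  error?: unknown,
  now: number = Date.now()
): WebhookMetric {
  const metric: WebhookMetric = {
    metric: "webhook",
    topic,
    shopDomain,
    eventId,
    success,
    durationMs: now - startTime,
  };
  if (!success && error !== undefined) {
    metric.error = error instanceof Error ? error.message : String(error);
  }
  return metric;
}
```

Usage would be console.log(JSON.stringify(buildWebhookMetric(topic, shopDomain, eventId, false, startTime, error))) in the catch block.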

Hi Jlin,

In your error handling section you return a 500 on critical errors, so I’m assuming the errors originate from this path? Did you check the console log for clues? If that doesn’t give you what you need, I’d suggest adding more logging to isolate where exactly the error is thrown from in your processing logic.
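One way to add that logging, sketched here with hypothetical names (runStep, StepError), is to wrap each stage of the processing logic so that any exception carries the name of the step that threw it, and each step's duration is logged.

```typescript
// Error that remembers which processing step failed.
export class StepError extends Error {
  constructor(public readonly step: string, public readonly original: unknown) {
    super(`Step "${step}" failed: ${original instanceof Error ? original.message : String(original)}`);
    this.name = "StepError";
  }
}

// Runs one named step, logs its duration, and tags any failure with the step name.
export async function runStep<T>(step: string, fn: () => Promise<T> | T): Promise<T> {
  const started = Date.now();
  try {
    const result = await fn();
    console.log(`STEP_OK ${step} ${Date.now() - started}ms`);
    return result;
  } catch (err) {
    console.error(`STEP_FAILED ${step} ${Date.now() - started}ms`, err);
    throw new StepError(step, err);
  }
}
```

For example, const order = await runStep("parse-body", () => JSON.parse(bodyText)) followed by await runStep("save-order", ...) would turn a bare 500 into a log line naming the failing step.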

A 77% failure rate is quite high, so let’s troubleshoot the issue together. Here are a few things to check:

  1. Webhook Endpoint Health & Response

Ensure your server responds with a 2XX status code within 5 seconds to avoid timeouts.

Check your server logs to see if there are patterns in the failures (timeouts, 500 errors, etc.).

  2. Webhook Retry Behavior

Shopify retries webhooks up to 19 times over 48 hours for non-2XX responses.

If failures are intermittent, there may be latency spikes on your server.

  3. Webhook Security & Verification

Ensure the HMAC signature validation is implemented correctly to prevent request rejections.

Check if firewall rules or rate limits might be blocking Shopify’s IPs.

  4. Webhook Load Handling

If you’re processing large order volumes, consider asynchronous processing with a message queue (e.g., RabbitMQ, AWS SQS).

If needed, increase server resources to handle the load.
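A minimal sketch of the queue pattern, assuming a hypothetical OrderQueue interface that a real adapter (SQS, RabbitMQ, etc.) would implement: the handler acknowledges only once the event has been queued, and returns 500 if queuing fails so Shopify retries. A separate worker would consume the queue outside Shopify's response window. The in-memory queue is for local testing only.

```typescript
// Shape of a queued webhook (hypothetical)
export type QueuedWebhook = { eventId: string; shopDomain: string; body: string };

// Hypothetical interface; a real adapter would wrap SQS, RabbitMQ, etc.
export interface OrderQueue {
  enqueue(message: QueuedWebhook): Promise<void>;
}

// Acknowledge only after the event is safely queued.
export async function acceptWebhook(queue: OrderQueue, message: QueuedWebhook): Promise<200 | 500> {
  try {
    await queue.enqueue(message);
    return 200;
  } catch (err) {
    console.error(`Failed to enqueue event ${message.eventId}:`, err);
    return 500; // non-2XX, so Shopify will retry the delivery
  }
}

// In-memory queue for local testing only; messages are lost on restart.
export class MemoryQueue implements OrderQueue {
  readonly messages: QueuedWebhook[] = [];

  async enqueue(message: QueuedWebhook): Promise<void> {
    this.messages.push(message);
  }
}
```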

Could you provide some more details on the failure patterns and error messages? That will help pinpoint the issue.

@jlin - as @Felix-Shopify points out, you need to understand the cause of the error.

  • Ingestion timeout: probably not this, as the error is a 500 response.
  • Verification failure: possible, if the verification handling has a logic error.
  • Processing logic failure: most likely this.
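To tell these causes apart from the response codes alone, the handler could map each failure stage to a distinct status. This is a hypothetical sketch: VerificationError and toFailureResponse are invented names, and the specific codes are a choice rather than a Shopify requirement. Shopify treats any non-2XX response as a failed delivery, so the distinction helps diagnosis rather than changing retry behavior. It relies on JSON.parse throwing a SyntaxError on malformed input and assumes other SyntaxErrors do not escape the processing logic.

```typescript
// Thrown by the signature check when the HMAC is missing or invalid.
export class VerificationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "VerificationError";
    // Keeps instanceof working even when compiled to ES5
    Object.setPrototypeOf(this, VerificationError.prototype);
  }
}

export type FailureStage = "verification" | "parsing" | "processing";

// Maps an error to the stage that failed and the status code to return.
export function toFailureResponse(err: unknown): { stage: FailureStage; status: number } {
  if (err instanceof VerificationError) {
    return { stage: "verification", status: 401 };
  }
  if (err instanceof SyntaxError) {
    // JSON.parse throws SyntaxError on a malformed body
    return { stage: "parsing", status: 400 };
  }
  // Anything else is attributed to the processing logic
  return { stage: "processing", status: 500 };
}
```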

One thing you could do to help with observability (even if it’s just temporarily to get to the bottom of this error) is put a service such as Hookdeck (who I work for) in between Shopify and your ingestion endpoint. Hookdeck will instantly ingest the event, handle Shopify verification, and allow you to control the delivery rate of the webhooks to your ingestion endpoint.

From this, you’ll see where the error is occurring. It seems likely to be the ingestion logic, but this will at least rule out the verification handling as the problem.

If it’s from your ingestion endpoint, you can manually retry the failing webhook events and debug the error in real time. You can also “bookmark” failed events and replay them to an instance of your app on localhost to debug the problem further.

I’ve been maintaining a Next.js boilerplate for a good while now:

  1. Here’s my Webhook processor: /pages/api/webhooks/[...webhookTopic].js

  2. I write the webhook processor using my webhookWriter.js

  3. Here’s an example webhook: app_uninstalled

Here’s what I do differently with my production apps:

  1. Next’s job is to process any webhook that’s thrown its way, and it does so asynchronously simply because you’re in a serverless environment, where functions spin up on demand and process webhooks in parallel. Add a Redis/KV store instance to handle deduplication.
  2. You don’t have to manually run an HMAC check on webhooks, there’s already a shopify.webhooks.validate() function that does that for you and is very stable.
  3. Returning an immediate 200 is a bad idea. If something actually fails, it’s better to return a 500 to Shopify: after you deploy a fix, the retried webhook goes through normally, so you don’t have to trigger it manually.
  4. From what I can see, there’s nothing wrong with how you’re managing webhooks (albeit overcomplicated); something in your functions is breaking things. Perhaps migrate from writing logs with fs on Vercel to draining them to a service like Logflare or Axiom.

Shopify retries webhooks up to 19 times over 48 hours for non-2XX responses.

An update on Sept 10, 2024 changed the webhook retry mechanism: Shopify now retries a webhook only 8 times over a period of 4 hours.

Source: Updates to webhook retry mechanism - Shopify developer changelog
