I recently submitted for BFS (Built for Shopify) and have just received a response from Shopify warning me of a critical issue with my app that, if not resolved, will result in delisting. After reviewing the logs of their testing, I am in a serious bind.
My app relies on users being able to upload files to an externally hosted bucket. We use pre-signed upload URLs to achieve this. I can see from the logs that the user received a correctly pre-signed upload URL, but due to their internal network restrictions, firewalls, etc., they are not able to actually upload anything. The claim is that our app has a critical bug because uploads don’t work.
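For anyone unfamiliar with the flow being described: the server signs an upload URL and hands it to the browser, which PUTs the file directly to the bucket. Below is a minimal sketch of SigV4 query-string presigning using only the standard library, assuming S3-compatible storage; the bucket name, region, and credentials are placeholders, and a real app would use the provider's SDK rather than hand-rolling this.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def presign_put_url(bucket, key, access_key, secret_key,
                    region="us-east-1", expires=3600):
    """Build an S3-style SigV4 pre-signed PUT URL (illustrative sketch)."""
    host = f"{bucket}.s3.{region}.amazonaws.com"
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    qs = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                  for k, v in sorted(params.items()))

    # Canonical request: method, URI, query string, headers, signed headers,
    # and UNSIGNED-PAYLOAD (the body is not known at signing time).
    canonical = "\n".join(["PUT", f"/{quote(key)}", qs,
                           f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join(["AWS4-HMAC-SHA256", amz_date, scope,
                                hashlib.sha256(canonical.encode()).hexdigest()])

    # Derive the signing key by chaining HMACs over date/region/service.
    k = f"AWS4{secret_key}".encode()
    for part in (datestamp, region, "s3", "aws4_request"):
        k = hmac.new(k, part.encode(), hashlib.sha256).digest()
    signature = hmac.new(k, string_to_sign.encode(), hashlib.sha256).hexdigest()

    return f"https://{host}/{quote(key)}?{qs}&X-Amz-Signature={signature}"
```

The key point for this thread: once the URL is handed to the browser, the upload is a direct client-to-bucket request, so the app's servers never see the traffic that the reviewer's firewall is blocking.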
This is the first time we have seen this occur (after tens of thousands of uploads across hundreds of shops), and I am unsure how to respond. If a user's network completely blocks uploads outside of the network, then it is to be expected that uploading files within my application will fail. Errors were correctly surfaced to the user, but there is nothing I can do to resolve their issue.
I assume you’re using S3 or the equivalent for storage.
I can see from the logs that the user received a correctly pre-signed upload URL, but due to their internal network restrictions, firewalls, etc., they are not able to actually upload anything.
Can you share the actual stack trace that leads you to believe this is the cause?
It could be CORS, an expired pre-signed URL, or some other issue.
If it truly is an issue of a blocked URL, have you considered using a subdomain to mask the S3 domain with your own domain so that it’s a trusted one rather than a generic S3 Bucket URL?
The root-cause stack trace is not available since this issue is happening client side; however, from my server-side logs it can be determined with high certainty that this is the cause. Their screencast (which includes their browser logs) also points to this being the root cause.
I have confirmed that it is neither a CORS issue (our policy is almost entirely permissive) nor an expired pre-signed URL. I have implemented comprehensive testing around this, and the only time upload failures of this nature occur is when a network restriction is put in place.
Unfortunately, due to the above, despite the subdomain being a good idea, if their network is restricted as the evidence suggests then those uploads will fail too.
Why not have a server side endpoint that YOU take in the file(s), then your server sends that to the bucket instead of direct to bucket from the user’s end? Maybe I’m misunderstanding so apologies if I’ve got the flow wrong.
You are correct that this would work; however, many of our users upload files exceeding 50 GB. Routing these through our servers impacts our ability to provide a fast upload experience, severely degrading the UX. Unfortunately, we are not able to programmatically determine whether a given user has network restrictions in place, and so cannot offer both upload paths.
Yeah, there are great ways to handle this. Consider a queue. Obviously I don’t have the full perspective of your app, but you can store it locally (briefly), then tell the user that you’re processing it in the background. Try also streaming the data. If you need that as a straight flow then that’s maybe more problematic, but I don’t see a different solution (for your current implementation) than you taking full control over how the files get to the bucket.
This is a completely fair point, it is conjecture based on what I am able to ascertain from the information that is available to me.
The browser console in the screencast does not show an explicit error indicating the upload failed due to network restrictions; the user just gets an immediate upload-failure error. I have now increased the logging throughout this section to try to capture all available information. From what I can see so far, the XHR upload just gets silently blocked.
This “silent failure” pattern is exactly what you’d expect from a corporate proxy/firewall blocking outbound PUT requests - the connection gets dropped without returning any meaningful response.
Thank you for the 3 suggestions, I am looking into them now.
If it is CORS though you’d see that in either the Network tab or the Console tab.
Aside from that, the Network tab may be the biggest hint from the user’s side. You’ll get some sort of response code other than a 200 that should lead to what the problem really is. Anyway, hope that helps.
I am able to exactly replicate the error if I set up network request blocking for my storage provider (Cloudflare) in DevTools.
CORS and R2 bucket issues can be eliminated with near certainty. When simulating those issues, I consistently see a 150-300 ms preflight delay before the error response, whereas network request blocking fails in ~1 ms (which matches the reviewer's 16 attempted uploads, which errored after 2 ms ± 1 ms). This instant rejection, before any connection is made, also points in this direction.
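The timing signal described above can be captured as a simple heuristic: a request blocked before any connection is made fails almost instantly, while a CORS or bucket rejection requires at least one network round trip. The thresholds below come from the measurements in this thread (2 ms ± 1 ms for blocked requests, 150-300 ms for real rejections) and would need tuning for other environments.

```python
def classify_upload_failure(elapsed_ms: float) -> str:
    """Rough heuristic for diagnosing an upload failure from its timing.
    Thresholds are taken from the observations in this thread:
      - blocked by browser/proxy/firewall: fails in ~1-2 ms, no connection made
      - CORS or bucket error: at least one round trip, ~150-300 ms observed
    """
    if elapsed_ms <= 5:
        return "blocked-before-connection"  # firewall, proxy, or DevTools block
    if elapsed_ms >= 150:
        return "server-or-cors-error"       # request reached the bucket
    return "inconclusive"
```

This kind of check could feed a more specific client-side error message ("your network appears to be blocking uploads") instead of a generic upload failure.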
Hopefully the reviewer appreciates this is due to their specific network setup. Let's see!
Quick update: the problem was caused by their network blocking all external uploads. If anyone else runs into this issue in the future, please ask the reviewer to double-check their network configuration!