Could you please share more details on how these URLs are filtered out for web crawlers? I can find indexed URLs on Google (inurl:collections inurl:phcursor), and I run into them when crawling locally with Screaming Frog or with cloud-based crawlers like Ahrefs.
I also found duplicate URLs with the phcursor parameter indexed on Google; this should be addressed.
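To audit this from a crawl export (for example, a URL list exported from Screaming Frog), a small script can strip phcursor from every URL and group the crawled URLs that collapse onto the same address. This is a minimal Python sketch run outside Shopify; the function names and sample URLs are hypothetical, and it assumes phcursor is the only parameter creating the duplicates. Because it parses the query string instead of splitting on '&phcursor=', parameters that come after phcursor are kept.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameter seen on the duplicated collection URLs.
CURSOR_PARAM = "phcursor"


def strip_phcursor(url: str) -> str:
    """Return url without any phcursor parameter, keeping the others in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != CURSOR_PARAM
    ]
    # urlencode re-encodes the surviving parameters; an empty list yields "".
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_phcursor_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Map each cleaned URL to the crawled URLs that collapse onto it.

    Only cleaned URLs reached by two or more crawled URLs are returned.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[strip_phcursor(url)].append(url)
    return {clean: members for clean, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hypothetical crawl export.
    crawled = [
        "https://example.com/collections/shoes?page=2",
        "https://example.com/collections/shoes?page=2&phcursor=abc123",
        "https://example.com/collections/shoes?page=3&phcursor=def456",
    ]
    for clean, members in group_phcursor_duplicates(crawled).items():
        print(clean)
        for member in members:
            print("   ", member)
```

Since urlencode re-encodes the remaining parameters, compare cleaned URLs with each other rather than with the raw export.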
None of the following code is supported by Shopify, so add it at your own risk.
Also, this issue is recent and the behavior might change in the future!
As a preventive measure, if your theme handles pagination with the part Liquid objects (paginate.parts), you can apply the following filters to the URL of each part:
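```liquid
{% # Regular %}
{% for part in paginate.parts %}
  {% if part.is_link %}
    {{ part.title | link_to: part.url }}
  {% else %}
    {{ part.title }}
  {% endif %}
{% endfor %}

{% # With fix: keep only what precedes '&phcursor=' %}
{% # Assumes phcursor comes after every parameter you want to keep %}
{% for part in paginate.parts %}
  {% if part.is_link %}
    {% assign part_url = part.url | split: '&phcursor=' | first %}
    {{ part.title | link_to: part_url }}
  {% else %}
    {{ part.title }}
  {% endif %}
{% endfor %}
```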
If your theme uses the default_pagination filter (i.e. {{ paginate | default_pagination }}), there is no easy way to keep phcursor out of the generated markup. Consider switching to a part-based pagination snippet if you need the phcursor parameter removed.
Google does crawl the parameterized URLs. The canonical tag (which points to the parameter-less version) tells Google which URL should be indexed, but it is ultimately just a hint; Google is not obligated to follow it.
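Because the canonical tag is only a hint, it can be worth spot-checking that crawled phcursor URLs at least declare a canonical without the parameter. This Python sketch parses a page's HTML with the standard library's html.parser; the helper names are hypothetical, and fetching the HTML (for example with urllib.request) is left out so the check stays offline.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def canonical_of(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def has_clean_canonical(html: str) -> bool:
    """True if the page declares a canonical URL without a phcursor parameter."""
    canonical = canonical_of(html)
    if canonical is None:
        return False
    keys = {key for key, _ in parse_qsl(urlsplit(canonical).query, keep_blank_values=True)}
    return "phcursor" not in keys


if __name__ == "__main__":
    # Hypothetical HTML of a crawled phcursor URL.
    page = '<head><link rel="canonical" href="https://example.com/collections/shoes?page=2"></head>'
    print(has_clean_canonical(page))  # prints True
```

A page that passes this check can still be indexed under its phcursor URL, since Google may ignore the hint; the check only confirms the theme is sending the right signal.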
Any updates on this? The number of indexed pages just doubled because of this.