Weird nested URLs that are not filters

Hi,
My client’s website creates nested collection URLs like /collections/perte-de-cheveux/loreal, while filters show as /collections/perte-de-cheveux?filter.p.vendor=L%27Or%C3%A9al.

I am not able to identify what creates these nested collection URLs.

Any idea?

Thank you!


Hey @Alexandre_Laurin

It’s a code thing: it seems the theme dev wasn’t consistent about using either {{ product.url | within: collection }} or simply {{ product.url }}.

I think it’s generally accepted to stick to the simple URL, but I’m no SEO expert.

Sometimes themes leave a setting in Theme Settings where this can be toggled, so try looking for that.

EDIT: Liquid filters: within
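For illustration, here’s a minimal sketch of the difference, assuming a hypothetical product with the handle shampoo rendered inside the perte-de-cheveux collection template:

{% comment %} Plain product URL, independent of the current collection {% endcomment %}
{{ product.url }}
{% comment %} => /products/shampoo {% endcomment %}

{% comment %} Collection-aware product URL via the within filter {% endcomment %}
{{ product.url | within: collection }}
{% comment %} => /collections/perte-de-cheveux/products/shampoo {% endcomment %}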

I think it is related to collection tags. Here is some dummy data for an example:

{
  "products": {
    "shampoo": {
      "tags": ["shampoo", "loreal"]
    },
    "hair-conditioner": {
      "tags": ["hair-conditioner", "hair-loss", "loreal"]
    }
  }
}
  • If I go to the /collections/all URL, both products will be displayed.
  • If I go to the /collections/all/hair-loss URL (your case), only the hair-conditioner product will be displayed. The shampoo product does not have the hair-loss tag and is filtered out as a result :wink:
  • If I go to the /collections/all/loreal URL, again, both products are displayed because both of them have the associated tag. (See the sketch below for how themes typically generate these tag links.)
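Themes typically generate these links with the link_to_tag Liquid filter (or by concatenating the collection URL with a handleized tag). A minimal sketch, assuming a standard collection template:

{% comment %} One link per tag; each points to /collections/<collection-handle>/<tag-handle> {% endcomment %}
{% for tag in collection.all_tags %}
  {{ tag | link_to_tag: tag }}
{% endfor %}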

More info in the docs, hope it helps!


Oh lol you just made me read the post again - ignore my comment. It’s totally unrelated :sweat_smile:

Listen to @teamdijon

Those links are normal and auto-generated by tags, as he mentioned. It also seems they are not indexed anyway. Try googling site:<domain>/collections/<collection>/<tag>.


Those URLs are indexed, and since they don’t hit any 404 they stay alive!

In this case, you need to update the contents of the /templates/robots.txt.liquid file to prevent bots from crawling the URLs:

To add the custom rule, go to the code editor and create robots.txt.liquid in the templates folder (if it isn’t already there). Then paste the following code:

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- comment %} Custom rule: block tag-filtered collection URLs {% endcomment -%}
  {%- if forloop.first -%}
    {{ 'Disallow: /collections/*/*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

The rule inside the {% if forloop.first %} condition is the one that prevents crawlers from accessing the tagged collection URLs. Once the modifications are done, you can use a robots.txt testing tool (for example, the robots.txt report in Google Search Console) to check whether crawlers can still access them!
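Roughly, the rendered robots.txt should then start with something like this (the default rules vary by store; the /collections/*/* line is the addition):

User-agent: *
Disallow: /search
Disallow: /cart
Disallow: /collections/*/*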

Note 1: Be careful not to erase potential modifications already present in the file.

Note 2: Check that this does not conflict with URLs you want indexed, and update accordingly!
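On that note: if a few of those nested URLs should stay indexed, robots.txt supports Allow rules, and Google applies the most specific matching rule. A sketch of the rendered output, using the tag URL from the original question as a hypothetical keeper:

Allow: /collections/perte-de-cheveux/loreal
Disallow: /collections/*/*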

Fantastic answer. Thank you.

We will have to redirect many of these links beforehand, since they stay alive and drive a lot of traffic… even though they have not been generated by our theme since we updated it almost 2 years ago!
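For anyone doing the same: Shopify can bulk-import redirects from a two-column CSV (Redirect from, Redirect to). A hypothetical sketch using paths from this thread; note that Shopify’s native redirects only take effect once the original URL no longer renders a live page:

Redirect from,Redirect to
/collections/perte-de-cheveux/loreal,/collections/perte-de-cheveux
/collections/all/hair-loss,/collections/perte-de-cheveux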
