Short description of issue
Liquid ‘strip’ filter removes only ASCII whitespace and ignores Unicode whitespace (U+00A0, U+202F, U+2009, etc.)
Reproduction steps
{%- assign s = 'Couleur ' -%} {% # trailing char is U+202F %}
{%- assign stripped = s | strip -%}
{{ stripped | size }} {% # expected: 7, actual: 8 %}
{% if stripped == 'Couleur' %}true{% else %}false{% endif %} {% # expected: true, actual: false %}
Same result for U+00A0:
{%- assign s = 'Color ' -%} {% # trailing char is U+00A0 %}
{{ s | strip | size }} {% # expected: 5, actual: 6 %}
Additional info
Liquid’s strip filter (and by extension lstrip / rstrip) only removes ASCII
whitespace characters (space U+0020, tab U+0009, CR U+000D, LF U+000A). It does not
remove other Unicode characters that have the White_Space property in the Unicode
standard, including:
- U+00A0 NO-BREAK SPACE
- U+202F NARROW NO-BREAK SPACE
- U+2009 THIN SPACE
- U+2007 FIGURE SPACE
- U+200B ZERO WIDTH SPACE (not technically White_Space, but commonly stripped by
other languages’ equivalents)
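As a rough model of the behavior described above (not Liquid's actual implementation), the Python sketch below contrasts an ASCII-only strip with Python's Unicode-aware `str.strip()`. The helper name `ascii_strip` and the `SAMPLES` table are hypothetical; the ASCII set is the one listed in this report.

```python
# Model of the reported behavior; this is NOT Liquid's actual implementation.
ASCII_WHITESPACE = " \t\r\n"  # U+0020, U+0009, U+000D, U+000A, as listed in this report


def ascii_strip(text: str) -> str:
    """Hypothetical helper: strip only the four ASCII whitespace characters."""
    return text.strip(ASCII_WHITESPACE)


# The characters named in the list above.
SAMPLES = {
    "U+00A0 NO-BREAK SPACE": "\u00a0",
    "U+202F NARROW NO-BREAK SPACE": "\u202f",
    "U+2009 THIN SPACE": "\u2009",
    "U+2007 FIGURE SPACE": "\u2007",
    "U+200B ZERO WIDTH SPACE": "\u200b",
}

for name, ch in SAMPLES.items():
    s = "Couleur" + ch
    # Columns: character name, length after ASCII-only strip, length after str.strip()
    print(name, len(ascii_strip(s)), len(s.strip()))
```

In this model the first four characters survive `ascii_strip` but are removed by `str.strip()`. U+200B survives both, because Python does not classify it as whitespace either.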
This is increasingly problematic because Shopify itself can produce these
characters in storefront strings. For example, the auto-translated French value of
product.options[*].name ends with U+202F (narrow no-break space), because French
typographic rules require a thin space before a colon (:). App and theme developers who
normalize with | downcase | strip to compare against a known list end up with
silent comparison failures that are very hard to diagnose (the character renders
identically to a regular space).
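The comparison failure can be illustrated with a small Python model. This is only a sketch: `KNOWN_OPTION_NAMES`, `normalize_ascii`, and `normalize_unicode` are hypothetical names, and the ASCII-only normalization merely imitates the `| downcase | strip` pattern as this report describes it.

```python
# Hypothetical list of option names an app or theme compares against.
KNOWN_OPTION_NAMES = {"couleur", "color"}


def normalize_ascii(name: str) -> str:
    """Imitates `| downcase | strip` as reported: only ASCII whitespace is removed."""
    return name.lower().strip(" \t\r\n")


def normalize_unicode(name: str) -> str:
    """Same idea, but with Python's Unicode-aware str.strip()."""
    return name.lower().strip()


# Option name with a trailing U+202F, as in the French example from this report.
option_name = "Couleur\u202f"

print(normalize_ascii(option_name) in KNOWN_OPTION_NAMES)    # False: lookup silently misses
print(normalize_unicode(option_name) in KNOWN_OPTION_NAMES)  # True
```

Printing the ASCII-normalized value would show nothing unusual, since the trailing character is invisible. Inspecting code points, for example with `[hex(ord(c)) for c in value]`, is one way to spot it.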
What type of topic is this
Bug report