lib: map Latin1 labels to iso-8859-1 instead of Windows-1252 #58890
lytovka wants to merge 7 commits into nodejs:main
Latin1 is incorrectly mapped to the Windows-1252 encoding, which defines mappings for bytes 0x80–0x9F, unlike Latin1 (ISO-8859-1), where these bytes are control characters. Fixing this discrepancy can cause unexpected behavior if TextDecoder is called with any latin1 label and attempts to decode bytes in the 0x80–0x9F range, since the decoding will now follow ISO-8859-1 encoding. Fixes: nodejs#56542
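To make the difference in the 0x80–0x9F range concrete, here is a minimal sketch that decodes the same bytes both ways. It deliberately does not call `TextDecoder`, since its output for these labels depends on the Node.js version. `decodeIso88591`, `decodeWindows1252`, and `codePoints` are hypothetical helpers written for illustration only and are not part of this PR or of Node.js.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F (index 0 corresponds to 0x80).
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D keep their C1 control code points here.
const WINDOWS_1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// ISO-8859-1: every byte maps to the code point with the same numeric value.
function decodeIso88591(bytes) {
  let out = '';
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Windows-1252: identical to ISO-8859-1 except for bytes 0x80..0x9F.
function decodeWindows1252(bytes) {
  let out = '';
  for (const b of bytes) {
    const cp = b >= 0x80 && b <= 0x9F ? WINDOWS_1252_HIGH[b - 0x80] : b;
    out += String.fromCharCode(cp);
  }
  return out;
}

// Formats a string as a list of "U+XXXX" code points for easy comparison.
function codePoints(str) {
  return [...str].map(
    (c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
  );
}

const sample = Uint8Array.of(0x41, 0x80, 0x99, 0xE9);
console.log(codePoints(decodeIso88591(sample)).join(' '));
// prints: U+0041 U+0080 U+0099 U+00E9
console.log(codePoints(decodeWindows1252(sample)).join(' '));
// prints: U+0041 U+20AC U+2122 U+00E9
```

Only the two middle bytes differ: under ISO-8859-1 they stay C1 control characters, while Windows-1252 turns them into `€` and `™`. Any caller relying on the latter through a Latin1 label would see different output after this change.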
I do have some slight concerns over whether this may be considered a breaking change. My preference would be to handle it as a bug fix, however. I'd like some feedback from @nodejs/tsc.
I had similar thoughts. I've updated the PR description with a note explaining why this could be considered a breaking change:
Codecov Report: All modified and coverable lines are covered by tests.
I think we should land as a bug fix.
Hi all! Just to confirm - aside from addressing the linting errors ( |
Hi @lytovka, it looks like the CI failure is related.
This issue/PR was marked as stalled, it will be automatically closed in 30 days. If it should remain open, please leave a comment explaining why it should remain open.
This can be closed: #61093 landed, which removed the broken codepath.
Fixes: #56542
This PR updates all Latin1 labels to point to the `iso-8859-1` encoding instead of `Windows-1252`. The `iso-8859-1` encoding will now use the `decodeLatin1` fast path when calling the decode method. The `Windows-1252` encoding will not trigger the `decodeLatin1` fast path; instead, it will follow the standard path for obtaining the converter from the `simdutf` library.

A new test file has been added to verify the decoded Unicode values of bytes `0x7F-0x9F` when the `Windows-1252` encoding is selected.

NB: Fixing Latin1 label mappings will cause unexpected behavior if `TextDecoder` is called with any Latin1 label and attempts to decode bytes in the `0x80-0x9F` range, since decoding for any of these labels will now follow the `iso-8859-1` encoding.

Refs: