\x85 is, unfortunately, a hexadecimal escape sequence for a byte value (0x85) that occurs in the UTF-8 encoding of many international characters. It all comes down to encoding, but since I'm digging into some legacy code, I could not prevent ISO-8859-1 strings from ending up UTF-8-ised.
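Here is one way to see why 0x85 shows up so often. It is a UTF-8 continuation byte (bit pattern 10xxxxxx), so it appears inside many multi-byte characters. Two examples from the strings below are "ễ" in "Nguyễn" (E1 BB 85) and "ą" in "Piątyszek" (C4 85). The minimal Node.js sketch below dumps those bytes and rebuilds the Latin-1 mojibake. It then shows that if the resulting U+0085 character is stripped along the way, decodeURIComponent(escape(...)) can no longer recover the text. One way that can happen is code treating it as whitespace, since Unicode gives U+0085 the White_Space property. The helper names utf8Hex and asLatin1 are hypothetical.

```javascript
// Where byte 0x85 comes from: it is a UTF-8 continuation byte inside
// many multi-byte characters, e.g. "ễ" (U+1EC5) and "ą" (U+0105).
const encoder = new TextEncoder();

// Hypothetical helper: hex dump of a string's UTF-8 bytes.
function utf8Hex(s) {
  return Array.from(encoder.encode(s), (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Hypothetical helper: map each UTF-8 byte to the Latin-1 character with
// the same value, which is how "UTF-8-ised" Latin-1 mojibake arises.
function asLatin1(s) {
  return Array.from(encoder.encode(s), (b) => String.fromCharCode(b)).join("");
}

console.log(utf8Hex("ễ")); // e1 bb 85
console.log(utf8Hex("ą")); // c4 85

const mojibake = asLatin1("Nguyễn");
console.log(mojibake.includes("\u0085")); // true: U+0085 (NEL) is a C1 control character

// If U+0085 is stripped somewhere, the remaining bytes are no longer
// valid UTF-8, so the escape()/decodeURIComponent() trick throws.
const stripped = mojibake.replace(/\u0085/g, "");
try {
  decodeURIComponent(escape(stripped));
} catch (e) {
  console.log(e.name); // URIError
}
```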
The example below illustrates that:
// UTF-8-ized Latin-1/ISO-8859-1 strings
var str0 = 'Nguy�n Thái Ng�c Duy',
str1 = 'Adam Pi�tyszek',
str2 = '��',
str3 = '�彦',
str4 = '����',
str5 = '���',
str6 = 'QQé�³ä¹� å�¨æ°�Kæ� QQ空é�´ QQ',
str7 = '�亨財���';
// decode UTF-8-ized Latin-1/ISO-8859-1 to UTF-8
var decode = function(str) {
  var s;
  try {
    // If str holds UTF-8 bytes that were misread as Latin-1, this recovers the original text without throwing.
    s = decodeURIComponent(escape(str));
  } catch(e) {
    // If the bytes are not valid UTF-8, a URIError is thrown, and we can assume that str is a genuine ISO-8859-1 string.
    s = str;
  }
  return s;
};
console.log('str0: ' + decode(str0)); // str0: Nguyễn Thái Ngọc Duy
console.log('str1: ' + decode(str1)); // str1: Adam Piątyszek
console.log('str2: ' + decode(str2)); // str2: 즈눅
console.log('str3: ' + decode(str3)); // str3: 元彦
console.log('str4: ' + decode(str4)); // str4: 入门教程
console.log('str5: ' + decode(str5)); // str5: 陈光远
console.log('str6: ' + decode(str6)); // str6: QQ音乐 全民K歌 QQ空间 QQ
console.log('str7: ' + decode(str7)); // str7: 鉅亨財經新聞
PS: GitHub's editor seems to drop the \x85 character when I enter text, so the code above may not run correctly as pasted.
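Invisible characters like U+0085 do not survive copy-paste. One workaround is to generate the test inputs from their expected decoded form instead of pasting the mojibake literals. The sketch below assumes Node.js, which provides the global TextEncoder. toMojibake is a hypothetical helper, and decode is the same technique as the function in the issue, written more compactly.

```javascript
// Rebuild the mojibake test inputs from their expected decoded form, so the
// example does not depend on invisible C1 characters surviving copy-paste.
const encoder = new TextEncoder();

// Hypothetical helper: simulate UTF-8 bytes being misread as Latin-1.
function toMojibake(s) {
  return Array.from(encoder.encode(s), (b) => String.fromCharCode(b)).join("");
}

// Same approach as decode() in the issue: escape() turns code units
// 0x80-0xFF into %XX, and decodeURIComponent() reads them back as UTF-8.
function decode(str) {
  try {
    return decodeURIComponent(escape(str));
  } catch (e) {
    return str; // not valid UTF-8 mojibake; leave it unchanged
  }
}

const expected = [
  "Nguyễn Thái Ngọc Duy",
  "Adam Piątyszek",
  "즈눅",
  "元彦",
  "入门教程",
  "陈光远",
  "QQ音乐 全民K歌 QQ空间 QQ",
  "鉅亨財經新聞",
];

expected.forEach((text, i) => {
  const output = decode(toMojibake(text));
  console.log(`str${i}: ${output}`, output === text); // second value shows whether the round trip matched
});
```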
This issue refers to the change introduced by this line. I'm sticking to v4.2.2 for now, great stuff! 👍
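escape() is deprecated and survives only in Annex B of the ECMAScript spec. Below is a sketch of the same detection idea built on the Encoding API instead, which is a swapped-in technique and not the approach from the issue. It copies each code unit into a byte array and decodes that with a TextDecoder in fatal mode, which throws on invalid UTF-8 instead of inserting U+FFFD. decodeLatin1Mojibake is a hypothetical name.

```javascript
// Alternative to decodeURIComponent(escape(str)) that avoids escape():
// treat each code unit as a byte and decode the bytes strictly as UTF-8.
function decodeLatin1Mojibake(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    // A code unit above 0xFF cannot be a Latin-1 byte, so the string
    // is not UTF-8 mojibake; return it unchanged.
    if (code > 0xff) return str;
    bytes[i] = code;
  }
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // substituting U+FFFD; ignoreBOM: true keeps a leading BOM as U+FEFF.
    return new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
  } catch (e) {
    return str; // not valid UTF-8: assume genuine Latin-1 text
  }
}

console.log(decodeLatin1Mojibake("Nguy\u00e1\u00bb\u0085n")); // Nguyễn
console.log(decodeLatin1Mojibake("Thái")); // Thái (returned unchanged)
```

Like the original decode(), it leaves genuine Latin-1 text such as "Thái" alone, because the byte E1 followed by "i" is not valid UTF-8.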