I have encountered a problem with Cyrillic text encoding.
From Windows Explorer:
From powershell console:
PS> ls |% name
cyrillic_7_chars=русский.txt
text-1251.txt
text-utf8.txt
PS> gc text-1251.txt
русский
PS> gc text-utf8.txt
С?С?С?С?РєРёР№
From Jupyter Notebook:
PS> ls |% name
cyrillic_7_chars=■■■■txt
text-1251.txt
text-utf8.txt
PS> gc text-1251.txt
■■■■
PS> gc text-utf8.txt
русский
I have found a workaround, but I am not sure how to apply it to fix the problem:
PS> [Text.Encoding]::Default.GetString([Text.Encoding]::UTF8.GetBytes((ls |% name) -join "`n"))
cyrillic_7_chars=русский.txt
text-1251.txt
text-utf8.txt
Environment information:
PS> [System.Text.Encoding]::Default
IsSingleByte : True
BodyName : koi8-r
EncodingName : Cyrillic (Windows)
HeaderName : windows-1251
...
PS> $psversiontable
Name Value
---- -----
PSVersion 5.1.14409.1005
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.14409.1005
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
The version of the notebook server is: 5.6.0
The server is running on this version of Python: Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
Kernel info:
Name: powershell-kernel
Version: 0.0.8
Home-page: https://github.com/vors/jupyter-powershell
Author: Sergei Vorobev
Author-email: xvorsx@gmail.com
What else I've tried so far:
- Changing the $OutputEncoding global variable
- Changing [console]::OutputEncoding
- Changing [console]::InputEncoding
- chcp 866 – does nothing to cmd /c dir or Get-ChildItem/ls output
- chcp 65001 – fixes cmd /c dir but not Get-ChildItem/ls output
- Different browsers: Firefox, Chrome, IE11
Standard kernel (IPython 6.5.0) works fine:
In:
import os
os.listdir()
Out:
['cyrillic_7_chars=русский.txt', 'text-1251.txt', 'text-utf8.txt']
From powershell console:
PS> [text.encoding]::Default.getbytes('русский') | format-hex
00000000 F0 F3 F1 F1 EA E8 E9 ðóññêèé
PS> [text.encoding]::utf8.getbytes('русский') | format-hex
00000000 D1 80 D1 83 D1 81 D1 81 D0 BA D0 B8 D0 B9 ����кий
From Jupyter Notebook:
PS> [text.encoding]::Default.getbytes('русский') | format-hex
00000000 D1 80 D1 83 D1 81 D1 81 D0 BA D0 B8 D0 B9 N?N?N?N???????
PS> [text.encoding]::utf8.getbytes('русский') | format-hex
00000000 D0 A1 D0 82 D0 A1 D1 93 D0 A1 D0 83 D0 A1 D0 83 ??????N?????????
00000010 D0 A0 D1 94 D0 A0 D1 91 D0 A0 E2 84 96 ?■N??■N??■a??
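The hex dumps above are consistent with one explanation, which is an assumption on my side and not something confirmed from the kernel source: the kernel exchanges UTF-8 bytes with a PowerShell process whose standard streams use windows-1251. A minimal Python sketch of that hypothesis follows. The variable names are illustrative, and it only models the byte conversions, not the kernel itself.

```python
# Assumption: the kernel talks UTF-8 while PowerShell's stdin/stdout use
# windows-1251 (cp1251). This only reproduces the byte patterns from the report.
text = "русский"


def hex_bytes(data: bytes) -> str:
    """Format bytes the way Format-Hex prints them."""
    return " ".join(f"{b:02X}" for b in data)


# Console session: the string is intact, so each encoding gives the expected bytes.
console_default = text.encode("cp1251")
console_utf8 = text.encode("utf-8")

# Notebook input path: UTF-8 bytes from the kernel get decoded as cp1251,
# so PowerShell works with an already garbled string.
seen_by_powershell = text.encode("utf-8").decode("cp1251")
notebook_default = seen_by_powershell.encode("cp1251")
notebook_utf8 = seen_by_powershell.encode("utf-8")

# Notebook output path: cp1251 bytes written by PowerShell get decoded as UTF-8,
# which is invalid and yields replacement characters.
garbled_output = text.encode("cp1251").decode("utf-8", errors="replace")

# Workaround from the report, Default.GetString(UTF8.GetBytes(s)):
# pre-garble the string so the faulty output path restores it.
pre_garbled = text.encode("utf-8").decode("cp1251")
restored = pre_garbled.encode("cp1251").decode("utf-8")

print(hex_bytes(console_default))
print(hex_bytes(console_utf8))
print(hex_bytes(notebook_default))
print(hex_bytes(notebook_utf8))
print(restored == text)
```

If the hypothesis holds, the third printed line equals the second, and the fourth matches the 29-byte dump from the notebook. That would also explain why `gc text-utf8.txt` looks right in the notebook while `gc text-1251.txt` does not.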