Encoding problems #12

@d-01

I have encountered a problem with Cyrillic text encoding.
From Windows Explorer:

[Screenshot: files-list]

From the PowerShell console:

PS> ls |% name

cyrillic_7_chars=русский.txt
text-1251.txt
text-utf8.txt

PS> gc text-1251.txt

русский

PS> gc text-utf8.txt

С?С?С?С?РєРёР№

From Jupyter Notebook:

PS> ls |% name

cyrillic_7_chars=■■■■txt
text-1251.txt
text-utf8.txt

PS> gc text-1251.txt

■■■■

PS> gc text-utf8.txt

русский
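The console and the notebook give mirror-image results for the two files. A minimal sketch of one way to read this, assuming the console path effectively decodes file bytes as windows-1251 and the notebook path ends up interpreting the same bytes as UTF-8 (the helper `read_as` is hypothetical and only models the net effect, not the kernel's actual code):

```python
TEXT = "русский"

def read_as(data: bytes, codec: str) -> str:
    """Decode raw file bytes with the given codec, marking undecodable bytes."""
    return data.decode(codec, errors="replace")

# Contents of the two test files, as raw bytes.
file_1251 = TEXT.encode("cp1251")
file_utf8 = TEXT.encode("utf-8")

# Assumed console behaviour: everything is read as windows-1251.
console_1251 = read_as(file_1251, "cp1251")   # readable
console_utf8 = read_as(file_utf8, "cp1251")   # mojibake, twice as many characters

# Assumed notebook behaviour (net effect): everything is read as UTF-8.
notebook_1251 = read_as(file_1251, "utf-8")   # only replacement characters
notebook_utf8 = read_as(file_utf8, "utf-8")   # readable

for label, value in [
    ("console  text-1251", console_1251),
    ("console  text-utf8", console_utf8),
    ("notebook text-1251", notebook_1251),
    ("notebook text-utf8", notebook_utf8),
]:
    print(f"{label}: {value}")
```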

I have found a workaround, but I am not sure how to apply it to fix the problem:

PS> [Text.Encoding]::Default.GetString([Text.Encoding]::UTF8.GetBytes((ls |% name) -join "`n"))

cyrillic_7_chars=русский.txt
text-1251.txt
text-utf8.txt
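A possible explanation for why the workaround helps, sketched in Python. It assumes that the kernel writes its output as windows-1251 bytes and that the notebook front end decodes those bytes as UTF-8. Neither assumption is confirmed here, and the variable names are illustrative only:

```python
name = "русский"

# What the workaround does: encode as UTF-8, then reinterpret the bytes
# as windows-1251. The result is a deliberately "pre-garbled" string.
pre_garbled = name.encode("utf-8").decode("cp1251")

# Assumption: the kernel encodes its output with windows-1251 ...
wire_bytes = pre_garbled.encode("cp1251")

# ... and the notebook decodes the received bytes as UTF-8.
shown = wire_bytes.decode("utf-8")

print(shown)

# Without the workaround, the same pipeline receives windows-1251 bytes
# that are not valid UTF-8, so nothing readable survives.
broken = name.encode("cp1251").decode("utf-8", errors="replace")
print(broken)
```

If this reading is right, the two mismatched conversions cancel out, which is why the workaround shows the correct names without fixing the underlying encoding mismatch.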

Environment information:

PS> [System.Text.Encoding]::Default

IsSingleByte      : True
BodyName          : koi8-r
EncodingName      : Cyrillic (Windows)
HeaderName        : windows-1251
...

PS> $psversiontable

Name                           Value                                           
----                           -----                                           
PSVersion                      5.1.14409.1005                                  
PSEdition                      Desktop                                         
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}                         
BuildVersion                   10.0.14409.1005                                 
CLRVersion                     4.0.30319.42000                                 
WSManStackVersion              3.0                                             
PSRemotingProtocolVersion      2.3                                             
SerializationVersion           1.1.0.1 

The version of the notebook server is: 5.6.0
The server is running on this version of Python: Python 3.7.0 (default, Jun 28 2018, 08:04:48) [MSC v.1912 64 bit (AMD64)]
Kernel info:

Name: powershell-kernel
Version: 0.0.8
Home-page: https://github.com/vors/jupyter-powershell
Author: Sergei Vorobev
Author-email: xvorsx@gmail.com

What else I've tried so far:

  1. Changing $OutputEncoding global variable
  2. Changing [console]::OutputEncoding
  3. Changing [console]::InputEncoding
  4. chcp 866 – has no effect on cmd /c dir or Get-ChildItem / ls output
  5. chcp 65001 – fixes cmd /c dir output but not Get-ChildItem / ls output
  6. Different browsers: Firefox, Chrome, IE11

The standard kernel (IPython 6.5.0) works fine:
In:

import os
os.listdir()

Out:

['cyrillic_7_chars=русский.txt', 'text-1251.txt', 'text-utf8.txt']

From the PowerShell console:

PS> [text.encoding]::Default.getbytes('русский') | format-hex

00000000   F0 F3 F1 F1 EA E8 E9                             ðóññêèé

PS> [text.encoding]::utf8.getbytes('русский') | format-hex

00000000   D1 80 D1 83 D1 81 D1 81 D0 BA D0 B8 D0 B9        ����кий

From Jupyter Notebook:

PS> [text.encoding]::Default.getbytes('русский') | format-hex

00000000   D1 80 D1 83 D1 81 D1 81 D0 BA D0 B8 D0 B9        N?N?N?N???????  

PS> [text.encoding]::utf8.getbytes('русский') | format-hex

00000000   D0 A1 D0 82 D0 A1 D1 93 D0 A1 D0 83 D0 A1 D0 83  ??????N?????????
00000010   D0 A0 D1 94 D0 A0 D1 91 D0 A0 E2 84 96           ?■N??■N??■a??   
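The four hex dumps above are consistent with the string literal reaching the kernel as UTF-8 bytes that are then decoded as windows-1251. The following Python sketch reproduces each dump under that assumption (an inference from the bytes, not something confirmed from the kernel source):

```python
text = "русский"

def hexdump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)

# PowerShell console: the literal arrives intact.
console_default = text.encode("cp1251")
console_utf8 = text.encode("utf-8")

# Jupyter (assumed): the UTF-8 bytes of the literal are misread as
# windows-1251 before PowerShell ever sees the string.
misread = text.encode("utf-8").decode("cp1251")
jupyter_default = misread.encode("cp1251")
jupyter_utf8 = misread.encode("utf-8")

print("console Default:", hexdump(console_default))
print("console UTF8:   ", hexdump(console_utf8))
print("jupyter Default:", hexdump(jupyter_default))
print("jupyter UTF8:   ", hexdump(jupyter_utf8))
```

Under this assumption, the Jupyter `Default` dump equals the console `UTF8` dump, and the Jupyter `UTF8` dump is the double-encoded form, matching the byte values listed in the issue.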
