
ReadFile/ReadConsoleA append U+FFFD to each emoji read in UTF-8 #19436

@ghost

Description

Windows Terminal version

1.23.250825001

Windows build number

10.0.26100.0

Other Software

No response

Steps to reproduce

Test code:

// /std:c++latest /utf-8

#include <exception>
#include <print>
#include <string>

#include <windows.h>

namespace my {
template <auto... Errors> auto check(auto result) {
  static_assert(sizeof...(Errors) != 0);
  if ((... && (result != Errors))) {
    return result;
  }
  std::terminate();
}
void assert_equal(auto x, auto y) {
  if (x != y) {
    std::terminate();
  }
}
} // namespace my

int main() {
  my::check<FALSE>(::SetConsoleCP(CP_UTF8));
  my::check<FALSE>(::SetConsoleOutputCP(CP_UTF8));
  const auto std_input = my::check<INVALID_HANDLE_VALUE, nullptr>(
      ::GetStdHandle(STD_INPUT_HANDLE));
  const auto std_output = my::check<INVALID_HANDLE_VALUE, nullptr>(
      ::GetStdHandle(STD_OUTPUT_HANDLE));
  char c = {};
  std::string s = {};
  ::DWORD number_of_bytes_read = {};
  ::DWORD number_of_bytes_written = {};
  while (true) {
    my::check<FALSE>(
        ::ReadFile(std_input, &c, 1, &number_of_bytes_read, nullptr));
    if (number_of_bytes_read == 0) {
      break;
    }
    std::print("{:02x}{}", c, c == '\n' ? '\n' : ' ');
    s += c;
    if (c == '\n') {
      const auto number_of_bytes_to_write = static_cast<::DWORD>(s.size());
      my::assert_equal(number_of_bytes_to_write, s.size());
      my::check<FALSE>(::WriteFile(std_output, s.data(),
                                   number_of_bytes_to_write,
                                   &number_of_bytes_written, nullptr));
      my::assert_equal(number_of_bytes_written, number_of_bytes_to_write);
      s = {};
    }
  }
}
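A note on the test program: the hex dump relies on std::print formatting a char with {:02x}. Some older standard library implementations format a char with an integer presentation type by sign-extending it. Where char is signed, that could print a negative value for bytes such as 0xf0 instead of f0. The sketch below sidesteps the issue by converting each byte to unsigned char before formatting. hex_dump is a hypothetical helper written for illustration and is not part of the original report.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not from the original report): render bytes as
// lowercase two-digit hex separated by single spaces, like the dump above.
// Each char is converted to unsigned char first, so bytes >= 0x80 are never
// sign-extended, whether or not char is signed on this platform.
std::string hex_dump(std::string_view bytes) {
  static constexpr char digits[] = "0123456789abcdef";
  std::string out;
  for (char c : bytes) {
    const auto b = static_cast<unsigned char>(c);
    if (!out.empty()) {
      out += ' ';  // separator between bytes, none before the first
    }
    out += digits[b >> 4];    // high nibble
    out += digits[b & 0x0f];  // low nibble
  }
  return out;
}
```

For example, hex_dump("\xf0\x9f\x98\x80\r\n") returns "f0 9f 98 80 0d 0a".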

Run the test program above in Windows Terminal (conhost.exe gives the same result) and type a few emoji. A possible session looks like this:

😀
f0 9f 98 80 ef bf bd 0d 0a
😀�
😀😀
f0 9f 98 80 ef bf bd f0 9f 98 80 ef bf bd 0d 0a
😀�😀�
^Z

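Each emoji read is followed by the bytes ef bf bd, the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER (�). In each group above, the first line is the console's echo of the typed input, the second is the hex dump printed by the program, and the third is the line written back with WriteFile. The emoji itself (U+1F600, f0 9f 98 80) arrives intact, and the closing 0d 0a is the CR LF line ending.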
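To cross-check these byte values, the sketch below encodes a code point by hand following the standard UTF-8 and UTF-16 encoding forms: U+1F600 becomes f0 9f 98 80 in UTF-8 and U+FFFD becomes ef bf bd. It also splits U+1F600 into its UTF-16 surrogate pair d83d de00, because the console's native character type is UTF-16 and every emoji typed here lies outside the Basic Multilingual Plane. Whether surrogate pairs have anything to do with the extra character is not established by this report. to_utf8 and to_utf16_pair are hypothetical helpers written for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode scalar value as UTF-8 (1 to 4 bytes).
// Assumes cp is a valid scalar value (<= U+10FFFF and not a surrogate).
std::string to_utf8(char32_t cp) {
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  std::string out;
  if (cp < 0x80) {  // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {  // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {  // 3 bytes, e.g. U+FFFD -> ef bf bd
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {  // 4 bytes, e.g. U+1F600 -> f0 9f 98 80
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Hypothetical helper: split a supplementary-plane code point
// (U+10000..U+10FFFF) into the UTF-16 surrogate pair that represents it.
std::u16string to_utf16_pair(char32_t cp) {
  assert(cp >= 0x10000 && cp <= 0x10FFFF);
  const char32_t v = cp - 0x10000;  // 20-bit offset
  return {static_cast<char16_t>(0xD800 + (v >> 10)),     // high surrogate
          static_cast<char16_t>(0xDC00 + (v & 0x3FF))};  // low surrogate
}
```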

Expected Behavior

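The bytes read should contain only the UTF-8 encoding of the typed text, with no replacement characters. For a single 😀 followed by Enter, the dump should read f0 9f 98 80 0d 0a.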
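The expected behavior can also be checked mechanically. A minimal sketch, assuming each line has already been collected into a std::string as in the test loop: count_replacement_chars (a hypothetical helper, not from the report) counts occurrences of ef bf bd in a UTF-8 byte string, so a correct console would yield 0 for every line.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
constexpr std::string_view kReplacementUtf8 = "\xEF\xBF\xBD";

// Hypothetical helper: count U+FFFD occurrences in a UTF-8 byte string.
// In well-formed UTF-8, 0xEF is always a lead byte (continuation bytes are
// 0x80..0xBF), so every substring match starts on a character boundary and
// a plain substring search is sufficient.
std::size_t count_replacement_chars(std::string_view utf8) {
  std::size_t count = 0;
  for (auto pos = utf8.find(kReplacementUtf8); pos != std::string_view::npos;
       pos = utf8.find(kReplacementUtf8, pos + kReplacementUtf8.size())) {
    ++count;
  }
  return count;
}
```

In the test program, calling my::assert_equal(count_replacement_chars(s), std::size_t{0}) just before WriteFile would stop at the first affected line instead of relying on visual inspection of the dump.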

Actual Behavior

They do appear, and the cause is unclear. If the test code above has a bug, please let me know.

Metadata



Labels

Area-Output: Related to output processing (inserting text into buffer, retrieving buffer text, etc.)
Impact-Correctness: It be wrong.
Issue-Bug: It either shouldn't be doing this or needs an investigation.
Priority-2: A description (P2)
Product-Conhost: For issues in the Console codebase
