Draft #6547

270 changes: 270 additions & 0 deletions docs/pipe-protocol.md
@@ -0,0 +1,270 @@
# \[DRAFT] Pipe protocol specification for 'dotnet test' of Microsoft.Testing.Platform

> [!IMPORTANT]
> This document is intended for internal use only. The protocol is not for public use, and we reserve the right to adjust or break it as needed.

This document outlines the protocol used by 'dotnet test' CLI when communicating with Microsoft.Testing.Platform (MTP) applications.

> [!NOTE]
> Throughout the document, we refer to the .NET CLI for ease of reading, but the client on the other end of the pipe is not necessarily the .NET CLI.
Member

what does this sentence mean? what's the difference between .NET CLI and ".NET CLI"

Member Author

I meant we are using .NET CLI as an example. But it could be whatever other client (or well, from pipe-point of view, server)

Member

the JSON-RPC protocol has the following start

MSTest Runner projects builds into a self-contained executable that can be invoked to run tests. The protocol describes the communication between the client (IDE/CLI/CI) any the MSTest Runner executable (also refered to as the server).
The communication is based on JSON-RPC and describes the RPC messages sent in order to support running of tests.

if we'd use a similar preamble reading these docs side by side would be easier.

otherwise, a similar note could be added. that the document describes the communication between any client (dotnet test) and MTP runner. and that afterwards this document uses .NET CLI when referring to any compatible client


## General flow

- User invokes 'dotnet test'.
- For every test application, .NET CLI creates a unique pipe server.
- The test application is run with a command-line argument specifying the pipe name.
- The test application connects to the given pipe.
- Communication starts.
- Note that child processes may also connect to the same pipe. So it's possible to have multiple connections on the same pipe.

## Common terminology

### `ExecutionId`

For the execution of a test app, the test app and all its child processes are uniquely identified by `ExecutionId`. In the **current implementation**, we can already identify the test app and all its child processes by knowing which pipe is receiving the message. But we still include `ExecutionId` in the protocol in case we need it in the future due to implementation changes.
Member

so the ExecutionId is a generated GUID by the test runner? are currently messages validated that the ExecutionId does not change?

Member Author

Yes. It's a Guid generated by MTP and is the same for child processes as well.
Today's implementation might be missing the validation. The purpose of this doc is to have a well-defined spec, then review the implementation and adjust it as needed.


### `InstanceId`

This identifies a "retrying test host". When .NET CLI starts to receive messages with a different `InstanceId`, it knows that they come from a test host that is doing a "retry".
Member

in case of retrying testhost, it might be good to explain which process is the one that connects to the pipe. I'm assuming that the outer process that runs the retry logic does not, it's only the inner process which does

Member Author

All processes connect to the pipe.

Member

it would be good to explain that in more detail in case of retry. what messages are expected to be sent by the outer process node and what messages are expected to be sent by the inner process nodes


## Modes of operation

The test application can operate in one of three modes:

- Help mode (when `--help` is used)
- Discovery mode (when `--list-tests` is used)
- Execution mode (when it actually runs tests, i.e. neither `--help` nor `--list-tests` is specified)
  - TODO(Youssef): Revise MTP behavior if both `--help` and `--list-tests` are specified.
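
As an illustration of the three modes, here is a minimal Python sketch of how the mode could be derived from the command-line arguments. It is not MTP's implementation (MTP is .NET); `Mode` and `resolve_mode` are hypothetical names. Because the behavior when both flags are present is still an open TODO, the sketch raises an error for that combination instead of guessing.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    HELP = "help"
    DISCOVERY = "discovery"
    EXECUTION = "execution"


def resolve_mode(args: list[str]) -> Mode:
    """Map test-app command-line arguments to a mode of operation."""
    wants_help = "--help" in args
    wants_list = "--list-tests" in args
    if wants_help and wants_list:
        # The spec leaves this combination open, so this sketch refuses to guess.
        raise ValueError("--help combined with --list-tests is unspecified")
    if wants_help:
        return Mode.HELP
    if wants_list:
        return Mode.DISCOVERY
    # Neither flag present: the app actually runs tests.
    return Mode.EXECUTION
```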

## Handshake

Handshake is the first message expected to be received. During handshake, we negotiate the protocol version to be used. Note that the handshake message is expected to stay the same across all protocol versions: future protocol versions MUST NOT introduce a breaking change to the handshake message. We don't anticipate needing new protocol versions, though. So far, we only have version "1.0.0", and new features can be added to "1.0.0" without breaking changes. But we can't be sure what the future will bring.

Handshake is expected to happen in all modes: help, discovery, and execution.
Today, there is an MTP bug where no handshake happens in help mode. This will be worked around in .NET CLI, but a fix needs to be made in MTP.

### Handshake request
Member

It might be nice to align it with the MTP VS protocol spec and write down the direction of the request for brevity. for instance it looks like the request is first sent from the MTP server to the dotnet test CLI. maybe it's written somewhere above and I missed it


The handshake request contains multiple properties, some of which are required by the .NET CLI and some of which are optional.

- Process ID (optional): The id of the process sending the handshake.
- Architecture (required): The architecture of the running test application.
- Framework (required): The framework description, as given by `RuntimeInformation.FrameworkDescription`.
- OS (optional): The operating system running the test application.
- Supported protocol versions (required): The protocol versions that are supported by the test application, separated by semicolons.
- Host type (required): The host type which is handshaking with us.
- Module path (required): The path to the test module, either the assembly dll path, or the actual apphost (exe) path.
- ExecutionId (required): Explained above.
Member

if this is a markdown document it might be better to add hyperlinks here, so that I can click and navigate to the definition quickly

- InstanceId (required): Explained above.
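
The required/optional split above can be checked mechanically on the receiving side. The following Python sketch is only an illustration under stated assumptions: the dictionary keys are hypothetical names invented here (the wire format identifies properties by numeric ids, not by these strings), and treating an empty value as missing is also an assumption.

```python
from __future__ import annotations

# Hypothetical property names mirroring the list above.
REQUIRED_REQUEST_PROPERTIES = (
    "architecture",
    "framework",
    "supported_protocol_versions",
    "host_type",
    "module_path",
    "execution_id",
    "instance_id",
)
OPTIONAL_REQUEST_PROPERTIES = ("process_id", "os")


def missing_required(handshake: dict[str, str]) -> list[str]:
    """Return the required properties that are absent or empty."""
    return [name for name in REQUIRED_REQUEST_PROPERTIES if not handshake.get(name)]
```

A receiver could surface a non-empty result as a protocol error, in line with the rule that a missing required property must be clearly reported to the user.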

### Handshake response

The handshake response contains multiple properties, some of which are required by the test application and some of which are optional.

- Process ID (optional): The process id of the .NET CLI.
- Architecture (optional): The architecture of the .NET CLI.
Member

given that this is a binary format, is there an assumption currently that dotnet test and the test process must run on the same machine? or are you handling the different endianness somehow in the protocol?

Member Author

Today we expect them to run on the same machine. If we wanted to change that in the future, then endianness will need extra special handling.

  - TODO(youssef): This shouldn't be needed at all.
- Framework (optional): The framework description of the .NET CLI, as given by `RuntimeInformation.FrameworkDescription`.
  - TODO(youssef): This shouldn't be needed at all.
- OS (optional): The operating system running the .NET CLI.
- Supported protocol version: The final protocol version to use. This can be empty/omitted if the .NET CLI doesn't support any of the versions sent during the handshake request.
  - TODO(youssef): How should the version checks happen? Should we consider only the "major" part for deciding compatibility? In what scenarios could we need to only bump minor/patch components? What would it signify and how that info is supposed to be used? Today, we do a full exact version check.
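
To make the negotiation concrete, here is a small Python sketch of the exact-match check described above. `negotiate_version` is a hypothetical helper, and picking the first match in the CLI's own preference order is an assumption; the document only says that the request carries a semicolon-separated list and that the comparison today is a full exact match.

```python
from __future__ import annotations


def negotiate_version(requested: str, supported_by_cli: list[str]) -> str:
    """Return the protocol version to use, or "" when nothing matches.

    `requested` is the semicolon-separated list from the handshake request.
    """
    offered = [v.strip() for v in requested.split(";") if v.strip()]
    # Walk the CLI's versions in its preference order (an assumption of this sketch).
    for version in supported_by_cli:
        if version in offered:  # full exact comparison, no major/minor logic
            return version
    return ""
```

An empty result corresponds to the "empty/omitted" case above, where the .NET CLI supports none of the offered versions.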

## Test session event message

TODO(Youssef): Verify if this should be sent during help or discovery modes.

This message denotes test events. Currently, there are two events, start and end. This message has the following properties:
Member

what is the point of session messages? do they somehow control when the dotnet test should stop waiting for discovery/result messages?

also, is session a higher concept than InstanceId? can you get multiple InstanceIds for a single session?

Member Author

It's not expected to get multiple instance ids for a single session.
I'm personally not a fan of the InstanceId, and I'm in favor of slowly removing/deprecating it.

Member Author

At least for test session end event, it's very necessary.
Imagine a test calling Environment.Exit(0). Without the notion of "test session end", we might see a bunch of passing test results, then process exit with exit code 0, and consider this scenario as "passing".

Member

would be good to write down what the session start/end is used for


- Event type (required): start or end.
Member

I assume here we do not need to define the types of these, i.e. is this an Enum or is this a Boolean?

Member Author

It's a byte (consider it enum, doesn't matter much)

- SessionUid (required)
- ExecutionId (required)
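
The `Environment.Exit(0)` scenario from the thread above shows why the end event matters: an exit code of 0 alone does not prove that the run completed. The Python sketch below illustrates one way a receiver could track this. `SessionTracker` and `ProtocolError` are hypothetical names, event types are shown as strings for readability (on the wire the event type is a byte), and the duplicate and ordering checks follow the rules listed under "Implementation guidelines".

```python
class ProtocolError(Exception):
    """Raised when a session event violates the expected ordering."""


class SessionTracker:
    def __init__(self) -> None:
        self.started: set[str] = set()
        self.ended: set[str] = set()

    def on_event(self, event_type: str, session_uid: str) -> None:
        if event_type == "start":
            if session_uid in self.started:
                raise ProtocolError(f"duplicate start for session {session_uid}")
            self.started.add(session_uid)
        elif event_type == "end":
            if session_uid not in self.started:
                raise ProtocolError(f"end without start for session {session_uid}")
            if session_uid in self.ended:
                raise ProtocolError(f"duplicate end for session {session_uid}")
            self.ended.add(session_uid)
        else:
            raise ProtocolError(f"unknown event type {event_type!r}")

    def completed_cleanly(self, exit_code: int) -> bool:
        # A zero exit code only counts if at least one session started
        # and every started session also reported its end.
        return exit_code == 0 and bool(self.started) and self.started == self.ended
```

With this check, a test that kills the process with exit code 0 before the end event arrives is not reported as a passing run.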

## Command line options
Member

it might be good to add "message" for consistency


This message is expected to be received only in help mode.
Member

it would be useful to explain what this message means from user's point of view, i.e. that if a user calls dotnet test --help each test application is expected to report its command line options back


The test app MUST NOT send this message in discovery or execution modes.

The message has the following properties:

- Module path
  - TODO(youssef): Is it optional or required?
- List of command-line options. Each command-line option has the following properties:
  - Name (required)
  - Description (optional)
  - IsHidden (optional): .NET CLI can assume "false" if it's not present.
  - IsBuiltIn: TODO(youssef) optional or required? If not present, how should we treat it?

Generally, a handshake should be required before receiving this message, but due to an MTP bug that causes MTP to not send a handshake in help mode, the .NET SDK implementation will tolerate a missing handshake in this case.

## Discovery message

Indicates that the test app discovered a test or multiple tests.

- When running in "discovery" mode, the test app MAY send these messages.
  - The absence of these messages in discovery mode means that the test app doesn't contain any tests.
- When running in "execution" mode, the test app MAY send these messages.
- When running in "help" mode, the test app MUST NOT send these messages.

The message has the following properties:

- ExecutionId (required)
- InstanceId
  - TODO(youssef) does it make sense to actually send InstanceId for discovery? How is this ever intended to be used?
- List of individual discovered tests. Each discovered test has the following properties:
  - Uid (required)
  - DisplayName (required)

## Test result message

Indicates that the test app completed running a test or multiple tests.

- When running in "discovery" mode, the test app MUST NOT send these messages.
- When running in "execution" mode, the absence of these messages means that the test app doesn't have any tests.
- When running in "help" mode, the test app MUST NOT send these messages.

This message contains the following properties:
Member

are in-progress tests somehow being relayed?

Member Author

A while back, I added an "IDE mode" to the protocol (and I'm realizing now this part isn't documented and probably needs to be documented). With the "IDE" mode, in-progress is reported as "successful" with state denoting "in-progress". In IDE mode, we also send line number and file path. Maybe few more things that I don't recall now.


- ExecutionId (required)
- InstanceId (required)
- List of successful test messages. Each successful test message has the following properties:
  - Uid (required)
  - DisplayName (required)
  - State (required)
  - Duration (optional)
  - Reason (optional)
  - Standard output (optional)
  - Standard error (optional)
  - SessionUid (optional or required?)
- List of failed test messages. Each failed test message has the following properties:
  - Uid (required)
  - DisplayName (required)
  - State (required)
  - Duration (optional)
  - Reason (optional)
  - List of exceptions (optional). Each exception has:
    - message (optional or required?)
    - type (optional or required?)
    - stack trace (optional or required?)
  - Standard output (optional)
  - Standard error (optional)
  - SessionUid (optional or required?)

TODO(youssef): SessionUid shouldn't be different across test results. Why are we adding it as part of the individual test result?

## File artifact message

- When running in "discovery" mode, the test app MUST NOT send these messages.
- When running in "execution" mode, the test app MAY send these messages.
- When running in "help" mode, the test app MUST NOT send these messages.

- ExecutionId (required)
- InstanceId (required)
- List of file artifact messages. Each has the following properties:
  - Full path (?)
  - DisplayName (?)
  - Description (?)
  - TestUid (?)
  - TestDisplayName (?)
  - SessionUid (?)
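
The per-mode rules stated in the message sections above can be summarized in a single lookup table. The Python sketch below is illustrative only: the message-kind and mode strings are hypothetical labels chosen here, and test session events are left out because their behavior in help and discovery modes is still marked as a TODO.

```python
ALL_MODES = frozenset({"help", "discovery", "execution"})

# Which modes each message kind may be sent in, per the sections above.
ALLOWED_MODES = {
    "handshake": ALL_MODES,
    "command_line_options": frozenset({"help"}),
    "discovery": frozenset({"discovery", "execution"}),
    "test_result": frozenset({"execution"}),
    "file_artifact": frozenset({"execution"}),
}


def is_allowed(message_kind: str, mode: str) -> bool:
    """Return True when `message_kind` may be sent in `mode`.

    Unknown message kinds raise KeyError; how to treat them is an open
    question in this document, so the sketch does not pick an answer.
    """
    return mode in ALLOWED_MODES[message_kind]
```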

TODO(youssef): SessionUid shouldn't be different across different artifacts. Why are we adding it as part of the individual test artifact?

## Implementation guidelines

- Failing to receive any "required" property in the protocol is an error that should be clearly surfaced to the user.
- All messages coming from a single test application should have the same ExecutionId. Receiving a different ExecutionId for the same test app is an error.
- Receiving a non-handshake message with InstanceId that wasn't seen in a previous handshake is a protocol violation.
- Receiving any message without a previous handshake is an error.
  - Exception: command-line options messages in help mode. It's only an exception as a "workaround" due to an MTP bug, but from the protocol's point of view, it's an error.
- No message should be decoded before the handshake message has been received.
  - If we don't know the protocol version yet, we cannot deserialize anything other than handshake!
  - Implementations must NOT deserialize messages and *then* complain that handshake wasn't received. The deserialization shouldn't be attempted in the first place!
  - So, when receiving a message, we first read 4 bytes as the message size, then the next 4 bytes as the serializer id. If the serializer id doesn't correspond to the handshake serializer id, **don't** continue deserialization and fail immediately.
- All handshake requests from a given test app should have the same ExecutionId, module path, and supported protocol versions.
  - TODO(youssef): Should we have the same architecture, framework description, and OS as well?
- TODO(youssef): Can we receive multiple handshakes with HostType=TestHostController?
- It's allowed to get multiple handshakes with HostType=TestHost, either with the same InstanceId (indicating sharding), or different InstanceId (indicating retry).
- If .NET CLI is not in "help" mode, then it's not expected to receive a command line options message.
  - MTP has a bug today where no handshake is done when in help mode. .NET CLI implementation will account for this bug by not requiring a handshake in this case. However, when this bug is fixed, we will have a special HostType (e.g., "HelpHost").
    - If a handshake with HelpHost is received, then the only expected message is the command line options message.
    - If a handshake with a different HostType is received, then it's not expected to receive command line options messages.
- Discovery messages are only intended to be received from HostType=TestHost. It's a violation if they are received from a different host type.
- Test result messages are only intended to be received from HostType=TestHost. It's a violation if they are received from a different host type.
- File artifact messages can be received from either TestHost or TestHostController.
- Discovery messages, test result messages, and file artifact messages can only be received after a test session event with event type start.
- Discovery messages, test result messages, and file artifact messages cannot be received after a test session event with event type finish.
  - TODO(youssef): Does that mean test host controllers also send test session events?
- Test session start/finish must not be received multiple times **per session uid**.
  - TODO(youssef): Any hot reload concerns by blocking this scenario? If we don't block this scenario, what should we do?
- Receiving a test session finish without a corresponding start is an error.
- It's an error if the test app exited without receiving test session finish events corresponding to all received test session start events.
- TODO(youssef): Should test session event messages be received only by HostType=TestHost or HostType=TestHostController?
- Failing to receive test session start or finish message at all when HostType=TestHost or HostType=TestHostController is an error.
- Open question: what is the best way to handle "unknown" messages while preserving as much compatibility as possible?
  - Today: when .NET CLI receives an "unknown serializer id", it skips reading this message, and it responds with VoidResponse.
    - If the expected response of this message on MTP side is not VoidResponse, then MTP will fail with InvalidCastException.
  - We might have two different scenarios here:
    - A future MTP request **really** requires a response, and MTP cannot move forward without a response. In this case, we cannot do anything. It's a breaking change.
    - A future MTP request might not really require a response. In this case, VoidResponse can mean ".NET CLI didn't recognize the request".
      - However, this brings ambiguity: if a future MTP message isn't recognized by .NET CLI, and MTP is already expecting VoidResponse, there is no way to tell whether .NET CLI understood the message.
      - Could it be important for MTP to know whether or not .NET CLI understood the request?
      - In this case, should we have a "special" response that means "I didn't understand your request"?
  - I think such a special response gives us the most flexibility.
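
The "fail before deserializing" rule can be sketched as a header check that runs before any payload bytes are interpreted. This Python example is a sketch under stated assumptions, not the reference implementation: `HANDSHAKE_SERIALIZER_ID` is a placeholder value (the real id is not given in this document), and reading both header fields as little-endian signed 32-bit integers is an assumption, since the document does not specify byte order.

```python
from __future__ import annotations

import io
import struct

# Placeholder only: the actual handshake serializer id is not stated here.
HANDSHAKE_SERIALIZER_ID = 1


class ProtocolError(Exception):
    """Raised when a message violates the handshake-first rule."""


def read_header(stream: io.BufferedIOBase, handshake_done: bool) -> tuple[int, int]:
    """Read (message size, serializer id) and enforce handshake-first."""
    header = stream.read(8)
    if len(header) != 8:
        raise ProtocolError("truncated message header")
    # Assumed layout: two little-endian signed 32-bit integers.
    size, serializer_id = struct.unpack("<ii", header)
    if not handshake_done and serializer_id != HANDSHAKE_SERIALIZER_ID:
        # Fail here, before touching the payload at all.
        raise ProtocolError(
            f"serializer id {serializer_id} received before handshake"
        )
    return size, serializer_id
```

The caller only moves on to payload deserialization after this function returns, so a non-handshake first message is rejected without ever being decoded.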

## General protocol message format
Member

Should we have under the General flow some reference that the messages are sent using a BinaryFormatter using the payload defined in this section?

Member Author

I'm not sure how BinaryFormatter is relevant here.

Member

without seeing the reference implementation I'm unsure how the messages are encoded in practice. knowing the Stream/ StreamWriter/ Serializer APIs used to encode the messages would help me understand this better


- 4 bytes: message size
- 4 bytes: serializer id (indicator of the message type)
- x bytes: message payload
  - property/field count (2 bytes)
  - for each property/field:
    - property/field id (2 bytes)
    - property/field size (4 bytes)
    - property/field value (n bytes)
      - If the value is an "array":
        - array length (4 bytes)
        - each array element follows again the message payload format (id, size, and value).

```mermaid
graph TD
A[Message] --> B["Message Size (4 bytes)"]
A --> C["Serializer ID (4 bytes)"]
A --> D["Message Payload (x bytes)"]

D --> D1["Property/Field Count (2 bytes)"]
D --> D2[Property/Field]

D2 --> D2a["Property/Field ID (2 bytes)"]
D2 --> D2b["Property/Field Size (4 bytes)"]
D2 --> D2c["Property/Field Value (n bytes)"]

D2c --> E{Is Array?}
E -->|Yes| F["Array Length (4 bytes)"]
F --> G[Array Element]
G --> G1["Element ID (2 bytes)"]
G --> G2["Element Size (4 bytes)"]
G --> G3["Element Value (n bytes)"]
```
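
To make the layout above concrete, here is a Python sketch that encodes and decodes a flat message payload (property count, then id, size, and value for each property). It is an illustration under stated assumptions rather than the actual serializer: little-endian byte order and the helper names `encode_payload` and `decode_payload` are choices made here, and array values are returned as raw bytes instead of being decoded recursively.

```python
from __future__ import annotations

import struct


def encode_payload(fields: dict[int, bytes]) -> bytes:
    """Encode {property id: raw value} as count + (id, size, value) entries."""
    parts = [struct.pack("<H", len(fields))]  # property count (2 bytes)
    for field_id, value in fields.items():
        # property id (2 bytes) followed by property size (4 bytes)
        parts.append(struct.pack("<Hi", field_id, len(value)))
        parts.append(value)
    return b"".join(parts)


def decode_payload(payload: bytes) -> dict[int, bytes]:
    """Decode a payload produced by encode_payload."""
    (count,) = struct.unpack_from("<H", payload, 0)
    offset = 2
    fields: dict[int, bytes] = {}
    for _ in range(count):
        field_id, size = struct.unpack_from("<Hi", payload, offset)
        offset += 6
        if size < 0 or offset + size > len(payload):
            raise ValueError("property size runs past the end of the payload")
        fields[field_id] = payload[offset:offset + size]
        offset += size
    return fields
```

Because every property carries its own size, a reader can step over a value without understanding it, which is presumably what lets new properties be added without a breaking change.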

## Future considerations

- Youssef: Can we get rid of InstanceId completely? I personally don't believe it was the right design.
  - If we want to simply track retries, that could simply have been metadata on test results.
  - Having it as metadata on test results even allows for in-process retries to be reported.
Member

I think the tricky part here is to align the tests with file artifacts.
For instance, if the tests get rerun 3 times, how should these be reported.
And when should a FileArtifact get a different SessionId.

So if a test runner runs test1 twice and collects the coverage once, perhaps there should be a single SessionId, but two different RunIds.
Whereas, if a test runner runs test1 once, collects the coverage and runs it again, perhaps there should be just one RunId, but two different SessionIds.

Member Author

Hmm, for the purpose of dotnet test, it's sufficient to link artifacts to test results via TestNodeUid+SessionUid. But, this might limit us if, in future, we wanted to use the pipe protocol in VS. There is at least an ambiguity when an MSTest test method is folded and it reports multiple results. In this case, we can't link artifacts to individual results.

At the same time, TestNodeUid is supposed to uniquely identify a "single" test. So, what MSTest does today with folding is already kinda a violation of MTP contract.

- Unfortunately, removing InstanceId is going to be a breaking change in the protocol.
- What could we do?
  - Keep InstanceId for compatibility.
  - Add the metadata on test results as "another" way of knowing retries.
  - In handshake request, tell .NET CLI that we have the "new" way of knowing retries.
Member

are we considering using the VS server protocol for dotnet test at any point? we already have capabilities there and this would make it easier to include features such as retries to both CLI and VS/VsCode simultaneously

Member Author

I personally would consider the opposite 😄
Any kind of "capability" can potentially be added as a property in handshake request.

Member

the JSON-RPC provides a more general-purpose way of communicating between client and server and comes with out of the box implementations for many programming languages, decreasing the amount of work needed to create various implementations

and there's a couple of features that the JSON-RPC brings out of the box

  • allows server to respond with exceptions to the client (rather than just rely on console stderr)
  • allows client to cancel requests (rather than support cancellation just via Ctrl+C)
  • allows to overlap messages (where now requests cannot be processed in parallel)
  • allows to write a client that logs all RPC messages for debugging purposes (as by default everything is just JSON). if one of the endpoints doesn't know how to decode the binary, debugging through the message is much trickier
  • supports both request/response and notifications data formats
  • if we use well known JSON-RPC libraries we already can get the endianness handling for free
  • all of the serialization logic is already well known/documented since it's just JSON

of course these can be added to the pipe protocol over time, but we're effectively reimplementing JSON-RPC with a custom serialization format

  - In handshake response, .NET CLI tells us that it knows the "new" way of knowing retries.
    - This step is not needed if we are able to make this change before .NET 10 GA.
  - If both sides can understand the "new" way of knowing retries, InstanceId could then be skipped completely by both sides.
  - If the test app or .NET CLI can't understand the "new" way of knowing retries, InstanceId is kept.
  - Once most users are using versions where both sides understand the "new" way of knowing retries, we can then remove InstanceId from the protocol completely.
- Note that the platform will now need to provide a way for extensions/frameworks to indicate retries. Maybe just a special cased TestMetadataProperty?
- The .NET CLI experience will need to be adjusted.
- CONSIDER: Do we still need some way of signaling retries during handshake?
  - Maybe part of orchestrator handshake?