Conversation
Yay pipelines! Just as with channels, I'll probably be asking a lot of dumb questions until I get it. So let's get started:

Writer

This part looks immediately strange to me:

```haxe
final buffer = writer.getBuffer();
final count = writeData(buffer);
writer.advance(count);
```

Why do I have to tell the writer how much I've written into it? Intuitively, this seems like something it should know itself, so it's not clear to me why I have to keep track of this.

Reader

Hmm, what if I consume everything and that happens to be enough, so I don't actually need more data? The semantics aren't quite clear to me here.

Performance and non-exception paths

Performance is important of course, but before we make things more complex we should measure if it could actually have an effect.

Memory pools

This always makes me feel like we're doing the job the GC is supposed to be doing, so I'm generally not in favor of it.
Needing to specify how many bytes in

If with

With those two paragraphs I'm starting to understand maybe why C# uses
I have asked Claude locally to write me a Haxe lexer using the pipelines API, and he was quite happy about that, calling the approach elegant and a good fit. His only complaints (other than

This seems agreeable (and I love that he calls it a "ceremony"). I guess that would be a good static extension convenience function.

This goes into the exception-free approach I suppose, and yes, looking at the lexer it indeed is awkward having to try-catch a simple read operation. I'm not sure what he means regarding that dead variable though, because the implementation is just this right now:

```haxe
@:coroutine static function tryRefill(src:PipeReader):Null<ArrayBufferView> {
	try {
		return src.read();
	} catch (_:hxcoro.ds.channels.exceptions.ChannelClosedException) {
		return null;
	}
}
```
Yeah, some sort of

I'm going to give the
This stuff is really nice, I now have a Bytes -> Pipeline -> Channel -> Parser chain that even makes a lot of sense to me conceptually. The only thing missing is an asys file reader in the front to feed the pipeline from. Other than that this is great! ...although I should probably never benchmark this against a real parser or I'll be really sad.
Aesthetic driven development is now the new trend. I've just pushed a

I'm also going to give this a go with the hxcpp asys project, should be another nice use case for it.
Claude approves of that change:

What we improved:
In trying to use this further, "we" have found that there's a bug related to

I'm also seeing that

Suspending due to backpressure (
That's not really the case it seems!

Real parser:

My local hxcoro-pipeline-thing compiled to C++:

3x slower is not bad at all for a totally unoptimized implementation that just started working 5 minutes ago. JS and HL have very similar times, which is a good sign for our cross-target performance. On JVM it comes in at around 1000ms, which is actually close to beating the real parser. Nice!
Here's a first pass at most of pipelines (still some things to implement). It pretty closely follows the C# API, so I'll explain the usage and where I think I'm going to diverge.
Writing
To write data into a pipe you call `getBuffer`, which returns a `haxe.io.ArrayBufferView` of unspecified size which you can write into. There is an optional argument for the minimum size of the returned buffer. After you've written data into the buffer you call `advance`, specifying the number of bytes you wrote. Neither of these functions are coroutines.

You cannot reuse the buffer returned by `getBuffer` after you call `advance`; if you want to write more data you must get another buffer. E.g. if you had a number of packets in some protocol you cannot do this.

You should instead do the following.
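To illustrate the two patterns, here is a hedged sketch (the `packets` array and the `writePacket` helper, which fills a buffer and returns the byte count, are hypothetical and not part of the actual API):

```haxe
// NOT valid: reusing the same buffer across advance calls (sketch only)
final buffer = writer.getBuffer();
writer.advance(writePacket(buffer, packets[0]));
writer.advance(writePacket(buffer, packets[1])); // buffer is stale here!

// Valid: request a fresh buffer for every write
for (packet in packets) {
	final buffer = writer.getBuffer();
	writer.advance(writePacket(buffer, packet));
}
```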
After a call to `advance` the data is not yet visible to the reader; you must call `flush`, which is a coroutine, to make that data visible. This function will suspend if the back pressure writer threshold is reached and resume when the reader threshold is reached (not fully implemented).
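Putting the writer side together, a minimal write loop might look something like this (a sketch only: the `Message` type, `maxEncodedSize`, and `encodeInto` are hypothetical stand-ins, not part of the actual API):

```haxe
@:coroutine static function writeAll(writer:PipeWriter, messages:Array<Message>):Void {
	for (message in messages) {
		// Use the optional minimum-size argument so the buffer fits the message.
		final buffer = writer.getBuffer(message.maxEncodedSize());
		final count = encodeInto(message, buffer);
		writer.advance(count);
		// flush is a coroutine: it suspends when the writer backpressure
		// threshold is reached and resumes once the reader has caught up.
		writer.flush();
	}
}
```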
On the reader side you call `read` (returns an unspecified amount of data) or `readAtLeast` (not yet implemented), which both give you a `haxe.io.ArrayBufferView` of data. Like the writer you then call `advance`, specifying both how much data you have consumed (can be dropped) and observed (kept, but represents incomplete data). Both of these values are in bytes; for consumed it's from the beginning of the buffer, and for observed it's bytes starting from the end of the consumed region.

There are some important differences in how `read` responds depending on what you provide consumed and observed with, which I will document below.

Potential Changes
Reader WaitForRead
In the C# version `Read` returns a struct which contains the buffer as well as a field saying if this read was cancelled or the writer has been closed. Performance is such a concern for them that they have non-exception paths for all this. Assuming we're not that focused on it, I wonder if re-using the `waitForRead` and `tryRead` style from the channel reader would make sense. E.g. the reader functions become.

A typical reader loop with that might look something like this.
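As one possible sketch of that channel-style reading loop (the `waitForRead`/`tryRead` names are borrowed from the channel reader as proposed above, and `parsePacket`, which returns the bytes it consumed or 0 for an incomplete packet, is a hypothetical helper):

```haxe
@:coroutine static function readAll(reader:PipeReader):Void {
	// waitForRead would return false once the writer is closed and the
	// pipe is drained, instead of throwing an exception.
	while (reader.waitForRead()) {
		final buffer = reader.tryRead();
		var consumed = 0;
		while (true) {
			final used = parsePacket(buffer, consumed);
			if (used == 0) break; // incomplete packet, need more data
			consumed += used;
		}
		// Drop the parsed bytes; keep the incomplete tail as observed.
		reader.advance(consumed, buffer.byteLength - consumed);
	}
}
```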
Memory Pools
Also related to C# performance features, when creating a pipe you can specify a custom `MemoryPool<T>` the writer will use to manage its internal buffers. I have not checked the actual implementation, but I assume all buffer allocations for expanding, compacting, etc., will go through this object as opposed to direct allocations.

Maybe we do want something like this?
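If hxcoro did want an equivalent, a minimal shape could be something like the following sketch (the interface name and methods are my assumption, loosely mirroring C#'s `MemoryPool<T>`, and are not an existing hxcoro API):

```haxe
// Hypothetical allocator hook the pipe would use instead of direct allocations.
interface IBufferPool {
	// Return a buffer of at least minimumSize bytes, possibly recycled.
	function rent(minimumSize:Int):haxe.io.ArrayBufferView;

	// Hand a buffer back to the pool once the pipe is done with it.
	function release(buffer:haxe.io.ArrayBufferView):Void;
}
```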
Offsets vs Absolute Positions
For the reader's `advance`, consumed and observed are byte offsets, but in C# they are positions. This doesn't make a difference for consumed but it does for observed. I bring this up because of the following point.

Complementary Buffer Reading Library
Along with the `System.IO.Pipelines` package a complementary `System.Buffers` was released, designed around easily reading and writing to pipes. Something similar is probably out of scope for hxcoro but I want to make sure it would be easy enough to do something similar.

Take a look at the following real world pipeline reading code for C#
The interesting thing here is `ConsumeBuffer`, which is continually called until it returns false and is passed a reference to the buffer (a stack only type, `ReadOnlySequence<byte>`). Also that `AdvanceTo` uses the start and end properties of that buffer.

If we take a look at the `ConsumeBuffer` function, the `SequenceReader` type provides the magic which makes it all work.

If the `TryRead` calls succeed, the sequence reader keeps track of how much of the underlying sequence it has read and how much has been unread. The core part here is where we then re-assign the sequence reference to `reader.UnreadSequence` once we can't read any more packets out of the reader. This then causes that `AdvanceTo` call to consume the range we have read packets out of and mark the remaining unread section as observed.

Non Exception Paths
As briefly mentioned, C# allows you to cancel pipeline operations (reads, flushes) without exceptions. I have no experience with these parts of the API so can't really speak to them; I've only used pipelines as a much easier alternative to traditional socket reading and writing with very low data rate serial ports.
Implementation
I've implemented the pipe on top of a channel where "pages" of data are passed through. This is probably not the most optimal and I'm sure there are all sorts of fancy buffer sharing that could be done.
Sorry for the wall of text!