Commit 19e3585

Merge pull request #641 from input-output-hk/ch1bo/demo-scaffold
Scaffold a demo/ directory
2 parents 169cda2 + 0cc1f1d commit 19e3585

File tree

7 files changed

+326
-1
lines changed

demo/.envrc

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
use nix

demo/2025-10/README.md

Lines changed: 227 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,227 @@
1+
# Prototype demo - October 2025
2+
3+
Minimum viable demo of Leios network traffic interfering with Praos using a three node setup and prepared Praos and Leios data.
4+
5+
> [!WARNING]
6+
> TODO: Add overview / architecture diagram.
7+
8+
See https://github.com/IntersectMBO/ouroboros-consensus/issues/1701 for more context.
9+
10+
## Prepare the shell environment
11+
12+
- If your environment can successfully execute `cabal build exe:cardano-node` from this commit, then it can build this demo's exes.
13+
14+
```
15+
$ git log -1 10.5.1
16+
commit ca1ec278070baf4481564a6ba7b4a5b9e3d9f366 (tag: 10.5.1, origin/release/10.5.1, nfrisby/leiosdemo2025-anchor)
17+
Author: Jordan Millar <jordan.millar@iohk.io>
18+
Date: Wed Jul 2 08:24:11 2025 -0400
19+
20+
Bump node version to 10.5.1
21+
```
22+
23+
- The Python script needs `pandas` and `matplotlib`.
24+
- The various commands and bash scripts below need `toxiproxy`, `sqlite`, `ps` (which in a `nix-shell` might require the `procps` package for matching CLIB, eg), and so on.
25+
- Set `CONSENSUS_BUILD_DIR` to the absolute path of a directory in which `cabal build exe:immdb-server` will succeed.
26+
- Set `NODE_BUILD_DIR` to the absolute path of a directory in which `cabal build exe:cardano-node` will succeed.
27+
- Set `CONSENSUS_REPO_DIR` to the absolute path of the `ouroboros-consensus` repo.
28+
29+
- Check out a patched version of the `cardano-node` repository, something like the following, eg.
30+
31+
```
32+
6119c5cff0 - (HEAD -> nfrisby/leiosdemo2025, origin/nfrisby/leiosdemo2025) WIP add Leios demo Consensus s-r-p (25 hours ago) <Nicolas Frisby>
33+
```
34+
35+
- If you're using a `source-repository-package` stanza for the `cabal build exe:cardano-node` command in the `NODE_BUILD_DIR`, confirm that it identifies the `ouroboros-consensus` commit you want to use (eg the one you're reading this file in).
36+
37+
## Build the exes
38+
39+
```
40+
$ (cd $CONSENSUS_BUILD_DIR; cabal build exe:immdb-server exe:leiosdemo202510)
41+
$ IMMDB_SERVER="$(cd $CONSENSUS_BUILD_DIR; cabal list-bin exe:immdb-server)"
42+
$ DEMO_TOOL="$(cd $CONSENSUS_BUILD_DIR; cabal list-bin exe:leiosdemo202510)"
43+
$ (cd $NODE_BUILD_DIR; cabal build exe:cardano-node)
43+
$ CARDANO_NODE="$(cd $NODE_BUILD_DIR; cabal list-bin exe:cardano-node)"
45+
```
46+
47+
## Prepare the input data files
48+
49+
```
50+
$ (cd $CONSENSUS_BUILD_DIR; $DEMO_TOOL generate demoUpstream.db "${CONSENSUS_REPO_DIR}/demoManifest.json" demoBaseSchedule.json)
51+
$ cp demoBaseSchedule.json demoSchedule.json
52+
$ # You must now edit demoSchedule.json so that the first number in each array is 182.9
53+
$ echo '[]' >emptySchedule.json
54+
$ # create the following symlinks
55+
$ (cd $CONSENSUS_REPO_DIR; ls -l $(find nix/ -name 'genesis-*.json'))
56+
lrwxrwxrwx 1 nfrisby nifr 30 Oct 24 16:27 nix/leios-mvd/immdb-node/genesis-alonzo.json -> ../genesis/genesis.alonzo.json
57+
lrwxrwxrwx 1 nfrisby nifr 29 Oct 24 16:27 nix/leios-mvd/immdb-node/genesis-byron.json -> ../genesis/genesis.byron.json
58+
lrwxrwxrwx 1 nfrisby nifr 30 Oct 24 16:27 nix/leios-mvd/immdb-node/genesis-conway.json -> ../genesis/genesis.conway.json
59+
lrwxrwxrwx 1 nfrisby nifr 31 Oct 24 16:27 nix/leios-mvd/immdb-node/genesis-shelley.json -> ../genesis/genesis.shelley.json
60+
lrwxrwxrwx 1 nfrisby nifr 30 Oct 24 16:27 nix/leios-mvd/leios-node/genesis-alonzo.json -> ../genesis/genesis.alonzo.json
61+
lrwxrwxrwx 1 nfrisby nifr 29 Oct 24 16:27 nix/leios-mvd/leios-node/genesis-byron.json -> ../genesis/genesis.byron.json
62+
lrwxrwxrwx 1 nfrisby nifr 30 Oct 24 16:27 nix/leios-mvd/leios-node/genesis-conway.json -> ../genesis/genesis.conway.json
63+
lrwxrwxrwx 1 nfrisby nifr 31 Oct 24 16:27 nix/leios-mvd/leios-node/genesis-shelley.json -> ../genesis/genesis.shelley.json
64+
```
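
The manual edit of `demoSchedule.json` described above can also be scripted. The following is a minimal sketch, assuming only what this README states: the schedule is a JSON array of arrays whose first element is a fractional slot. The function names and the sample entries are hypothetical; the real entries produced by `generate` may carry different trailing fields.

```python
import json


def retime_schedule(schedule, slot):
    """Return a copy of the schedule with each entry's first element set to `slot`.

    Only the first element (the fractional slot) is changed; the rest of each
    entry is copied through untouched.
    """
    return [[slot, *entry[1:]] for entry in schedule]


def retime_file(src_path, dst_path, slot):
    """Read a schedule from src_path and write the retimed schedule to dst_path."""
    with open(src_path) as src:
        schedule = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(retime_schedule(schedule, slot), dst, indent=2)


# Hypothetical entries, for illustration only (not copied from a real schedule).
example = [[0.0, "eb-a", 12500000], [0.0, "eb-a", None]]
print(json.dumps(retime_schedule(example, 182.9)))
```

For the files named above, the call would be `retime_file("demoBaseSchedule.json", "demoSchedule.json", 182.9)`.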
65+
66+
## Prepare to run scenarios
67+
68+
Ensure a toxiproxy server is running.
69+
70+
```
71+
$ toxiproxy-server 1>toxiproxy.log 2>&1 &
72+
```
73+
74+
## Run the scenario
75+
76+
Run the scenario with `emptySchedule.json`, ie no Leios traffic.
77+
78+
```
79+
$ LEIOS_UPSTREAM_DB_PATH="$(pwd)/demoUpstream.db" LEIOS_SCHEDULE="$(pwd)/emptySchedule.json" SECONDS_UNTIL_REF_SLOT=5 REF_SLOT=177 CLUSTER_RUN_DATA="${CONSENSUS_REPO_DIR}/nix/leios-mvd" CARDANO_NODE=$CARDANO_NODE IMMDB_SERVER=$IMMDB_SERVER ${CONSENSUS_REPO_DIR}/scripts/leios-demo/leios-october-demo.sh
80+
$ # wait about 20 seconds before stopping the execution by pressing any key
81+
```
82+
83+
Run the scenario with `demoSchedule.json`.
84+
85+
```
86+
$ LEIOS_UPSTREAM_DB_PATH="$(pwd)/demoUpstream.db" LEIOS_SCHEDULE="$(pwd)/demoSchedule.json" SECONDS_UNTIL_REF_SLOT=5 REF_SLOT=177 CLUSTER_RUN_DATA="${CONSENSUS_REPO_DIR}/nix/leios-mvd" CARDANO_NODE=$CARDANO_NODE IMMDB_SERVER=$IMMDB_SERVER ${CONSENSUS_REPO_DIR}/scripts/leios-demo/leios-october-demo.sh
87+
$ # wait about 20 seconds before stopping the execution by pressing any key
88+
```
89+
90+
## Analysis
91+
92+
Compare the `latency_ms` column between the two runs for the rows whose slot is after the reference slot 177.
93+
The first few such rows (ie those within a couple seconds of the reference slot) often seem to be disrupted as well, presumably because the initial bulk syncing to catch up to the reference slot leaves the node in a disrupted state for a short interval.
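
One way to make that comparison concrete is to summarize `latency_ms` per run while skipping the rows right after the reference slot. The sketch below is not the demo's analysis script: the `slot` column name, the `settle_slots` parameter, and the sample rows are assumptions for illustration; only `latency_ms` and the reference slot 177 come from this README.

```python
from statistics import median


def median_latency_after(rows, ref_slot, settle_slots=0):
    """Median latency_ms over rows whose slot is after ref_slot + settle_slots.

    `settle_slots` lets the caller ignore the first few slots after the
    reference slot, which may still be disturbed by the initial bulk sync.
    Returns None when no row qualifies.
    """
    latencies = [
        row["latency_ms"]
        for row in rows
        if row["slot"] > ref_slot + settle_slots
    ]
    return median(latencies) if latencies else None


# Hypothetical rows, standing in for one run's table.
run_rows = [
    {"slot": 170, "latency_ms": 5.0},
    {"slot": 178, "latency_ms": 10.0},
    {"slot": 180, "latency_ms": 30.0},
    {"slot": 185, "latency_ms": 20.0},
]
print(median_latency_after(run_rows, ref_slot=177))
print(median_latency_after(run_rows, ref_slot=177, settle_slots=2))
```

Running the same function over the `emptySchedule.json` run and the `demoSchedule.json` run gives two numbers that can be compared directly.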
94+
95+
**WARNING**.
96+
Each execution consumes about 0.5 gigabytes of disk.
97+
The script announces where (eg `Temporary data stored at: /run/user/1000/leios-october-demo.c5Wmxc`), so you can delete each run's data when necessary.
98+
99+
**INFO**.
100+
If you don't see any data in the 'Extracted and Merged Data Summary' table, then check the log files in the run's temporary directory.
101+
This is where you might see messages about, eg, the missing `genesis-*.json` files, bad syntax in the `demoSchedule.json` file, etc.
102+
103+
# Details about the demo components
104+
105+
## The topology
106+
107+
For this first iteration, the demo topology is a simple linear graph.
108+
109+
```mermaid
110+
flowchart TD
111+
MockedUpstreamPeer --> Node0 --> MockedDownstreamPeer
112+
```
113+
114+
**INFO**.
115+
In this iteration of the demo, the mocked downstream peer (see section below) is simply another node, ie Node1.
116+
117+
## The Praos traffic and Leios traffic
118+
119+
In this iteration of the demo, the data and traffic are very simple.
120+
121+
- The Praos data is a simple chain provided by the Performance&Tracing team.
122+
- The mocked upstream peer serves each Praos block when the mocked wall-clock reaches the onset of its slot.
123+
- The Leios data is ten 12.5 megabyte EBs.
124+
They use the minimal number of txs necessary to accumulate 12.5 megabytes, which minimizes the CPU&heap overhead of the patched-in Leios logic, since this iteration of the demo is primarily intended to focus on networking.
125+
- The mocked upstream peer serves those EBs just prior to the onset of one of the Praos blocks' slots, akin to a (relatively minor) ATK-LeiosProtocolBurst attack.
126+
Thus, the patched nodes are under significant Leios load when that Praos block begins diffusing.
127+
128+
## The demo tool
129+
130+
The `cabal run exe:leiosdemo202510 -- generate ...` command generates a SQLite database with the following schema.
131+
132+
```
133+
CREATE TABLE ebPoints (
134+
ebSlot INTEGER NOT NULL
135+
,
136+
ebHashBytes BLOB NOT NULL
137+
,
138+
ebId INTEGER NOT NULL
139+
,
140+
PRIMARY KEY (ebSlot, ebHashBytes)
141+
) WITHOUT ROWID;
142+
CREATE TABLE ebTxs (
143+
ebId INTEGER NOT NULL -- foreign key ebPoints.ebId
144+
,
145+
txOffset INTEGER NOT NULL
146+
,
147+
txHashBytes BLOB NOT NULL -- raw bytes
148+
,
149+
txBytesSize INTEGER NOT NULL
150+
,
151+
txBytes BLOB -- valid CBOR
152+
,
153+
PRIMARY KEY (ebId, txOffset)
154+
) WITHOUT ROWID;
155+
```
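
To get a feel for the schema, the following sketch recreates it in an in-memory database with Python's standard `sqlite3` module and sums the closure size of each EB. The `closure_sizes` helper and the inserted placeholder rows are assumptions for illustration; they are not part of the demo tool.

```python
import sqlite3

# The schema as listed above, reflowed onto fewer lines.
SCHEMA = """
CREATE TABLE ebPoints (
    ebSlot INTEGER NOT NULL,
    ebHashBytes BLOB NOT NULL,
    ebId INTEGER NOT NULL,
    PRIMARY KEY (ebSlot, ebHashBytes)
) WITHOUT ROWID;
CREATE TABLE ebTxs (
    ebId INTEGER NOT NULL,       -- foreign key ebPoints.ebId
    txOffset INTEGER NOT NULL,
    txHashBytes BLOB NOT NULL,   -- raw bytes
    txBytesSize INTEGER NOT NULL,
    txBytes BLOB,                -- valid CBOR
    PRIMARY KEY (ebId, txOffset)
) WITHOUT ROWID;
"""


def closure_sizes(conn):
    """Return (ebSlot, ebId, tx count, total tx bytes) for every EB with txs."""
    return conn.execute(
        """
        SELECT p.ebSlot, p.ebId, COUNT(*), SUM(t.txBytesSize)
        FROM ebPoints AS p
        JOIN ebTxs AS t ON t.ebId = p.ebId
        GROUP BY p.ebSlot, p.ebId
        ORDER BY p.ebSlot, p.ebId
        """
    ).fetchall()


# Placeholder data: one EB at slot 10 with two txs of 100 and 200 bytes.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO ebPoints VALUES (?, ?, ?)", (10, b"\x00" * 32, 0))
conn.executemany(
    "INSERT INTO ebTxs VALUES (?, ?, ?, ?, ?)",
    [(0, 0, b"\x01" * 32, 100, None), (0, 1, b"\x02" * 32, 200, None)],
)
print(closure_sizes(conn))
```

The same query should work against a generated file such as `demoUpstream.db` by passing its path to `sqlite3.connect` instead of `":memory:"`.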
156+
157+
The contents of the generated database are determined by the given `manifest.json` file.
158+
For now, see the `demoManifest.json` file for the primary schema: each "`txRecipe`" is simply the byte size of the transaction.
159+
160+
The `generate` subcommand also generates a default `schedule.json`.
161+
Each EB will have two array elements in the schedule.
162+
The first number in an array element is a fractional slot, which determines when the mocked upstream peer will offer the payload.
163+
The rest of the array element is `MsgLeiosBlockOffer` if the EB's byte size is listed or `MsgLeiosBlockTxsOffer` if `null` is listed.
164+
165+
The secondary schema of the manifest allows for EBs to overlap (which isn't necessary for this demo, despite the patched node fully supporting it).
166+
Overlap is created by an alternative "`txRecipe`", an object `{"share": "XYZ", "startIncl": 90, "stopExcl": 105}` where `"nickname": "XYZ"` was included in a preceding _source_ EB recipe.
167+
The `"startIncl"` and `"stopExcl"` are inclusive and exclusive indices into the source EB (aka a left-closed right-open interval); `"stopExcl"` is optional and defaults to the length of the source EB.
168+
With this `"share"` syntax, it is possible for an EB to include the same tx multiple times.
169+
That would not be a well-formed EB, but the prototype's behavior in response to such an EB is undefined---it's fine for the prototype to simply assume all the Leios EBs and txs in their closures are well-formed.
170+
(TODO check for this one, since it's easy to check for---just in the patched node itself, or also in `generate`?)
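
The interval semantics of `"share"` and the duplicate check mentioned in the TODO can be sketched as follows. This is an illustration under assumptions, not the `generate` implementation: the function names are hypothetical and txs are modelled as plain strings.

```python
def resolve_share(source_txs, start_incl, stop_excl=None):
    """Select the txs of a source EB covered by [start_incl, stop_excl).

    `stop_excl` defaults to the length of the source EB, matching the
    documented default of "stopExcl".
    """
    if stop_excl is None:
        stop_excl = len(source_txs)
    return source_txs[start_incl:stop_excl]


def duplicate_txs(eb_txs):
    """Return the set of txs that occur more than once in an EB."""
    seen = set()
    duplicates = set()
    for tx in eb_txs:
        if tx in seen:
            duplicates.add(tx)
        seen.add(tx)
    return duplicates


# A hypothetical source EB with 110 txs.
source = [f"tx{i}" for i in range(110)]

# {"share": "XYZ", "startIncl": 90, "stopExcl": 105} selects indices 90..104.
shared = resolve_share(source, 90, 105)
print(len(shared), shared[0], shared[-1])

# Sharing the same range twice yields an EB that is not well-formed.
print(sorted(duplicate_txs(shared + shared)) == sorted(shared))
```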
171+
172+
## The mocked upstream peer
173+
174+
The mocked upstream peer is a patched variant of `immdb-server`.
175+
176+
- It runs incomplete variants of LeiosNotify and LeiosFetch: just EBs and EB closures, nothing else (no EB announcements, no votes, no range requests).
177+
- It serves the EBs present in the given `--leios-db`; it sends Leios notifications offering the data according to the given `--leios-schedule`.
178+
See the demo tool section above for how to generate those files.
179+
180+
## The patched node/node-under-test
181+
182+
The patched node is a patched variant of `cardano-node`.
183+
All of the material changes were made in the `ouroboros-consensus` repo; the `cardano-node` changes are merely for integration.
184+
185+
- It runs the same incomplete variants of LeiosNotify and LeiosFetch as the mocked upstream peer.
186+
- The Leios fetch request logic is a fully fledged first draft, with the following primary shortcomings.
187+
- It only handles EBs and EB closures, not votes and not range requests.
188+
- It retains a number of heap objects in proportion with the number of txs in EBs it has acquired.
189+
The real node---and so subsequent iterations of this prototype---must instead keep that data on disk.
190+
This first draft was intended to do so, but we struggled to invent the fetch logic algorithm with the constraint that some of its state was on-disk; that's currently presumed to be possible, but has been deferred to a subsequent iteration of the prototype.
191+
- It never discards any information.
192+
The real node---and so subsequent iterations of this prototype---must instead discard EBs and EB closures once they're old enough, unless they are needed for the immutable chain.
193+
- Once it decides to fetch a set of txs from an upstream peer for the sake of some EB closure(s), it does not necessarily compose those into an optimal set of requests for that peer.
194+
We had not identified the potential for an optimizing algorithm here until writing this first prototype, so it just does something straightforward and naive for now (which might be sufficient even for the real node; we'll have to investigate later).
195+
196+
There are no other changes.
197+
In particular, that means the `ouroboros-network` mux does not deprioritize Leios traffic.
198+
That change is an example of what this first prototype is intended to potentially demonstrate the need for.
199+
There are many such changes, from small to large.
200+
Some examples include the following.
201+
202+
- The prototype uses SQLite3 with entirely default settings.
203+
Write-Ahead Log mode might be much preferable, the database likely needs an occasional VACUUM, and so on.
204+
- The prototype uses a mutex to completely isolate every SQLite3 invocation. That's probably excessive, but it was useful for some debugging during initial development (see the Engineering Notes appendix).
205+
- The prototype chooses several _magic numbers_ for resource utilization limits (eg max bytes per request, max outstanding bytes per peer, fetch decision logic rate-limiting, txCache disk-bandwidth rate-limiting, etc).
206+
These all ultimately need to be tuned for the intended behaviors on `mainnet`.
207+
- The prototype does not deduplicate the storage of EBs' closures when they share txs.
208+
This decision makes the LeiosFetch server a trivial single-pass instead of a join.
209+
However, it "wastes" disk space and disk bandwidth.
210+
It's left to future work to decide whether that's a worthwhile trade-off.
211+
212+
## The mocked downstream node
213+
214+
For simplicity, this is simply another instance of the patched node.
215+
In the future, it could be comparatively lightweight and moreover could replay an arbitrary schedule of downstream requests, dual to the mocked upstream peer's arbitrary schedule of upstream notifications.
216+
217+
# Appendix: Engineering Notes
218+
219+
This section summarizes some lessons learned during the development of this prototype.
220+
221+
- Hypothesis: A SQLite connection will continue to hold SQLite's internal EXCLUSIVE lock _even after the transaction is COMMITed_ when the write transaction involved a prepared statement that was accidentally not finalized.
222+
That hypothesis was inferred from a painstaking debugging session, but I have not yet confirmed it in isolation.
223+
The bugfix unsurprisingly amounted to using `bracket` for all prepare/finalize pairs and all BEGIN/COMMIT pairs; thankfully our DB patterns seem to accommodate such bracketing.
224+
- The SQLite query plan optimizer might need more information in order to be reliable.
225+
Therefore at least one join (the one that copies out of `txCache` for the EbTxs identified in an in-memory table) was replaced with application-level iteration.
226+
It's not yet clear whether a one-time ANALYZE call might suffice, for example.
227+
Even if it did, it's also not yet clear how much bandwidth usage/latency/jitter/etc might be reduced.
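
The bracketing discipline from the first note above can be illustrated outside Haskell. The sketch below is a loose Python analogue and not the prototype's code: `bracket` here is a hypothetical context manager that only mirrors the acquire/use/release shape of Haskell's `bracket`, and `insert_points` is an illustrative helper. It relies on the standard `sqlite3` behaviour that `with conn:` commits on success and rolls back on an exception.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def bracket(acquire, release):
    """Acquire a resource, yield it, and always release it afterwards."""
    resource = acquire()
    try:
        yield resource
    finally:
        # Runs on normal exit and on exceptions, so nothing is left open.
        release(resource)


def insert_points(conn, points):
    """Insert rows inside one transaction, always closing the cursor."""
    # Outer pair: the transaction is committed or rolled back by `with conn`.
    with conn:
        # Inner pair: the cursor (and its statement) is closed in all cases.
        with bracket(conn.cursor, lambda cur: cur.close()) as cur:
            cur.executemany("INSERT INTO ebPoints VALUES (?, ?, ?)", points)
```

The point of the pattern is that the release step cannot be forgotten on an error path, which is the kind of leak the hypothesis above attributes the lingering lock to.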

demo/2025-11/README.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# Prototype demo - November 2025
2+
3+
Slight improvement of the [October 2025 demonstration](../2025-10) using `tc` and better observability using Grafana et al. In summary the progress made during November 2025 on the Leios networking prototype is:
4+
5+
- The surprisingly-high latency observed in the October demo was explained and reined in.
6+
- Key structured log events were added to the prototype.
7+
- Observability/reporting/monitoring was improved.
8+
- Packaging of the prerequisites for executing the demo was improved.
9+
10+
![](./demo-2025-11.excalidraw.svg)
11+
12+
> [!TIP]
13+
> This is an excalidraw SVG with an embedded scene, so it can be loaded and edited in <https://excalidraw.com/>.
14+
15+
## Bufferbloat
16+
17+
The investigation into the unexpectedly high latency seen in October and related refinements to the prototype are apparent in the asynchronous conversation that took place in the comments on this tracking Issue https://github.com/IntersectMBO/ouroboros-consensus/issues/1756.
18+
19+
- The latency was due to bufferbloat (see https://www.bufferbloat.net).
20+
In October, the bufferbloat arose directly from the naive use of [Toxiproxy](https://github.com/Shopify/toxiproxy) for the initial demo.
21+
- As a user-space mechanism, Toxiproxy cannot introduce latency/rate/etc in a way that will influence the kernel algorithms managing the TCP stream.
22+
- [Linux Traffic Control](https://tldp.org/HOWTO/Traffic-Control-HOWTO/intro.html) is the appropriate mechanism.
23+
- An example of relevant commands for a more appropriate WAN (Wide Area Network) emulation can be found in this GitHub comment https://github.com/IntersectMBO/ouroboros-consensus/issues/1756#issuecomment-3587268042.
24+
- `htb rate 100mbit` limits the sender's bandwidth.
25+
- `fq_codel` paces the sender's traffic, adapting to any bottleneck between it and the recipient.
26+
- `netem delay` establishes the link latency of 20ms between `fq_codel` and the recipient.
27+
- The Networking Team is now taking over this aspect of the setup for future Leios demos: refining/enriching the WAN emulation, preparing some testing on actual WANs (physically separated machines), and considering which mechanisms ought to mitigate the risk of Leios-induced bufferbloat (perhaps as part of an ATK-LeiosProtocolBurst) increasing the latency of Praos traffic once Leios is deployed on mainnet.
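
A back-of-the-envelope calculation shows why queued Leios data can delay Praos traffic on such a link. The sketch assumes the EBs are still the ten 12.5 megabyte EBs described for the October demo, takes 1 megabyte as 10^6 bytes and 100mbit as 10^8 bits per second, and ignores protocol overhead and the 20ms link delay. The function name is hypothetical.

```python
def serialization_seconds(payload_bytes, rate_bits_per_second):
    """Time needed to push payload_bytes through a link of the given rate."""
    return payload_bytes * 8 / rate_bits_per_second


EB_BYTES = 12_500_000        # one 12.5 megabyte EB
RATE_BPS = 100_000_000       # the 100mbit htb rate
EB_COUNT = 10                # the October demo's ten EBs

per_eb = serialization_seconds(EB_BYTES, RATE_BPS)
burst = serialization_seconds(EB_COUNT * EB_BYTES, RATE_BPS)
print(f"one EB: {per_eb} s, full burst: {burst} s")
```

Under these assumptions a single EB occupies the link for one second and the full burst for ten, so any Praos block queued behind that data in an unmanaged buffer would wait correspondingly long.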
28+
29+
## Improved Logging
30+
31+
Additional key events in both the mocked upstream peer and the node-under-test are now emitted as structured JSON, which the demo's analysis script processes.
32+
Highlights include the following.
33+
34+
- The node reliably indicates when it concludes that it has acquired the last of the txs it was missing from an EB.
35+
In particular, this event is raised when a MsgLeiosBlockTx arrives with all the txs that the node was missing from some EB.
36+
Even if the final tx were to arrive as part of a separate EB, this event would still be emitted for each EB that the MsgLeiosBlockTx completes.
37+
- Both the node and upstream peer report when ChainSync's MsgRollForward, BlockFetch's MsgRequestRange and MsgBlock, and Leios's MsgLeiosBlockRequest, MsgLeiosBlock, MsgLeiosBlockTxsRequest, and MsgLeiosBlockTxs are sent and received.
38+
The demo's analysis script displays a table of when these messages were sent and received.
39+
This table very usefully indicates how much to alter the timings in the `demoSchedule.json` file in order to change which parts of the Praos traffic a particular EB's exchange overlaps with.
40+
- A patch to `ouroboros-network` was introduced to allow two additional timings, which are expected to help make subsequent visualizations of the message exchange more accurate.
41+
- When a mini protocol begins trying to enqueue a message in the mux, even if the mux is unable to accept that message immediately.
42+
- When the demux receives the last byte of some message, even if the mini protocol doesn't attempt to decode that message immediately.
43+
- The `ss` tool is being used to sample socket statistics throughout the demo's execution, so that the TCP algorithm's state can be monitored.
44+
For example, the `rtt` and `notsent` fields are directly related to bufferbloat.
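
As an illustration of how the sampled `ss` output might be post-processed, the sketch below pulls the `rtt` and `notsent` fields out of one line of extended socket information. The helper name and the sample line are assumptions; the line is illustrative and was not captured from a demo run.

```python
import re

# Matches "rtt:<value>" and "notsent:<value>". The leading \b keeps "minrtt:"
# from being mistaken for "rtt:"; for rtt only the part before "/" is captured.
_FIELD = re.compile(r"\b(rtt|notsent):([0-9.]+)")


def parse_ss_fields(line):
    """Extract rtt (milliseconds) and notsent (bytes) from a line of ss info."""
    return {name: float(value) for name, value in _FIELD.findall(line)}


# Illustrative sample in the style of `ss -ti` output.
sample = (
    "cubic wscale:7,7 rto:224 rtt:20.512/1.3 mss:1448 cwnd:10 "
    "bytes_sent:1000 notsent:524288 minrtt:20.1"
)
print(parse_ss_fields(sample))
```

A growing `notsent` together with a rising `rtt` over successive samples is the pattern one would expect to see while a buffer fills.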
45+
46+
## Monitoring with Grafana
47+
48+
TODO
49+
50+
## Packaging
51+
52+
TODO
53+
54+
## Building from Source
55+
56+
Contributors who want to build the demo from source will need to build packages from the following three repositories at these commits.
57+
No other packages have yet been patched for this demo; for everything else, the appropriate versions are those used in the 10.5.1 build.
58+
Beware that the listed commits do not already include `source-repository-package` stanzas in their `cabal.project` files, if that's
59+
the contributor's chosen method for cross-repo dependencies.
60+
61+
```
62+
$ for i in ouroboros-consensus ouroboros-network cardano-node; do (cd $i; echo REPO $i; git log -1); done
63+
REPO ouroboros-consensus
64+
commit 7929c3716a18abb852f8abec7111c78f2059287e (HEAD -> nfrisby/leios-202511-demo, origin/nfrisby/leios-202511-demo)
65+
Author: Nicolas Frisby <nick.frisby@iohk.io>
66+
Date: Thu Nov 27 12:57:43 2025 -0800
67+
68+
leiosdemo202511: polishing per-message table format
69+
REPO ouroboros-network
70+
commit 479f0d0d82413162c8444b912394dd74c052831f (HEAD -> nfrisby/leios-202511-demo, tag: leios-202511-demo, origin/nfrisby/leios-202511-demo)
71+
Author: Nicolas Frisby <nick.frisby@iohk.io>
72+
Date: Thu Nov 27 10:49:49 2025 -0800
73+
74+
leiosdemo202511: introduce BearerBytes class
75+
REPO cardano-node
76+
commit 93d2c8481912309faf5a7d9058f9fdeca95710a0 (HEAD -> nfrisby/leios-202511-demo, origin/nfrisby/leios-202511-demo)
77+
Author: Nicolas Frisby <nick.frisby@iohk.io>
78+
Date: Thu Nov 27 11:02:11 2025 -0800
79+
80+
leiosdemo202511: integrate ouroboros-network BearerBytes
81+
```

demo/2025-11/demo-2025-11.excalidraw.svg

Lines changed: 4 additions & 0 deletions

demo/README.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
# Demos
2+
3+
This is a collection of Leios demonstrations created using specially patched versions of `cardano-node` and other components not originating from this repository:
4+
5+
- [2025-10](2025-10/): Minimum viable demo of Leios network traffic interfering with Praos
6+
- [2025-11](2025-11/): Improvement of MVD using tc and better observability
7+
8+
There are other, component-specific demos you might be looking for:
9+
10+
- [Visualizer](../ui) contains a few stored scenarios that are standalone demos
11+
- [Cryptography](../crypto-benchmarks.rs/demo) demo of signing/verifying votes and certificates
12+
- [Trace translator](../scripts/trace-translator/demo) has an example/demo
File renamed without changes.

site/.envrc

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
use nix
1+
use nix .#web
