
Update DataFusion instructions / Enable swap on small machines#804

Merged
rschu1ze merged 2 commits into ClickHouse:main from alamb:alamb/fix_low_mem_machines
Mar 3, 2026

alamb commented Feb 27, 2026

Rationale

I would like to

  1. Get the ClickBench benchmark results reflecting the most recent version of DataFusion
  2. More easily run pre-release DataFusion benchmarks in the ClickBench harness so we can evaluate their impact (e.g. Analyze current ClickBench performance with DataFusion 52 apache/datafusion#20601)

Since we last successfully ran the benchmarks ourselves (47.0.0), there are new smaller machines (c6a.xlarge, 4 cores, 8 GB RAM) where the existing scripts have struggled a lot. Specifically:

  1. rustc is OOM killed (due to the link flags we pass)
  2. datafusion-cli is OOM killed (due to issues such as datafusion-cli fails to run ClickBench queries with 8GB of RAM apache/datafusion#18473)

Also, the existing instructions are somewhat outdated.

Changes

  1. Update README.md to reflect reality, and incorporate changes from @waynexia in Update Results for DataFusion 52.0.0 #749 (reverted by Revert "Update Results for DataFusion 52.0.0" #766)
  2. Automatically enable swap on low memory machines (as requested by @rschu1ze in Update Results for DataFusion 52.0.0 #749 (comment))
  3. Add in the make-json.sh script to help us evaluate the current status locally

Non Changes

Note that this PR does NOT update any results. I will make a follow-on PR with actual numbers; I want to get the scripts into shape first.

Testing

I tested on these machines following the instructions

  • c6a.xlarge
  • c8g.4xlarge

@alamb alamb force-pushed the alamb/fix_low_mem_machines branch from e6e271a to 9d5f358 Compare February 27, 2026 20:07

alamb commented Feb 27, 2026

FYI @pmcgleenon and @waynexia I wonder what you think of this approach (just turning on swap)?

alamb commented Mar 2, 2026

FYI @rschu1ze -- I believe this PR is now ready for review

source ~/.cargo/env

if [ $(free -g | awk '/^Mem:/{print $2}') -lt 12 ]; then
echo "LOW MEMORY MODE"

This (and the corresponding change in datafusion) are the only functional changes. This implements the request by @rschu1ze in #749 (comment).
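
For readers who want to see the shape of the idea, here is a minimal sketch of a low-memory check followed by swap setup. Only the `free -g` expression, the 12 GiB threshold, and the `LOW MEMORY MODE` message come from the snippet above. The function names, the `/swapfile` path, and the `8G` size are hypothetical, and the commands the PR actually runs may differ. The sketch prints the swap commands rather than executing them, so it can be tried without root.

```shell
#!/bin/bash
# Sketch only: function names, swap file path and swap size are assumptions,
# not taken from the PR. The 12 GiB threshold matches the snippet above.

# Total RAM in GiB as reported by `free -g` (same expression as in the PR).
total_mem_gib() {
    free -g | awk '/^Mem:/{print $2}'
}

# Succeeds (exit status 0) when the given memory size is below the threshold.
is_low_memory() {
    local mem_gib="$1" threshold_gib="${2:-12}"
    [ "$mem_gib" -lt "$threshold_gib" ]
}

# Prints the commands that would create and enable a swap file.
# Printing instead of executing keeps the sketch safe to run without root.
swap_commands() {
    local swapfile="${1:-/swapfile}" size="${2:-8G}"
    echo "sudo fallocate -l $size $swapfile"
    echo "sudo chmod 600 $swapfile"
    echo "sudo mkswap $swapfile"
    echo "sudo swapon $swapfile"
}

# Only probe the machine when `free` is available (it is Linux-specific).
if command -v free >/dev/null 2>&1 && is_low_memory "$(total_mem_gib)"; then
    echo "LOW MEMORY MODE"
    swap_commands
fi
```

Keeping the threshold check in its own function makes it easy to exercise with fixed values (for example `is_low_memory 8`) without depending on the machine the script runs on.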

@@ -0,0 +1,37 @@
#!/bin/bash

# This script converts the raw `result.csv` data from `benchmark.sh` into the

This is the helper script from #749 (with a different name) for our convenience when producing new run results
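
As a rough illustration of what such a conversion can look like, here is a sketch that groups per-query timings from a CSV into JSON arrays. It is not the contents of `make-json.sh`: the input layout (`query_number,try_number,seconds`, with the tries of one query on consecutive lines) and the function name `csv_to_result_array` are assumptions made for this example, and the real script may emit additional fields.

```shell
#!/bin/bash
# Sketch only. Assumed input: one line per run in the form
#   query_number,try_number,seconds
# with all tries of a query on consecutive lines. An empty seconds field
# (for example a failed run) is emitted as JSON null.

csv_to_result_array() {
    awk -F, '
        # Map an empty timing to JSON null, otherwise pass it through.
        function val(t) { return (t == "" ? "null" : t) }

        NR == 1 || $1 != prev {          # first line, or a new query starts
            if (NR > 1) printf "],\n"    # close the previous query array
            printf "    [%s", val($3)
            prev = $1
            next
        }
        { printf ", %s", val($3) }       # further tries of the same query
        END { if (NR > 0) printf "]\n" } # close the last array
    ' "$1"
}

# Example usage: only runs when a result.csv exists in the current directory.
if [ -f result.csv ]; then
    csv_to_result_array result.csv
fi
```

The output is the comma-separated list of per-query arrays only, so it still has to be wrapped in whatever surrounding JSON object the results file expects.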

## Cookbook: Generate benchmark results

## Generate benchmark results
Follow instructions in the [datafusion](../datafusion/README.md) directory.

I reduced the duplication of the documentation and updated the Known Issues

@rschu1ze rschu1ze self-assigned this Mar 3, 2026
@rschu1ze rschu1ze merged commit a44f1c2 into ClickHouse:main Mar 3, 2026

rschu1ze commented Mar 3, 2026

@alamb I verified the changes locally and they work well, thanks. Let me know if you would like me to update the results as well (this makes sense only if swapping is expected to change the measurements).
