The benchmark folder is used by the skrub maintainers to:
- Experiment on new algorithms
- Validate decisions based on empirical evidence
- Tune (hyper)parameters in the library
These benchmarks are not meant to replace the tests within skrub.
A mini-framework consisting of a few functions is made available under `utils`.
Check out the existing benchmarks to see how these helpers are used.
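For a rough idea of what such a helper could look like, here is a purely hypothetical sketch of a `monitor`-style decorator that runs a function over a parameter grid and times each call. None of these names (`monitor`, `bench_example`) are skrub's actual API; refer to `utils` for the real functions.

```python
# Hypothetical sketch of a benchmarking helper; not skrub's actual API.
import itertools
import time

import pandas as pd


def monitor(param_grid):
    """Run the decorated function for every parameter combination,
    timing each call and collecting the results in a DataFrame."""
    def decorator(func):
        def wrapper():
            records = []
            keys = list(param_grid)
            for values in itertools.product(*param_grid.values()):
                params = dict(zip(keys, values))
                start = time.perf_counter()
                func(**params)
                records.append({**params, "time": time.perf_counter() - start})
            return pd.DataFrame(records)
        return wrapper
    return decorator


@monitor(param_grid={"n_rows": [1_000, 10_000]})
def bench_example(n_rows):
    sum(range(n_rows))  # placeholder for the actual experiment


if __name__ == "__main__":
    print(bench_example())
```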
Note
Launching a benchmark is usually something you don't want to do as a user. Benchmarks are long and expensive to run. Their code is provided for reproducibility.
Each one implements a standard command-line interface with at least the two
options `--run` and `--plot`.
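As an illustration, a benchmark script could expose this interface with a minimal `argparse` setup like the sketch below; the helpers `run_benchmark` and `plot_results` are hypothetical placeholders, not functions from skrub.

```python
# Minimal sketch of the --run / --plot command-line interface.
import argparse


def run_benchmark():
    ...  # run the experiment and save the results to a parquet file


def plot_results():
    ...  # load the saved parquet results and display them


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--run", action="store_true", help="run the benchmark")
    parser.add_argument("--plot", action="store_true", help="plot saved results")
    args = parser.parse_args()

    if args.run:
        run_benchmark()
    if args.plot:
        plot_results()
```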
Before launching a benchmark, however, you should make sure the environment is properly set up.
First, install the required packages; we recommend installing the latest version of
everything (drop `--upgrade` if you don't want that):

```
pip install --upgrade -e ".[benchmarks]"
```

Note that Python >= 3.9 is required.
Then, if you're trying to reproduce the results of a benchmark, check the file's docstring to see whether it requires any additional setup. You will usually find a date, which might be relevant, and sometimes a commit hash, which you can use to check out the code as it was when the benchmark was run:
```
git checkout <commit_hash>
```

Finally, you can launch the benchmark with the `--run` option:
```
python bench_tablevectorizer_tuning.py --run
```

The results of the benchmarks run by maintainers are pushed to the `results/`
folder in parquet format.
As mentioned earlier, benchmarks implement a `--plot` option used to display
the results visually. Using `--plot` without `--run` plots the saved results
without re-running the benchmark, e.g. `python bench_tablevectorizer_tuning.py --plot`.
Results are saved as `<name>-<YYYYMMDD>.parquet` in the `results/` subfolder.
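If you just want to inspect saved results, they can be loaded with any parquet reader. Here is a minimal sketch using pandas; the file name below is hypothetical and stands in for an actual file from `results/`.

```python
# Load a saved benchmark result; the file name is hypothetical.
import pandas as pd

results = pd.read_parquet("results/bench_tablevectorizer_tuning-20230101.parquet")
print(results.head())
```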