@ghafek ghafek commented Jun 29, 2025

No description provided.

@github-project-automation github-project-automation bot moved this to In Progress in SystemDS PR Queue Jun 29, 2025
@ghafek ghafek force-pushed the feature/ssb-benchmark branch from bcc4671 to e7ae6b3 Compare July 11, 2025 22:04
@ghafek ghafek changed the title Feature/ssb benchmark [SYSTEMDS-3862] SSB Benchmark Implementation Jul 14, 2025

Please move this from shell/run_all_perf.sh to scripts/ssb/shell/run_all_perf.sh.

Please move this from shell/run_ssb.sh to scripts/ssb/shell/run_ssb.sh.

@gaturchenko gaturchenko Sep 8, 2025

Please move this and all the other .sql files from sql/ to scripts/ssb/sql/.

run_all_perf.sh Outdated

Please remove this file from the repository root.

run_ssb.sh Outdated

Please remove this file from the repository root.

j143 and others added 19 commits November 27, 2025 11:47
This patch refines the current union operation to an internal LOP
operation. Currently, two subsequent operations -- rbind() and unique()
are used to perform the union operation. We rewrite the operation with
an internal LOP that uses a HashSet to compute the unique entries and
returns them in a matrix. This improves the efficiency of the
operation, as it avoids unique(). The order of the input entries is
preserved in the output.

Closes apache#2286.
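
The hash-set idea described in this commit message can be sketched in a few lines of Python. This is only an illustration, not the SystemDS LOP: the function name `union_preserving_order` and the use of flat lists instead of matrices are assumptions made for the example.

```python
from itertools import chain


def union_preserving_order(a, b):
    """Return the distinct entries of a followed by b, in first-seen order.

    Hypothetical helper for illustration; it is not part of SystemDS.
    """
    seen = set()   # hash set used only for membership checks
    result = []    # separate list keeps the input order
    for value in chain(a, b):
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result
```

A single pass over both inputs replaces the separate concatenate-then-deduplicate steps, which is the efficiency argument the commit message makes.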

This patch introduces a basic integration of the out-of-core backend.

For reading, we use a dedicated reblock instruction which creates
a queue of blocks, spawns a thread for reading, and immediately returns.
In addition, we extended the acquireRead functionality to collect such
streams of blocks whenever an operation requires the full matrix.
Based on these foundations, we can now add other OOC operations that
work directly with the input stream of blocks and either produce results
or create modified output streams.
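
The read path described in this commit message (a queue of blocks, a background reader thread, and a collect step for consumers that need the full matrix) can be sketched as follows. This is a rough Python analogy under stated assumptions, not the SystemDS code: `start_block_reader`, `collect`, and the `_END` sentinel are hypothetical names, and blocks are plain lists.

```python
import queue
import threading

_END = object()  # hypothetical sentinel that marks the end of the block stream


def start_block_reader(blocks):
    """Start a background reader and return the block queue immediately."""
    block_queue = queue.Queue()

    def _read():
        # Push each block as soon as it is "read", then signal completion.
        for block in blocks:
            block_queue.put(block)
        block_queue.put(_END)

    threading.Thread(target=_read, daemon=True).start()
    return block_queue


def collect(block_queue):
    """Drain the stream into a list, for consumers that need all blocks."""
    collected = []
    while True:
        item = block_queue.get()  # blocks until the reader delivers an item
        if item is _END:
            return collected
        collected.append(item)
```

The caller gets the queue back before any block has been read, so later operators can start consuming while the reader is still running.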

The test failure caused by the missing mtd file did not show up in local
tests because the test directories are not cleaned locally, and both Xmtd
and X.mtd existed there from development.

This patch introduces the out-of-core unary aggregate operations as an
example of how to implement operations against the input stream of
blocks.
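
As a sketch of what an aggregate against a stream of blocks might look like, the Python function below sums all cells while consuming blocks one at a time, without first collecting the full matrix. It is an assumption-laden illustration rather than the SystemDS operator: `stream_sum`, the `END` marker, and the list-of-rows block layout are all hypothetical.

```python
import queue

END = None  # hypothetical end-of-stream marker placed on the queue by the producer


def stream_sum(block_queue):
    """Sum all cells of a block stream, holding only one block at a time."""
    total = 0.0
    while True:
        block = block_queue.get()
        if block is END:
            return total
        # Each block is assumed to be a list of rows of numbers.
        total += sum(sum(row) for row in block)
```

Only the running total is kept between blocks, so memory use is bounded by the block size rather than by the matrix size.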

- Added SSB (Star Schema Benchmark) query implementations
- Created performance testing framework with run_all_perf.sh
- Added data caching and preprocessing capabilities
- Implemented comprehensive logging and output management
- Added documentation and status tracking files

This is work in progress for the SSB benchmark feature.
@ghafek ghafek force-pushed the feature/ssb-benchmark branch from 2ebb5d1 to fd7edeb Compare November 27, 2025 12:54

ghafek commented Nov 27, 2025

Hi @gaturchenko,

I hope you're doing well.

I wanted to let you know that all merge conflicts have now been resolved and the feature branch is clean again. The requested changes from your previous review have been implemented as well.

Before proceeding further, I would like to ask for your preference:

Would you prefer

1. to continue the review on this existing PR, or
2. that I create a new, clean PR based on an up-to-date main branch (containing only the SSB-related commits and no historical noise)?

I'm completely fine with both options — just let me know what works best for you.

Thank you again for your time, review, and guidance!

6 participants