Skip to content

Conversation

@R-Palazzo
Copy link
Contributor

@R-Palazzo R-Palazzo commented Nov 7, 2025

Resolve #481
Resolve #491
CU-86b7b0kaa
I share this notebook to try out the feature.

@R-Palazzo R-Palazzo requested a review from amontanez24 November 7, 2025 19:21
@R-Palazzo R-Palazzo self-assigned this Nov 7, 2025
@sdv-team
Copy link
Contributor

sdv-team commented Nov 7, 2025

@codecov
Copy link

codecov bot commented Nov 7, 2025

Codecov Report

❌ Patch coverage is 93.84615% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.04%. Comparing base (6a3b217) to head (ec4d347).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sdgym/benchmark.py 76.92% 3 Missing ⚠️
sdgym/utils.py 70.00% 3 Missing ⚠️
sdgym/synthesizers/generate.py 95.00% 1 Missing ⚠️
sdgym/synthesizers/sdv.py 98.18% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #489      +/-   ##
==========================================
+ Coverage   74.86%   76.04%   +1.17%     
==========================================
  Files          29       30       +1     
  Lines        2399     2371      -28     
==========================================
+ Hits         1796     1803       +7     
+ Misses        603      568      -35     
Flag Coverage Δ
integration 61.60% <80.76%> (+1.23%) ⬆️
unit 71.02% <89.23%> (+1.66%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pvk-developer pvk-developer self-requested a review November 7, 2025 22:36
@R-Palazzo
Copy link
Contributor Author

R-Palazzo commented Nov 10, 2025

@amontanez24 @pvk-developer
Should we support using DayZSynthesizer in a benchmark?
Right now it won’t be picked up, because our discovery only looks for subclasses of BaseSingleTableSynthesizer and BaseMultiTableSynthesizer, and DayZ doesn’t inherit from those. That’s a small change to make.

Two things to consider:

  • DayZSynthesizer does not have a fit() method; its SDGym wrapper should take this into account to make it work in a benchmark.
  • Both the single-table and multi-table versions are named DayZSynthesizer, but based on the context we should still be able to determine which one we need I think

@pvk-developer
Copy link
Member

@amontanez24 @pvk-developer Should we support using DayZSynthesizer in a benchmark? Right now it won’t be picked up, because our discovery only looks for subclasses of BaseSingleTableSynthesizer and BaseMultiTableSynthesizer, and DayZ doesn’t inherit from those. That’s a small change to make.

Two things to consider:

  • DayZSynthesizer does not have a fit() method; its SDGym wrapper should take this into account to make it work in a benchmark.
  • Both the single-table and multi-table versions are named DayZSynthesizer, but based on the context we should still be able to determine which one we need I think

If I had to write this as per the issue is saying, we should get all the synthesizers that are accessed by sdv.single_table such as:

In [3]: from sdv import single_table

In [4]: dict([(name, cls) for name, cls in single_table.__dict__.items() if isinstance(cls, type)])
Out[4]: 
{'CopulaGANSynthesizer': sdv.single_table.copulagan.CopulaGANSynthesizer,
 'GaussianCopulaSynthesizer': sdv.single_table.copulas.GaussianCopulaSynthesizer,
 'CTGANSynthesizer': sdv.single_table.ctgan.CTGANSynthesizer,
 'TVAESynthesizer': sdv.single_table.ctgan.TVAESynthesizer,
 'DayZSynthesizer': sdv_enterprise.sdv.single_table.dayz.day_zero.DayZSynthesizer,
 'BootstrapSynthesizer': sdv_enterprise.sdv.single_table.bootstrap.bootstrap.BootstrapSynthesizer,
 'DPGCFlexSynthesizer': sdv_enterprise.sdv.single_table.differential_privacy.dp_gc_flex_synthesizer.DPGCFlexSynthesizer,
 'DPGCSynthesizer': sdv_enterprise.sdv.single_table.differential_privacy.dp_gc_synthesizer.DPGCSynthesizer,
 'SegmentSynthesizer': sdv_enterprise.sdv.single_table.segment.segment.SegmentSynthesizer,
 'XGCSynthesizer': sdv_enterprise.sdv.single_table.xgc.xgc.XGCSynthesizer}

Then my other comment on how you got the subclasses can go away and you could just use this functionality instead.

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something but my main thought is that we don't need to make the classes importable. I think we can just create a class object from a string matching a valid SDV synthesizer when it is requested by the benchmark.

@R-Palazzo R-Palazzo force-pushed the issue-481-enterprise-synthesizer branch from 9d27f3d to cf9ea70 Compare November 11, 2025 15:57
@R-Palazzo R-Palazzo marked this pull request as ready for review November 11, 2025 16:50
@R-Palazzo R-Palazzo requested a review from a team as a code owner November 11, 2025 16:50
@R-Palazzo R-Palazzo requested review from frances-h and removed request for a team and frances-h November 11, 2025 16:50
Copy link
Member

@pvk-developer pvk-developer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

sdgym/utils.py Outdated
synthesizer_name = synthesizer
if synthesizer in st_sdv_synthesizers + mt_sdv_synthesizers:
modality = 'single_table' if synthesizer in st_sdv_synthesizers else 'multi_table'
instance = BaselineSDVSynthesizer(synthesizer, modality)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of directly returning an instance of BaselineSDVSynthesizer, can you add a function, create_sdv_synthesizer_class and in it return something like:

    CustomSynthesizer = type(
        class_name, # Should be the synthesizer name in sdv
        (BaselineSynthesizer,),
        {
            '__module__': __name__,
            'get_trained_synthesizer': get_trained_synthesizer,
            'sample_from_synthesizer': sample_from_synthesizer,
        },
    )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done in 33e5813

@R-Palazzo
Copy link
Contributor Author

R-Palazzo commented Nov 13, 2025

@amontanez24 did some additional tests locally and also checked that it works on ec2 instances.
There is also this notebook to try out with sdv-enterprise.

@R-Palazzo R-Palazzo merged commit ddd3dcf into main Nov 14, 2025
48 of 68 checks passed
@R-Palazzo R-Palazzo deleted the issue-481-enterprise-synthesizer branch November 14, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rename create_sdv_synthesizer_variant to create_synthesizer_variant SDGym should be able to automatically discover SDV Enterprise synthesizers

5 participants