sklearn's ColumnTransformer offers good functionality for mixed-data pre-processing and would tidy up some of our code. However, sklearn currently lacks an inverse transform for this specific transformer. One has been requested in scikit-learn/scikit-learn#11463 and a fix proposed in scikit-learn/scikit-learn#11639, but it does not seem to be implemented yet.
The basic workflow would be to:
- Detect datatypes
- Build a column transformer CT with numeric and categorical encoders
- Run SyGNet
- Inverse transform the generated data using CT.inverse_transform()
Assuming this method is implemented upstream at some point, we should revise our function to use it.
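In the meantime, the workflow could be approximated by inverting each fitted sub-transformer by hand. The following is only a rough sketch under several assumptions: the helper names `build_ct` and `manual_inverse` are hypothetical, `MinMaxScaler` and `OneHotEncoder` are stand-ins for whatever encoders we settle on, the data has at least one numeric and one categorical column, and the SyGNet step is replaced by a placeholder that simply reuses the transformed data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_ct(df):
    """Detect datatypes and build a ColumnTransformer (hypothetical helper)."""
    num_cols = df.select_dtypes(include="number").columns.tolist()
    cat_cols = [c for c in df.columns if c not in num_cols]
    ct = ColumnTransformer(
        [("num", MinMaxScaler(), num_cols),
         ("cat", OneHotEncoder(), cat_cols)],
        sparse_threshold=0,  # always return a dense array
    )
    return ct, num_cols, cat_cols


def manual_inverse(ct, Z, num_cols, cat_cols):
    """Stand-in for the missing CT.inverse_transform() (hypothetical helper).

    Relies on the output columns being ordered as the transformers were
    listed: numeric block first, then the one-hot block.
    """
    Z = np.asarray(Z)
    n_num = len(num_cols)
    num = ct.named_transformers_["num"].inverse_transform(Z[:, :n_num])
    cat = ct.named_transformers_["cat"].inverse_transform(Z[:, n_num:])
    out = pd.DataFrame(num, columns=num_cols)
    for j, col in enumerate(cat_cols):
        out[col] = cat[:, j]
    return out


# Toy data, purely illustrative
df = pd.DataFrame({
    "age": [20, 30, 40],
    "party": ["a", "b", "a"],
    "income": [1.0, 2.0, 3.0],
})

ct, num_cols, cat_cols = build_ct(df)
Z = ct.fit_transform(df)

# Placeholder for the SyGNet step: here the "generated" data is just Z
Z_generated = Z

# Restore the original column order after inverting
recovered = manual_inverse(ct, Z_generated, num_cols, cat_cols)[list(df.columns)]
```

Numeric columns come back as floats, so any integer columns would need to be cast back separately. If the inverse transform lands in sklearn, `manual_inverse` could be dropped in favour of the built-in method.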