Describe the bug
When a SQLDataSource that retrieves 1107 tuples is passed through the TrainTestSplitter with split proportion 1.0 and seed 1L, training with the trainer configuration below throws the following exception:
java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 1116 out of bounds for int[1107]
at java.base/java.lang.System.arraycopy(Native Method)
at org.tribuo.regression.rtree.impl.InvertedFeature.split(InvertedFeature.java:173)
at org.tribuo.regression.rtree.impl.TreeFeature.split(TreeFeature.java:155)
at org.tribuo.regression.rtree.impl.RegressorTrainingNode.splitAtBest(RegressorTrainingNode.java:322)
at org.tribuo.regression.rtree.impl.RegressorTrainingNode.buildGreedyTree(RegressorTrainingNode.java:204)
at org.tribuo.regression.rtree.impl.RegressorTrainingNode.buildTree(RegressorTrainingNode.java:152)
at org.tribuo.regression.rtree.CARTRegressionTrainer.train(CARTRegressionTrainer.java:210)
at org.tribuo.regression.rtree.CARTRegressionTrainer.train(CARTRegressionTrainer.java:60)
at org.tribuo.ensemble.BaggingTrainer.trainSingleModel(BaggingTrainer.java:186)
at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:168)
at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:145)
at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:140)
at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:54)
To Reproduce
I use the following configuration of the trainer:
CARTRegressionTrainer cartTrainer = new CARTRegressionTrainer(10,
        AbstractCARTTrainer.MIN_EXAMPLES,
        0.0F,
        0.5F,
        false,
        new MeanAbsoluteError(),
        Trainer.DEFAULT_SEED);
Trainer<Regressor> rfTrainer = new RandomForestTrainer<>(cartTrainer,
        new AveragingCombiner(),
        100,
        5);
The error does not occur when using XGBoost, or when using the SQLDataSource directly without passing it through the splitter, even though the number of tuples is the same.
Expected behaviour
I expect the TrainTestSplitter with a proportion of 1.0 to behave the same way as not using it at all (or at least not to produce an error).
System information:
- Tribuo Version: 4.3.1
- OS: Arch Linux with Linux 6.10.2, but running in an Ubuntu 22.04 container
- Java Version: 21 (openjdk-21-jdk 21.0.3+9-1ubuntu1~22.04.1)
- JDK Vendor: openjdk