CART Regression throws ArrayIndexOutOfBoundsException when using TrainTestSplit with proportion 1.0 #374

@Artraxon

Describe the bug
When I use the TrainTestSplitter with a split proportion of 1.0 and seed 1L on a SQLDataSource that retrieves 1107 tuples, training with the trainer configuration below throws the following exception:

java.lang.ArrayIndexOutOfBoundsException: arraycopy: last destination index 1116 out of bounds for int[1107]
	at java.base/java.lang.System.arraycopy(Native Method)
	at org.tribuo.regression.rtree.impl.InvertedFeature.split(InvertedFeature.java:173)
	at org.tribuo.regression.rtree.impl.TreeFeature.split(TreeFeature.java:155)
	at org.tribuo.regression.rtree.impl.RegressorTrainingNode.splitAtBest(RegressorTrainingNode.java:322)
	at org.tribuo.regression.rtree.impl.RegressorTrainingNode.buildGreedyTree(RegressorTrainingNode.java:204)
	at org.tribuo.regression.rtree.impl.RegressorTrainingNode.buildTree(RegressorTrainingNode.java:152)
	at org.tribuo.regression.rtree.CARTRegressionTrainer.train(CARTRegressionTrainer.java:210)
	at org.tribuo.regression.rtree.CARTRegressionTrainer.train(CARTRegressionTrainer.java:60)
	at org.tribuo.ensemble.BaggingTrainer.trainSingleModel(BaggingTrainer.java:186)
	at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:168)
	at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:145)
	at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:140)
	at org.tribuo.ensemble.BaggingTrainer.train(BaggingTrainer.java:54)
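
For readers less familiar with this exception: the message says that a copy tried to write up to index 1116 into an int array that only has 1107 slots, so the destination was 9 elements too small. The standalone sketch below is not Tribuo code and does not reproduce the bug. It only shows the JDK behaviour of System.arraycopy with the sizes from the trace. The class and method names are hypothetical.

```java
public class ArraycopyBoundsDemo {

    /**
     * Copies `length` ints into a destination of `destLength` ints,
     * starting at `destPos`. Returns true if the copy fits, false if
     * System.arraycopy rejects it with ArrayIndexOutOfBoundsException.
     */
    static boolean copyFits(int destLength, int destPos, int length) {
        int[] src = new int[length];
        int[] dest = new int[destLength];
        try {
            System.arraycopy(src, 0, dest, destPos, length);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the JDK version.
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // 1107 elements into int[1107]: fits exactly.
        System.out.println(copyFits(1107, 0, 1107));
        // 1116 elements into int[1107]: destPos + length exceeds the array.
        System.out.println(copyFits(1107, 0, 1116));
    }
}
```

The copy fails whenever destPos + length is larger than the destination length, so in the trace above InvertedFeature.split was presumably handed more indices than the 1107-element buffer was sized for.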

To Reproduce
I use the following configuration of the trainer:

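import org.tribuo.Trainer;
import org.tribuo.common.tree.AbstractCARTTrainer;
import org.tribuo.common.tree.RandomForestTrainer;
import org.tribuo.regression.Regressor;
import org.tribuo.regression.ensemble.AveragingCombiner;
import org.tribuo.regression.rtree.CARTRegressionTrainer;
import org.tribuo.regression.rtree.impurity.MeanAbsoluteError;

CARTRegressionTrainer cartTrainer = new CARTRegressionTrainer(10,                               // maxDepth
                                                              AbstractCARTTrainer.MIN_EXAMPLES, // minChildWeight
                                                              0.0F,                             // minImpurityDecrease
                                                              0.5F,                             // fractionFeaturesInSplit
                                                              false,                            // useRandomSplitPoints
                                                              new MeanAbsoluteError(),          // impurity
                                                              Trainer.DEFAULT_SEED);            // seed
Trainer<Regressor> rfTrainer = new RandomForestTrainer<>(cartTrainer,             // base trainer
                                                         new AveragingCombiner(), // combiner
                                                         100,                     // numMembers
                                                         5);                      // seed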

The error does not occur when using XGBoost, or when using the SQLDataSource directly without passing it through the splitter, even though the number of tuples is the same.

Expected behaviour

I expect the TrainTestSplitter with a proportion of 1.0 to behave the same way as not using it at all, or at least not to throw an exception.
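
To make the expectation concrete, here is a minimal plain-Java sketch of the invariant I have in mind. It is not Tribuo's TrainTestSplitter implementation, and the class and method names are hypothetical: a shuffled split with proportion 1.0 should put every one of the 1107 indices into the training part exactly once and leave the test part empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    /**
     * Shuffles the indices 0..n-1 with the given seed and returns the
     * first round(n * proportion) of them as the training indices.
     * Illustrative only; this is an assumption, not Tribuo's algorithm.
     */
    static List<Integer> trainIndices(int n, double proportion, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int trainSize = (int) Math.round(n * proportion);
        return new ArrayList<>(indices.subList(0, trainSize));
    }

    public static void main(String[] args) {
        List<Integer> train = trainIndices(1107, 1.0, 1L);
        // All indices are present and none is duplicated.
        System.out.println(train.size());
        System.out.println(new HashSet<>(train).size());
    }
}
```

With such a split, the training set is only a reordering of the original data, so a trainer that works on the unsplit source should also work on it.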

System information:

  • Tribuo Version: 4.3.1
  • OS: Arch Linux with linux 6.10.2, but runs in an Ubuntu 22.04 container
  • Java Version: 21 (openjdk-21-jdk 21.0.3+9-1ubuntu1~22.04.1)
  • JDK Vendor: openjdk

Labels: bug
