SGD batching improvements, LBFGS and a Gaussian Process #435
Open
Craigacp wants to merge 29 commits into oracle:main from
Conversation
…ugh into the trainers and parameters. This commit calls a lot of methods which need to be created in the la package.
…option to DataOptions.
…gradient methods on SGDObjective return a record not a pair, and adding a few methods to DenseMatrix and DenseVector in support of L-BFGS.
…tion cache world.
…pped if the gradient is zero.
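One of the commits above notes that the gradient methods on SGDObjective now return a record rather than a pair. A minimal sketch of that pattern is below; the names `LossAndGrad` and `lossAndGradient`, and the squared-error objective, are hypothetical illustrations rather than the actual types added in this PR.

```java
/**
 * Sketch of returning a named record instead of a generic Pair from a
 * gradient computation. Record components have descriptive accessors
 * (loss(), gradient()) rather than Pair's getA()/getB().
 */
public final class GradientRecordSketch {

    // Java 16+ record holding both outputs of a single objective evaluation.
    public record LossAndGrad(double loss, double[] gradient) { }

    // Squared-error objective for one example: loss = (pred - y)^2,
    // gradient w.r.t. the weights = 2 * (pred - y) * x.
    public static LossAndGrad lossAndGradient(double[] weights, double[] x, double y) {
        double pred = 0.0;
        for (int i = 0; i < x.length; i++) {
            pred += weights[i] * x[i];
        }
        double err = pred - y;
        double[] grad = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            grad[i] = 2 * err * x[i];
        }
        return new LossAndGrad(err * err, grad);
    }

    public static void main(String[] args) {
        // pred = 1*2 + 0*3 = 2, err = 1, so loss = 1.0 and grad = [4.0, 6.0].
        LossAndGrad lg = lossAndGradient(new double[]{1.0, 0.0},
                                         new double[]{2.0, 3.0}, 1.0);
        System.out.println("loss = " + lg.loss());
    }
}
```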
Description
This PR started with a series of changes to improve training speed on batches in SGD training. These added several improvements to the linear algebra package and refactored the SGD objective functions to operate on either single examples or batches. The changes are prerequisites for the Gaussian process and LinearTrainer/L-BFGS implementations, which operate directly on the full dataset as a single batch. The GaussianProcessTrainer is an exact implementation which can be used for small regression problems, using a supplied kernel function to compute the similarity between training data points. The LinearTrainer is an implementation of linear/logistic regression which uses a second order gradient descent method (L-BFGS) to find the global minimum of the objective (unlike SGD, which may not find the global minimum without fine-tuning of the learning rate), and it also has built-in L2 regularization to encourage a small parameter vector.
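The exact GP computation described above can be sketched in a few lines: training solves (K + σ²I)α = y over the full batch, where K is the kernel matrix over the training points, and prediction is the dot product of the test point's kernel vector with α. The class below is a self-contained illustration on toy 1-D data with an RBF kernel; the class name, `predict` method, and dense solver are assumptions for this sketch, not the Tribuo API introduced by this PR.

```java
import java.util.Arrays;

/**
 * Minimal sketch of exact Gaussian Process regression with an RBF kernel
 * and a fixed observation-noise variance. Hypothetical names throughout.
 */
public final class GPRegressionSketch {
    static final double[] XS = {0.0, 1.0, 2.0, 3.0};
    static final double[] YS = {0.0, 0.8, 0.9, 0.1};
    static final double LENGTHSCALE = 1.0;
    static final double NOISE = 1e-6; // observation-noise variance

    // RBF kernel: k(a,b) = exp(-(a-b)^2 / (2 * lengthscale^2)).
    static double rbf(double a, double b) {
        double d = a - b;
        return Math.exp(-(d * d) / (2 * LENGTHSCALE * LENGTHSCALE));
    }

    // Solves A x = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = Arrays.copyOf(a[i], n);
        }
        double[] rhs = Arrays.copyOf(b, n);
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) { piv = r; }
            }
            double[] tmpRow = m[col]; m[col] = m[piv]; m[piv] = tmpRow;
            double tmp = rhs[col]; rhs[col] = rhs[piv]; rhs[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < n; c++) { m[r][c] -= f * m[col][c]; }
                rhs[r] -= f * rhs[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = rhs[r];
            for (int c = r + 1; c < n; c++) { s -= m[r][c] * x[c]; }
            x[r] = s / m[r][r];
        }
        return x;
    }

    // Training on the full batch: alpha = (K + noise*I)^-1 y.
    // Prediction at xStar: sum_i k(xStar, x_i) * alpha_i.
    public static double predict(double xStar) {
        int n = XS.length;
        double[][] k = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i][j] = rbf(XS[i], XS[j]) + (i == j ? NOISE : 0.0);
            }
        }
        double[] alpha = solve(k, YS);
        double pred = 0.0;
        for (int i = 0; i < n; i++) {
            pred += rbf(xStar, XS[i]) * alpha[i];
        }
        return pred;
    }

    public static void main(String[] args) {
        // With near-zero noise the GP interpolates, so the prediction at
        // the training point x=1.0 lands close to its target 0.8.
        System.out.printf("prediction at x=1.0: %.4f%n", predict(1.0));
    }
}
```

Note this requires the O(n³) solve over the whole training set in one step, which is why the description restricts the trainer to small regression problems and why the batched linear algebra work in this PR is a prerequisite.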
Future work is to add support for optimizing the kernel hyperparameters in the Gaussian process, and to add link functions to the GP so it can be used for classification as well, but this PR is plenty big enough as it is.
Motivation
Improves the speed of minibatch SGD training where the batch size is greater than 1 (on dense data), adds an alternative linear model which finds the optimum via second order methods, and adds a flexible kernel-based regression suitable for small datasets.
Paper reference