SGD batching improvements, LBFGS and a Gaussian Process #435
Open
Craigacp wants to merge 29 commits into oracle:main from
Conversation
…ugh into the trainers and parameters. This commit calls a lot of methods which need to be created in the la package.
…option to DataOptions.
…gradient methods on SGDObjective return a record not a pair, and adding a few methods to DenseMatrix and DenseVector in support of L-BFGS.
…tion cache world.
…pped if the gradient is zero.
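One of the commits above notes that the gradient methods on SGDObjective now return a record rather than a pair. A minimal sketch of that pattern is below; the names `LossAndGrad` and `lossAndGradient`, and the squared-error objective, are hypothetical illustrations rather than the actual types added in this PR.

```java
/**
 * Sketch of returning a named record instead of a generic Pair from a
 * gradient computation. Record components have descriptive accessors
 * (loss(), gradient()) rather than Pair's getA()/getB().
 */
public final class GradientRecordSketch {

    // Java 16+ record holding both outputs of a single objective evaluation.
    public record LossAndGrad(double loss, double[] gradient) { }

    // Squared-error objective for one example: loss = (pred - y)^2,
    // gradient w.r.t. the weights = 2 * (pred - y) * x.
    public static LossAndGrad lossAndGradient(double[] weights, double[] x, double y) {
        double pred = 0.0;
        for (int i = 0; i < x.length; i++) {
            pred += weights[i] * x[i];
        }
        double err = pred - y;
        double[] grad = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            grad[i] = 2 * err * x[i];
        }
        return new LossAndGrad(err * err, grad);
    }

    public static void main(String[] args) {
        // pred = 1*2 + 0*3 = 2, err = 1, so loss = 1.0 and grad = [4.0, 6.0].
        LossAndGrad lg = lossAndGradient(new double[]{1.0, 0.0},
                                         new double[]{2.0, 3.0}, 1.0);
        System.out.println("loss = " + lg.loss());
    }
}
```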
Description
This PR started with a series of changes to improve training speed on batches in SGD training. These added several improvements to the linear algebra package and refactored the SGD objective functions to operate on either single examples or batches. The changes are prerequisites for the Gaussian process and LinearTrainer/L-BFGS implementations, which operate directly on the full dataset as a single batch. The GaussianProcessTrainer is an exact implementation which can be used for small regression problems, using a supplied kernel function to compute the similarity between training data points. The LinearTrainer is an implementation of linear/logistic regression which uses a second order gradient descent method (L-BFGS) to find the global minimum of the objective (unlike SGD, which may not find the global minimum without fine-tuning of the learning rate), and it also has built-in L2 regularization to encourage a small parameter vector.
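The exact GP computation described above can be sketched in a few lines: training solves (K + σ²I)α = y over the full batch, where K is the kernel matrix over the training points, and prediction is the dot product of the test point's kernel vector with α. The class below is a self-contained illustration on toy 1-D data with an RBF kernel; the class name, `predict` method, and dense solver are assumptions for this sketch, not the Tribuo API introduced by this PR.

```java
import java.util.Arrays;

/**
 * Minimal sketch of exact Gaussian Process regression with an RBF kernel
 * and a fixed observation-noise variance. Hypothetical names throughout.
 */
public final class GPRegressionSketch {
    static final double[] XS = {0.0, 1.0, 2.0, 3.0};
    static final double[] YS = {0.0, 0.8, 0.9, 0.1};
    static final double LENGTHSCALE = 1.0;
    static final double NOISE = 1e-6; // observation-noise variance

    // RBF kernel: k(a,b) = exp(-(a-b)^2 / (2 * lengthscale^2)).
    static double rbf(double a, double b) {
        double d = a - b;
        return Math.exp(-(d * d) / (2 * LENGTHSCALE * LENGTHSCALE));
    }

    // Solves A x = b by Gaussian elimination with partial pivoting.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][];
        for (int i = 0; i < n; i++) {
            m[i] = Arrays.copyOf(a[i], n);
        }
        double[] rhs = Arrays.copyOf(b, n);
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) { piv = r; }
            }
            double[] tmpRow = m[col]; m[col] = m[piv]; m[piv] = tmpRow;
            double tmp = rhs[col]; rhs[col] = rhs[piv]; rhs[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < n; c++) { m[r][c] -= f * m[col][c]; }
                rhs[r] -= f * rhs[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = rhs[r];
            for (int c = r + 1; c < n; c++) { s -= m[r][c] * x[c]; }
            x[r] = s / m[r][r];
        }
        return x;
    }

    // Training on the full batch: alpha = (K + noise*I)^-1 y.
    // Prediction at xStar: sum_i k(xStar, x_i) * alpha_i.
    public static double predict(double xStar) {
        int n = XS.length;
        double[][] k = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                k[i][j] = rbf(XS[i], XS[j]) + (i == j ? NOISE : 0.0);
            }
        }
        double[] alpha = solve(k, YS);
        double pred = 0.0;
        for (int i = 0; i < n; i++) {
            pred += rbf(xStar, XS[i]) * alpha[i];
        }
        return pred;
    }

    public static void main(String[] args) {
        // With near-zero noise the GP interpolates, so the prediction at
        // the training point x=1.0 lands close to its target 0.8.
        System.out.printf("prediction at x=1.0: %.4f%n", predict(1.0));
    }
}
```

Note this requires the O(n³) solve over the whole training set in one step, which is why the description restricts the trainer to small regression problems and why the batched linear algebra work in this PR is a prerequisite.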
Future work is to add support for optimizing the kernel hyperparameters in the Gaussian process, and to add link functions to the GP so it can be used for classification as well, but this PR is plenty big enough as it is.
Motivation
Improves the speed of minibatch SGD training where the batch size is greater than 1 (on dense data), adds an alternative linear model which finds the optimum via second order methods, and adds a flexible kernel-based regression suitable for small datasets.
Paper reference