Implementation of Lasso Regression
This pull request introduces an implementation of the Lasso regression model as part of the `RustQuant_ml` crate. Lasso regression extends linear regression by adding an L1 regularisation term, which encourages sparsity in the coefficient vector, resulting in some coefficients being driven to zero. This in turn enables feature selection and reduces overfitting.
This implementation is designed to closely align with Scikit-Learn’s `linear_model.Lasso` model. The Lasso implementation from Scikit-Learn using the same data as the unit tests in this PR is available here.

Coordinate Descent Algorithm
The Loss function for Lasso regression is given by:
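(Here $\lambda$ denotes the L1 penalty weight, a notational choice for this derivation; with Scikit-Learn's $\tfrac{1}{2n}$ scaling of the squared-error term, its `alpha` corresponds to $\lambda / n$.)

$$
L(\beta) \;=\; \frac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1 .
$$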
We iterate through each column individually, isolating the algorithm to column $j$.
Define the partial residual for column $j$ as:
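$$
r^{(j)} \;=\; y \;-\; \sum_{k \neq j} X_k \beta_k \;=\; y - X\beta + X_j \beta_j ,
$$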
where $X_j$ is the $j^{\text{th}}$ column of $X$, and $\beta_j$ is the coefficient corresponding to the $j^{\text{th}}$ feature.
This effectively computes the residuals without the $j^{\text{th}}$ column, allowing us to assess the contribution of feature $j$ to the model.
The objective function becomes:
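$$
L(\beta_j) \;=\; \frac{1}{2}\,\bigl\lVert r^{(j)} - X_j \beta_j \bigr\rVert_2^2 \;+\; \lambda \lvert \beta_j \rvert \;+\; \lambda \sum_{k \neq j} \lvert \beta_k \rvert ,
$$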
which can be expanded as:
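$$
L(\beta_j) \;=\; \frac{1}{2}\Bigl( (r^{(j)})^T r^{(j)} \;-\; \beta_j^T X_j^T r^{(j)} \;-\; (r^{(j)})^T X_j \beta_j \;+\; \beta_j^T X_j^T X_j\, \beta_j \Bigr) \;+\; \lambda \lvert \beta_j \rvert \;+\; \lambda \sum_{k \neq j} \lvert \beta_k \rvert .
$$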
Noting that $\beta_j \in \mathbb{R}$ (and thus $\beta_j^T = \beta_j$), we define:
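$$
\rho_j \;=\; X_j^T r^{(j)}, \qquad z_j \;=\; X_j^T X_j
$$

(the symbols $\rho_j$ and $z_j$ are simply shorthand introduced for this derivation).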
We can now minimise with respect to $\beta_j$:
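$$
\hat{\beta}_j \;=\; \arg\min_{\beta_j \in \mathbb{R}} \;\Bigl\{ \tfrac{1}{2}\, z_j \beta_j^2 \;-\; \rho_j \beta_j \;+\; \lambda \lvert \beta_j \rvert \Bigr\},
$$

where terms that do not depend on $\beta_j$ have been dropped.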
Differentiating with respect to $\beta_j$ and setting the derivative to zero, we obtain:
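$$
z_j \hat{\beta}_j \;-\; \rho_j \;+\; \lambda\, \text{sgn}(\hat{\beta}_j) \;=\; 0
\quad\Longrightarrow\quad
\hat{\beta}_j \;=\; \frac{\rho_j - \lambda\, \text{sgn}(\hat{\beta}_j)}{z_j},
$$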
where
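$$
\text{sgn}(x) \;=\;
\begin{cases}
1, & x > 0,\\
-1, & x < 0,
\end{cases}
$$

for $x \neq 0$.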
We now consider three cases:
Case 1: $\hat{\beta}_j > 0$
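$$
\text{sgn}(\hat{\beta}_j) = 1
\quad\Longrightarrow\quad
\hat{\beta}_j \;=\; \frac{\rho_j - \lambda}{z_j},
$$

which is positive only when $\rho_j > \lambda$.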
Case 2: $\hat{\beta}_j < 0$
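$$
\text{sgn}(\hat{\beta}_j) = -1
\quad\Longrightarrow\quad
\hat{\beta}_j \;=\; \frac{\rho_j + \lambda}{z_j},
$$

which is negative only when $\rho_j < -\lambda$.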
Case 3: $\hat{\beta}_j = 0$
Since $\lvert x \rvert$ is not differentiable at $x = 0$, we replace $\text{sgn}(x)$ there with the subgradient of $\lvert x \rvert$:
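$$
\partial \lvert x \rvert \;=\;
\begin{cases}
\{\text{sgn}(x)\}, & x \neq 0,\\
[-1,\;1], & x = 0.
\end{cases}
$$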
Thus, for $\hat{\beta}_j = 0$ to hold, we require:
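$$
0 \;\in\; z_j \cdot 0 \;-\; \rho_j \;+\; \lambda\,[-1,\;1]
\quad\Longleftrightarrow\quad
\lvert \rho_j \rvert \;\le\; \lambda .
$$

Combining the three cases gives the soft-thresholding update used in coordinate descent:

$$
\hat{\beta}_j \;=\; \frac{\operatorname{sgn}(\rho_j)\,\max\!\bigl(\lvert \rho_j \rvert - \lambda,\; 0\bigr)}{z_j}.
$$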
Since we iterate through each column while holding other coefficients fixed, the update for each coefficient must be repeated until convergence or until a maximum number of iterations is reached.
Define the change in the coefficient for feature $h$ at iteration $k$:
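$$
\Delta \beta_h^{(k)} \;=\; \beta_h^{(k)} \;-\; \beta_h^{(k-1)}.
$$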
For the $i^{\text{th}}$ observation, define the residual:
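$$
r_i \;=\; y_i \;-\; \sum_{j} X_{ij}\, \beta_j .
$$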
The residuals are updated after each coefficient adjustment as:
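$$
r_i \;\leftarrow\; r_i \;-\; X_{ih}\, \Delta \beta_h^{(k)} .
$$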
This ensures that at each iteration, the residuals reflect the most recent coefficient updates:
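$$
r_i \;-\; X_{ih}\, \Delta \beta_h^{(k)}
\;=\; y_i \;-\; \sum_{j \neq h} X_{ij}\, \beta_j \;-\; X_{ih}\, \beta_h^{(k-1)} \;-\; X_{ih}\bigl(\beta_h^{(k)} - \beta_h^{(k-1)}\bigr)
\;=\; y_i \;-\; \sum_{j \neq h} X_{ij}\, \beta_j \;-\; X_{ih}\, \beta_h^{(k)},
$$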
as required.
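For reference, a minimal sketch of this coordinate-descent loop in Rust is shown below. It is illustrative only and not the code added in this PR: the function name `lasso_coordinate_descent`, the plain `Vec`-based types, the synthetic data, and the convergence check on the largest coefficient change per sweep are all assumptions made for the example.

```rust
/// Illustrative coordinate-descent sketch for Lasso regression.
/// NOTE: not the implementation added in this PR; names, types, and the
/// convergence check are assumptions made for this example.
///
/// `x` is an n x p design matrix stored as rows, `y` has length n, and
/// `lambda` is the L1 weight in the unscaled objective used above.
fn lasso_coordinate_descent(
    x: &[Vec<f64>],
    y: &[f64],
    lambda: f64,
    max_iter: usize,
    tol: f64,
) -> Vec<f64> {
    let n = x.len();
    let p = x[0].len();
    let mut beta = vec![0.0_f64; p];

    // r_i = y_i - sum_j X_ij * beta_j; with beta = 0 this is just y.
    let mut residual: Vec<f64> = y.to_vec();

    // Precompute z_j = X_j^T X_j for each column.
    let z: Vec<f64> = (0..p)
        .map(|j| (0..n).map(|i| x[i][j] * x[i][j]).sum::<f64>())
        .collect();

    for _ in 0..max_iter {
        let mut max_delta = 0.0_f64;

        for j in 0..p {
            // rho_j = X_j^T r^(j) = X_j^T r + z_j * beta_j, using the current residual.
            let rho = (0..n).map(|i| x[i][j] * residual[i]).sum::<f64>() + z[j] * beta[j];

            // Soft-thresholding update: beta_j = S(rho_j, lambda) / z_j.
            let new_beta = if z[j] > 0.0 {
                rho.signum() * (rho.abs() - lambda).max(0.0) / z[j]
            } else {
                0.0
            };

            // Keep residuals in sync: r_i <- r_i - X_ih * (new_beta - old_beta).
            let delta = new_beta - beta[j];
            if delta != 0.0 {
                for i in 0..n {
                    residual[i] -= x[i][j] * delta;
                }
                beta[j] = new_beta;
            }
            max_delta = max_delta.max(delta.abs());
        }

        // Assumed convergence criterion: stop once the largest coefficient
        // change in a full sweep falls below `tol`.
        if max_delta < tol {
            break;
        }
    }

    beta
}

fn main() {
    // Tiny synthetic example: y depends (almost) only on the first feature.
    let x = vec![
        vec![1.0, 0.5],
        vec![2.0, -0.3],
        vec![3.0, 0.1],
        vec![4.0, -0.2],
    ];
    let y = vec![2.0, 4.1, 5.9, 8.2];
    let beta = lasso_coordinate_descent(&x, &y, 1.0, 1_000, 1e-8);
    println!("{beta:?}"); // expect a nonzero first coefficient, second driven to zero
}
```

Updating the residual vector in place after each coefficient change, as derived above, avoids recomputing $y - X\beta$ from scratch for every feature.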