@yfnaji commented on Oct 20, 2025

Implementation of Lasso Regression

This pull request introduces an implementation of the Lasso regression model as part of the RustQuant_ml crate.

Lasso regression extends linear regression by adding an L1 regularisation term, which encourages sparsity in the coefficient vector, resulting in some coefficients being driven to zero. This in turn enables feature selection and reduces overfitting.

This implementation is designed to closely align with Scikit-Learn’s linear_model.Lasso model. A Scikit-Learn Lasso fit on the same data as the unit tests in this PR is available here.

Coordinate Descent Algorithm

The loss function for Lasso regression is given by:

$$ L := \frac{1}{2n}\left\lVert\textbf{y} - X\beta\right\rVert_2^2 + \lambda\left\lVert\beta\right\rVert_1 $$
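
For reference, here is a minimal sketch of this objective evaluated on plain slices (illustrative names and a hypothetical row-major `Vec<Vec<f64>>` design matrix, not the PR's actual types):

```rust
/// Lasso objective: (1 / 2n) * ||y - X*beta||_2^2 + lambda * ||beta||_1.
/// `x` is row-major: x[i][j] is observation i, feature j.
fn lasso_loss(x: &[Vec<f64>], y: &[f64], beta: &[f64], lambda: f64) -> f64 {
    let n = y.len() as f64;
    // Residual sum of squares ||y - X*beta||_2^2.
    let rss: f64 = x
        .iter()
        .zip(y)
        .map(|(row, &yi)| {
            let fit: f64 = row.iter().zip(beta).map(|(&xij, &bj)| xij * bj).sum();
            (yi - fit).powi(2)
        })
        .sum();
    // L1 penalty ||beta||_1.
    let l1: f64 = beta.iter().map(|b| b.abs()).sum();
    rss / (2.0 * n) + lambda * l1
}
```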

We iterate through the columns of $X$ one at a time; the derivation below isolates the update for a single column $j$.

Define the partial residual for column $j$ as:

$$ r_j = \textbf{y} - X\beta + X_j\beta_j $$

where $X_j$ is the $j^{\text{th}}$ column of $X$, and $\beta_j$ is the coefficient corresponding to the $j^{\text{th}}$ feature.

This effectively computes the residuals without the $j^{\text{th}}$ column, allowing us to assess the contribution of feature $j$ to the model.
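
In code, if the full residual vector $\textbf{y} - X\beta$ is maintained between updates, the partial residual only requires adding column $j$'s contribution back. A sketch under that assumption (illustrative names, not necessarily the PR's):

```rust
/// Partial residual r_j = y - X*beta + X_j * beta_j, obtained from the
/// maintained residual vector (y - X*beta) by adding back column j's term.
fn partial_residual(residual: &[f64], x_col_j: &[f64], beta_j: f64) -> Vec<f64> {
    residual
        .iter()
        .zip(x_col_j)
        .map(|(&r_i, &x_ij)| r_i + x_ij * beta_j)
        .collect()
}
```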

The objective function becomes:

$$ \frac{1}{2n}\left\lVert r_j - X_j\beta_j\right\rVert_2^2 + \lambda\left|\beta_j\right| $$

which can be expanded as:

$$ \frac{1}{2n}\left(r_j - X_j\beta_j\right)^T\left(r_j - X_j\beta_j\right) + \lambda\left|\beta_j\right| $$

$$ \Rightarrow \frac{1}{2n}\left(r_j^Tr_j - 2\beta_jX_j^Tr_j + \beta_j^2X_j^TX_j\right) + \lambda\left|\beta_j\right| $$

Noting that $\beta_j \in \mathbb{R}$ (and thus $\beta_j^T = \beta_j$), we define:

$$ \rho_j := \frac{1}{n}X_j^Tr_j \quad \text{and} \quad z_j := \frac{1}{n}X_j^TX_j $$
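
These two quantities translate directly into code; a sketch using the partial residual from above (again with illustrative names):

```rust
/// rho_j = (1/n) * X_j^T r_j  and  z_j = (1/n) * X_j^T X_j.
fn rho_and_z(x_col_j: &[f64], r_j: &[f64]) -> (f64, f64) {
    let n = x_col_j.len() as f64;
    let rho = x_col_j.iter().zip(r_j).map(|(&x, &r)| x * r).sum::<f64>() / n;
    let z = x_col_j.iter().map(|&x| x * x).sum::<f64>() / n;
    (rho, z)
}
```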

We can now minimise with respect to $\beta_j$:

$$ \min_{\beta_j} \left[\frac{1}{2}z_j\beta_j^2 - \rho_j\beta_j + \lambda\left|\beta_j\right|\right] $$

Differentiating with respect to $\beta_j$ and setting the derivative to zero at the minimiser $\hat{\beta}_j$, we obtain:

$$ \left.\frac{\partial L}{\partial \beta_j}\right\vert_{\beta_j=\hat{\beta}_j} = z_j\hat{\beta}_j - \rho_j + \lambda\,\text{sgn}\left(\hat{\beta}_j\right) = 0 $$

where

$$ \text{sgn}(x) = \begin{cases} 1, & x > 0\\ 0, & x = 0\\ -1, & x < 0 \end{cases} $$

We now consider three cases:

Case 1: $\hat{\beta}_j > 0$

$$ z_j\hat{\beta}_j - \rho_j + \lambda = 0 $$

$$ \Rightarrow \hat{\beta}_j = \frac{\rho_j - \lambda}{z_j}, \quad \text{valid when } \rho_j > \lambda $$

Case 2: $\hat{\beta}_j < 0$

$$ z_j\hat{\beta}_j - \rho_j - \lambda = 0 $$

$$ \Rightarrow \hat{\beta}_j = \frac{\rho_j + \lambda}{z_j}, \quad \text{valid when } \rho_j < -\lambda $$

Case 3: $\hat{\beta}_j = 0$

Since $\left|\beta_j\right|$ is not differentiable at $\beta_j = 0$, we use its subgradient there, which allows $\text{sgn}(0)$ to take any value in the interval:

$$ \text{sgn}(0) \in [-1, 1] $$

Thus, for $\hat{\beta}_j = 0$ to hold, we require:

$$ 0 = -\rho_j + \lambda x \quad \text{where} \quad x \in [-1, 1] $$

$$ \Rightarrow \rho_j \in [-\lambda, \lambda] $$
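
Taken together, the three cases give the soft-thresholding update $\hat{\beta}_j = S(\rho_j, \lambda) / z_j$. A minimal sketch of that operator (the function name is illustrative, not necessarily the PR's):

```rust
/// Soft-thresholding operator S(rho, lambda), covering the three cases above.
fn soft_threshold(rho: f64, lambda: f64) -> f64 {
    if rho > lambda {
        rho - lambda // Case 1: beta_j > 0
    } else if rho < -lambda {
        rho + lambda // Case 2: beta_j < 0
    } else {
        0.0 // Case 3: |rho| <= lambda, so beta_j = 0
    }
}
```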

Since each coefficient is updated while the others are held fixed, the sweep over all coefficients must be repeated until convergence or until a maximum number of iterations is reached.

Define the change in the coefficient for feature $h$ between iterations $k$ and $k+1$:

$$ \delta^{(k+1)} := \hat{\beta}^{(k+1)}_h - \hat{\beta}^{(k)}_h $$

For the $i^{\text{th}}$ observation, define the residual:

$$ \varepsilon_i := y_i - \sum_{j=1}^{p} X_{ij}\hat{\beta}_j $$

where $p$ is the number of features.

The residuals are updated after each coefficient adjustment as:

$$ \varepsilon^{(k+1)}_i = \varepsilon^{(k)}_i - \delta^{(k+1)} \cdot X_{ih} $$

This ensures that at each iteration the residuals reflect the most recent coefficient update, i.e.

$$ \varepsilon^{(k+1)}_i = y_i - \sum_{j \neq h} X_{ij}\hat{\beta}_j - \hat{\beta}^{(k+1)}_h X_{ih} $$

as required.
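
Putting the pieces together, one full sweep of the coordinate descent loop with this residual update might look like the following sketch (plain slices and illustrative names, reusing the `soft_threshold` helper above; the PR's actual implementation operates on its own matrix types):

```rust
/// One sweep of coordinate descent over all features.
/// Returns the largest absolute coefficient change, usable as a convergence check.
fn coordinate_descent_sweep(
    x_cols: &[Vec<f64>],   // x_cols[j] is the j-th column of X (length n)
    residuals: &mut [f64], // maintained as y - X * beta
    beta: &mut [f64],
    lambda: f64,
) -> f64 {
    let n = residuals.len() as f64;
    let mut max_delta = 0.0_f64;

    for (j, x_col_j) in x_cols.iter().enumerate() {
        // rho_j = (1/n) X_j^T r_j, with the partial residual r_j = residuals + X_j * beta_j.
        let rho = x_col_j
            .iter()
            .zip(residuals.iter())
            .map(|(&x, &r)| x * (r + x * beta[j]))
            .sum::<f64>()
            / n;
        // z_j = (1/n) X_j^T X_j (assumed non-zero, i.e. no all-zero columns).
        let z = x_col_j.iter().map(|&x| x * x).sum::<f64>() / n;

        let new_beta_j = soft_threshold(rho, lambda) / z;
        let delta = new_beta_j - beta[j];

        if delta != 0.0 {
            // Residual update: eps_i <- eps_i - delta * X_{ij} for the column j just updated.
            for (r_i, &x_ij) in residuals.iter_mut().zip(x_col_j) {
                *r_i -= delta * x_ij;
            }
            beta[j] = new_beta_j;
        }
        max_delta = max_delta.max(delta.abs());
    }
    max_delta
}
```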


Check warning (Code scanning / clippy): needlessly taken reference of left operand

```rust
let delta: f64 = new_coefficient_j - coefficients[j];
if delta.abs() > 0.0 {
    residuals -= &feature_vals_col_j * delta;
```

Check warning (Code scanning / clippy): float has excessive precision

```rust
    0.743_965_706_491_596_7,
    -0.304_713_846_510_641_43,
    1.355_162_653_724_116_22,
][i],
```