Lasso Regression in R [Update 2024]


Key Points

  • Lasso regression is a type of linear regression that adds a penalty term to the loss function that is proportional to the sum of the absolute values of the coefficients. This penalty term is also known as the L1 norm of the coefficients.
  • Lasso regression can perform variable selection by shrinking some of the coefficients to exactly zero, thus removing some predictors from the model. This can help reduce overfitting and improve interpretability.
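  • Lasso regression can still be fitted when predictors are strongly correlated (multicollinearity), but among a group of highly correlated predictors it tends to keep one and shrink the others to zero; ridge regression and elastic net spread the weight more evenly across such groups.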
  • To perform lasso regression in R, we can use the glmnet package, which provides functions for fitting generalized linear models with L1 and L2 regularization. The main function is glmnet, which takes a matrix of predictor values (x) and a vector of target values (y) as arguments, and returns an object of class “glmnet”, which contains information about the fitted model. We can set alpha to 1 in the glmnet function to perform lasso regression.
  • To select the value of the tuning parameter (lambda) that minimizes the prediction error, we can use k-fold cross-validation: the data are split into k subsets (folds), the model is trained on k-1 folds and evaluated on the held-out fold, and this is repeated so that every fold is held out once. The glmnet package provides a function called cv.glmnet, which performs cross-validation for glmnet models. It returns an object of class “cv.glmnet” that stores the cross-validation results, including the selected lambda values (lambda.min and lambda.1se) and a fit on the full data from which coef() extracts the corresponding coefficients.
  • To compare lasso regression with ridge regression and elastic net, we can use different alpha values in the glmnet and cv.glmnet functions. Alpha can take values between 0 and 1, where 0 corresponds to ridge regression (L2 penalty), 1 corresponds to lasso regression (L1 penalty), and any value in between corresponds to elastic net (a combination of L1 and L2 penalties). We can use print, summary, or plot functions to inspect and visualize the results for each model.

Tables

Function | Description | Package
--- | --- | ---
glmnet | Fit a lasso, ridge, or elastic-net regularized generalized linear model | glmnet
cv.glmnet | Perform k-fold cross-validation for glmnet models | glmnet
coef | Extract coefficients from a glmnet or cv.glmnet object | stats generic (methods in glmnet)
predict | Make predictions from a glmnet or cv.glmnet object | stats generic (methods in glmnet)
plot | Plot a glmnet or cv.glmnet object | base generic (methods in glmnet)
model.matrix | Create a matrix of predictor values from a formula and a data frame | stats
mean | Compute the mean of a vector or a matrix | base
var | Compute the variance of a vector (or the covariance matrix of a matrix) | stats
set.seed | Set the random number generator seed | base
legend | Add legends to plots | graphics

Lasso regression is a popular machine learning technique that can be used to perform variable selection and regularization in linear models. In this blog post, you will learn how to implement lasso regression using the glmnet package. 

You will also learn how to compare lasso with ridge regression and elastic net, and how to select the optimal tuning parameter using cross-validation. This article is worth reading if you want to improve your data science skills and learn how to fit a lasso regression model in R.

What is Lasso Regression?

It is a type of linear regression that adds a penalty term to the loss function, which is proportional to the sum of the absolute values of the coefficients. This penalty term is also known as the L1 norm of the coefficients. The model can be written as:

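$$\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert$$

where n is the number of observations, p is the number of predictors, and λ ≥ 0 is the tuning parameter that sets the strength of the penalty. The 1/(2n) scaling of the squared-error term is the convention used by glmnet.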
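As a concrete reading of the formula, here is a small base-R function that evaluates the lasso objective for given coefficients. The name lasso_objective is hypothetical (it is not part of glmnet), and the 1/(2n) scaling of the squared-error term follows glmnet's convention; the intercept is not penalized.

```r
# Lasso objective with glmnet's scaling:
#   (1 / (2n)) * sum of squared residuals + lambda * sum(|beta_j|)
lasso_objective <- function(X, y, beta0, beta, lambda) {
  residuals <- y - beta0 - drop(X %*% beta)  # y_i - beta0 - sum_j x_ij * beta_j
  n <- length(y)
  sum(residuals^2) / (2 * n) + lambda * sum(abs(beta))
}

# Tiny example: two observations, two predictors
X <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
y <- c(1, 3)

lasso_objective(X, y, beta0 = 0, beta = c(1, 0), lambda = 0.5)  # 0.5: perfect fit, only the penalty remains
lasso_objective(X, y, beta0 = 0, beta = c(0, 0), lambda = 0.5)  # 2.5: no penalty, (1^2 + 3^2) / 4
```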

Compared with ordinary linear regression, the lasso regression model has two main properties that matter in practice:

  • It can perform variable selection by shrinking some of the coefficients to exactly zero, thus removing some predictors from the model. This can help reduce overfitting and improve interpretability.
  • It can still be fitted when predictors are strongly correlated (multicollinearity), but among a group of highly correlated predictors it tends to keep one and shrink the others to zero, rather than spreading the weight across the group as ridge regression does.
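Why does the L1 penalty produce exact zeros while an L2 penalty does not? In the special case of a single predictor scaled so that (1/n)Σx_i² = 1, with a centred response and no intercept, the glmnet-scaled lasso problem has a closed-form answer: the least-squares estimate z is pulled towards zero by lambda and clipped at zero (soft thresholding). The corresponding ridge answer, for the penalty (lambda/2)·β², is z / (1 + lambda). The helper names below are hypothetical and the z values are made up for illustration.

```r
# Lasso solution for one standardized predictor: soft-threshold the
# least-squares estimate z by lambda. Small estimates become exactly zero.
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Ridge counterpart under the same assumptions: proportional shrinkage,
# which makes non-zero estimates smaller but never exactly zero.
ridge_shrink <- function(z, lambda) {
  z / (1 + lambda)
}

z <- c(-3, -0.5, 0.2, 1, 2.5)   # hypothetical least-squares estimates
soft_threshold(z, lambda = 1)   # -2, 0, 0, 0, 1.5
ridge_shrink(z, lambda = 1)     # -1.5, -0.25, 0.1, 0.5, 1.25
```

The three middle estimates, whose absolute value does not exceed lambda, are set to zero by the lasso, which is exactly the variable-selection behaviour described above.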

How to Perform Lasso Regression in R?

We will use the glmnet package, which provides functions for fitting generalized linear models with L1 and L2 regularization. The glmnet package can handle various types of outcomes, such as continuous, binary, multinomial, and count data. In this tutorial, we will focus on fitting a lasso regression model for continuous outcomes.

To illustrate how to use the glmnet package, we will use the mtcars dataset, which contains information about 32 cars, such as miles per gallon (mpg), number of cylinders (cyl), displacement (disp), horsepower (hp), weight (wt), and so on. We will use mpg as our target variable and all other variables as our predictors.

First, we need to load the glmnet package and the mtcars dataset:

library(glmnet)
data(mtcars)

Next, we must prepare our data for fitting a lasso regression model. glmnet does not accept a formula, so we create a numeric matrix of predictor values (X) and a vector of target values (y). Because lasso regression penalizes the absolute values of the coefficients, which depend on the scale of the variables, the predictors need to be on a common scale; glmnet takes care of this by standardizing them internally (standardize = TRUE is the default) and reporting the coefficients on the original scale. The model.matrix function from the stats package builds the predictor matrix from a formula and a data frame, expanding any factors into dummy variables. It also adds an intercept column of ones; glmnet fits its own intercept and never selects this constant column, so the column is redundant (it is often dropped with model.matrix(...)[, -1]). We can use this function as follows:

X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg
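A quick look at what model.matrix() returns helps explain the layout of X. This base-R sketch checks the dimensions and shows the common variant without the intercept column; X_no_int is a hypothetical name used only here.

```r
data(mtcars)

# Predictor matrix as built above: includes an "(Intercept)" column of ones
X <- model.matrix(mpg ~ ., data = mtcars)
dim(X)            # 32 rows, 11 columns (intercept + 10 predictors)
colnames(X)[1:3]  # "(Intercept)" "cyl" "disp"

# Variant without the intercept column; glmnet() fits its own intercept anyway
X_no_int <- model.matrix(mpg ~ ., data = mtcars)[, -1]
dim(X_no_int)     # 32 rows, 10 columns

# Both matrices carry the same predictor values
all.equal(X[, -1], X_no_int)
```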

Now we are ready to fit a lasso regression model using the glmnet function. The glmnet function takes two main arguments: x, the matrix of predictor values, and y, the vector of target values.

It also takes several optional arguments, such as alpha, which specifies the type of regularization to use. Alpha can take values between 0 and 1, where 0 corresponds to ridge regression (L2 penalty), 1 corresponds to lasso regression (L1 penalty), and any value in between corresponds to elastic net (a combination of L1 and L2 penalties). 

In this tutorial we set alpha to 1 (which is also glmnet's default) to perform lasso regression. Another important argument is lambda, which specifies the value of the tuning parameter that controls the amount of regularization.

The glmnet function can automatically select a sequence of lambda values based on the data, or we can manually specify our own lambda values. In this tutorial, we will let glmnet choose our lambda values.
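How does glmnet choose those lambda values? A base-R sketch under stated assumptions: following glmnet's documented defaults, the largest value is the smallest lambda at which every coefficient is zero, and the sequence then decreases geometrically over nlambda = 100 steps down to lambda.min.ratio = 1e-4 times that value (the ratio used when there are more observations than predictors). For a Gaussian lasso with standardized predictors, that largest value equals max_j |cor(x_j, y)| times the standard deviation of y computed with divisor n. Under these assumptions the sketch reproduces the Lambda values that print() reports for this model (5.147, 4.69, 2.228, ..., 0.0036). The intercept column is dropped here because its correlation with y is undefined.

```r
data(mtcars)
X <- model.matrix(mpg ~ ., data = mtcars)[, -1]  # predictors only (no constant column)
y <- mtcars$mpg

# Standard deviation of y with divisor n (glmnet standardizes with 1/n variances)
sd_y <- sqrt(mean((y - mean(y))^2))

# Smallest lambda at which all lasso coefficients are zero
lambda_max <- max(abs(cor(X, y))) * sd_y

# Assumed glmnet defaults: 100 values, geometric decrease to 1e-4 * lambda_max
nlambda <- 100
lambda_min_ratio <- 1e-4
lambda_seq <- lambda_max * lambda_min_ratio^(seq(0, nlambda - 1) / (nlambda - 1))

signif(lambda_seq[c(1, 2, 10, 79)], 4)  # approx. 5.147, 4.69, 2.228, 0.00363
```

glmnet only fits as many of these values as it needs; for this data it stops after 79 of them.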

We can fit a lasso regression model using the following code:

set.seed(123) # not strictly needed: glmnet() itself is deterministic (random folds only arise in cv.glmnet())
lasso_model <- glmnet(x = X, y = y, alpha = 1)

The glmnet function returns an object of class “glmnet”, which contains information about the fitted model, such as the coefficients, the lambda values, the degrees of freedom, etc. We can inspect the lasso_model object using the print or summary functions:

print(lasso_model)
summary(lasso_model)

Call:  glmnet(x = X, y = y, alpha = 1)

   Df   %Dev  Lambda
 1  0      0   5.147
 2  2   12.9    4.69
 3  2  24.81   4.273
 4  2  34.69   3.894
 5  2   42.9   3.548
 6  2  49.71   3.232
 7  2  55.37   2.945
 8  2  60.06   2.684
 9  2  63.96   2.445
10  3  67.26   2.228
11  3  70.15    2.03
12  3  72.56    1.85
13  3  74.55   1.685
14  3  76.21   1.536
15  3  77.59   1.399
16  3  78.73   1.275
17  3  79.68   1.162
18  3  80.46   1.058
19  3  81.12  0.9645
20  3  81.66  0.8788
21  3  82.11  0.8007
22  3  82.49  0.7296
23  4  82.81  0.6648
24  5   83.2  0.6057
25  5   83.6  0.5519
26  6  83.96  0.5029
27  6  84.26  0.4582
28  6  84.51  0.4175
29  6  84.72  0.3804
30  8  84.89  0.3466
31  8  85.14  0.3158
32  8  85.35  0.2878
33  8  85.53  0.2622
34  8  85.68  0.2389
35  8   85.8  0.2177
36  8   85.9  0.1983
37  8  85.98  0.1807
38  9  86.06  0.1647
39  9  86.15    0.15
40  9  86.22  0.1367
41  9  86.27  0.1246
42  9  86.32  0.1135
43  9  86.36  0.1034
44  9  86.39  0.0942
45  9  86.42  0.0859
46  9  86.44  0.0782
47  9  86.46  0.0713
48  9  86.48  0.0649
49  9  86.49  0.0592
50  9   86.5  0.0539
51  9  86.51  0.0491
52  9  86.52  0.0448
53  9  86.52  0.0408
54 10  86.54  0.0372
55 10   86.6  0.0339
56 10  86.65  0.0309
57 10  86.69  0.0281
58 10  86.73  0.0256
59 10  86.76  0.0233
60 10  86.78  0.0213
61 10   86.8  0.0194
62 10  86.82  0.0177
63 10  86.83  0.0161
64 10  86.84  0.0147
65 10  86.85  0.0134
66 10  86.86  0.0122
67 10  86.87  0.0111
68 10  86.87  0.0101
69 10  86.88  0.0092
70 10  86.88  0.0084
71 10  86.88  0.0076
72 10  86.89   0.007
73 10  86.89  0.0063
74 10  86.89  0.0058
75 10  86.89  0.0053
76 10  86.89  0.0048
77 10  86.89  0.0044
78 10   86.9   0.004
79 10   86.9  0.0036


          Length Class     Mode
a0         79    -none-    numeric
beta      869    dgCMatrix S4
df         79    -none-    numeric
dim         2    -none-    numeric
lambda     79    -none-    numeric
dev.ratio  79    -none-    numeric
nulldev     1    -none-    numeric
npasses     1    -none-    numeric
jerr        1    -none-    numeric
offset      1    -none-    logical
call        4    -none-    call
nobs        1    -none-    numeric

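The print function lists, for each lambda value on the path, the number of non-zero coefficients (Df), the percentage of the null deviance explained (%Dev), and the lambda value itself (Lambda). Here glmnet stopped after 79 of its default 100 lambda values because the deviance explained had essentially stopped changing. The summary function is much less informative for a glmnet object: it only lists the components stored in the object (the intercepts a0, the sparse coefficient matrix beta, df, lambda, dev.ratio, and so on) together with their lengths, classes, and modes.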
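The numbers that print() shows are stored in the fitted object, so they can also be pulled out directly. A minimal sketch, assuming the glmnet package is installed; it refits the model so the block runs on its own, and the data-frame layout simply mirrors print()'s Df, %Dev and Lambda columns.

```r
library(glmnet)
data(mtcars)

X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg
lasso_model <- glmnet(x = X, y = y, alpha = 1)

# One row per lambda value on the path, like print(lasso_model)
path <- data.frame(
  Df     = lasso_model$df,                        # number of non-zero coefficients
  PctDev = round(100 * lasso_model$dev.ratio, 2), # % of null deviance explained
  Lambda = signif(lasso_model$lambda, 4)          # penalty value
)
head(path)

# The coefficient matrix has one row per column of X and one column per lambda
dim(lasso_model$beta)
```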

We can also visualize the lasso_model object using the plot function, which draws one curve per coefficient showing how it changes along the lambda path. The xvar argument specifies what to plot on the x-axis; the default, “norm”, uses the L1 norm of the coefficient vector.

We can set xvar to “lambda” to plot the coefficients against log(lambda), or to “dev” to plot them against the percent deviance explained. Setting label = TRUE labels each curve with its variable sequence number (its column position in X). We can plot the lasso_model object as follows:

plot(lasso_model, xvar = "lambda", label = TRUE)

Visualization of Lasso Regression in R

The plot shows how the coefficients change as we increase or decrease the lambda value. With xvar = “lambda” the x-axis is log(lambda), so moving from left to right increases lambda, and more and more coefficients are shrunk to exactly zero, thus performing variable selection.

Some coefficient paths may also cross zero, which indicates that the sign of a variable's estimated effect can depend on how much regularization is applied.

How to Compare Lasso Regression with Ridge Regression and Elastic Net?

Lasso regression is not the only type of regularization technique that we can use to fit linear models. Another popular technique is ridge regression, which adds a penalty term to the loss function proportional to the sum of the squares of the coefficients. This penalty term is also known as the L2 norm of the coefficients. The ridge regression model can be written as:

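$$\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\frac{\lambda}{2}\sum_{j=1}^{p}\beta_j^{2}$$

This uses glmnet's scaling; textbooks often write the penalty as λΣβ_j², which only rescales λ.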
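Unlike the lasso, ridge regression has a closed-form solution. With centred predictors and response (so the unpenalized intercept drops out) and glmnet's scaling of the ridge objective, (1/(2n))·RSS + (lambda/2)·Σβ_j², setting the gradient to zero gives β = (X'X + n·lambda·I)^(-1) X'y. A base-R sketch: ridge_closed_form is a hypothetical helper, the three predictors are an arbitrary choice, and this is not how glmnet computes its fits (it uses coordinate descent on standardized data). With lambda = 0 it reproduces the least-squares slopes from lm().

```r
data(mtcars)

# Centre predictors and response so the unpenalized intercept drops out
X <- scale(as.matrix(mtcars[, c("wt", "hp", "cyl")]), center = TRUE, scale = FALSE)
y <- mtcars$mpg - mean(mtcars$mpg)

# Minimizer of (1/(2n)) * ||y - X b||^2 + (lambda/2) * ||b||^2,
# i.e. b = (X'X + n * lambda * I)^(-1) X'y
ridge_closed_form <- function(X, y, lambda) {
  n <- nrow(X)
  p <- ncol(X)
  drop(solve(crossprod(X) + n * lambda * diag(p), crossprod(X, y)))
}

ridge_closed_form(X, y, lambda = 0)               # least-squares slopes
coef(lm(mpg ~ wt + hp + cyl, data = mtcars))[-1]  # same values from lm()
ridge_closed_form(X, y, lambda = 1)               # shrunk towards zero
```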

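Ridge regression has some advantages and disadvantages compared to lasso regression:

  • Ridge regression does not perform variable selection: it shrinks all coefficients towards zero but never sets any of them exactly to zero (and, unless the predictors are orthonormal, the amount of shrinkage differs from coefficient to coefficient). This stabilizes the estimates when predictors are highly correlated, but it keeps every predictor in the model, which can make interpretation more difficult.
  • Neither method is better in general: ridge regression tends to predict better when many predictors have small or moderate effects, while lasso regression tends to do better when only a few predictors have sizable effects. Cross-validation is the usual way to decide for a given data set.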

We can fit a ridge regression model using the glmnet package by setting alpha to 0 in the glmnet function. For example, we can fit a ridge regression model on the same data as before using the following code:

set.seed(123) # not strictly needed: glmnet() itself involves no randomness
ridge_model <- glmnet(x = X, y = y, alpha = 0)

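Using the print, summary, or plot functions, we can compare the ridge_model object with the lasso_model object. Because the two models usually span very different lambda ranges, it is clearer to draw their coefficient paths side by side than on one set of axes:

par(mfrow = c(1, 2)) # two panels in one row
plot(lasso_model, xvar = "lambda", label = TRUE)
title("Lasso", line = 2.5) # leave room for the degrees-of-freedom axis on top
plot(ridge_model, xvar = "lambda", label = TRUE)
title("Ridge", line = 2.5)
par(mfrow = c(1, 1)) # restore the default layout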

Compare Lasso Regression with Ridge Regression
The plot shows how both models behave differently as we change the lambda value. We can see that ridge regression shrinks all of the coefficients towards zero, but does not set any of them to exactly zero. On the other hand, lasso regression sets some of the coefficients to exactly zero, thus performing variable selection.

Another regularization technique, elastic net, combines lasso and ridge regression: its penalty term is a weighted combination of the L1 norm and the squared L2 norm of the coefficients, with alpha controlling the mix (alpha = 1 is pure lasso and alpha = 0 is pure ridge).
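As documented for glmnet, the penalty added for a given alpha is lambda × [(1 - alpha)/2 × Σβ_j² + alpha × Σ|β_j|]. A small base-R sketch of that formula (enet_penalty is a hypothetical helper, not a glmnet function, and the coefficients are made up) shows that alpha = 1 recovers the lasso penalty and alpha = 0 the ridge penalty:

```r
# Elastic-net penalty in glmnet's parameterization
enet_penalty <- function(beta, lambda, alpha) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(2, -1, 0)  # hypothetical coefficients
enet_penalty(beta, lambda = 0.1, alpha = 1)    # lasso: 0.1 * 3 = 0.3
enet_penalty(beta, lambda = 0.1, alpha = 0)    # ridge: 0.1 * 5 / 2 = 0.25
enet_penalty(beta, lambda = 0.1, alpha = 0.5)  # mix: 0.1 * (1.25 + 1.5) = 0.275
```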

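To choose lambda for the lasso model, we run 10-fold cross-validation (the cv.glmnet default) and plot the cross-validated mean squared error against log(lambda):

cv_lasso <- cv.glmnet(x = X, y = y, alpha = 1) # folds are random: call set.seed() first for reproducibility
plot(cv_lasso)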

Cross-Validation Curve for the Lasso Model (cv_lasso)

The plot shows how the cross-validated MSE (red dots, with error bars of one standard error) changes as we vary log(lambda). The left vertical dotted line marks lambda.min, the lambda value with the smallest cross-validated MSE. The right vertical dotted line marks lambda.1se, the largest lambda whose MSE is within one standard error of that minimum; it gives a simpler model (with fewer or at most as many non-zero coefficients) whose estimated error is statistically close to the best.
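The one-standard-error rule is easy to state in code. A base-R sketch of the rule with made-up cross-validation results (the vectors are hypothetical; cv.glmnet stores the real ones as lambda, cvm and cvsd in its result, and choose_lambda is a hypothetical helper):

```r
# Hypothetical CV results on a decreasing lambda grid
lambda <- c(1, 0.5, 0.25, 0.125)
cvm    <- c(10, 6, 5, 5.5)  # mean cross-validated error per lambda
cvsd   <- c(1, 1, 1.2, 1)   # its estimated standard error

choose_lambda <- function(lambda, cvm, cvsd) {
  i_min <- which.min(cvm)
  threshold <- cvm[i_min] + cvsd[i_min]  # minimum error plus one standard error
  list(
    lambda.min = lambda[i_min],
    # largest (most regularized) lambda whose error is within one SE of the minimum
    lambda.1se = max(lambda[cvm <= threshold])
  )
}

choose_lambda(lambda, cvm, cvsd)  # lambda.min = 0.25, lambda.1se = 0.5
```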

The selected lambda values themselves are stored in the cv_lasso object as cv_lasso$lambda.min and cv_lasso$lambda.1se, and the corresponding coefficients can be extracted with the coef function. The coef function takes an argument called s, which specifies the value of lambda for which we want to extract the coefficients.

We can set s to “lambda.min” to get the coefficients for the optimal lambda value, or to “lambda.1se” to get the coefficients for the lambda.1se value. We can also set s to any numeric lambda value within the range of values used by cv.glmnet. 

We can extract the coefficients for the optimal lambda value as follows:

coef(cv_lasso, s = "lambda.min")

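The coef function returns a sparse matrix (class dgCMatrix) in which the coefficients that lasso set to zero are printed as dots. In our run, only four variables have non-zero coefficients: cyl, hp, wt, and qsec. This means that these are the only variables selected by lasso regression at lambda.min. Because cv.glmnet assigns observations to folds at random, the selected lambda, and therefore this list, can change with the random seed.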
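To turn the sparse coefficient matrix into a plain list of selected variables, convert it to an ordinary matrix and keep the non-zero entries. A sketch assuming the glmnet package is installed; it refits cv_lasso with an arbitrary seed so that it runs on its own, so the selected set may differ from the one reported above.

```r
library(glmnet)
data(mtcars)

X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg

set.seed(123)  # cv.glmnet() assigns folds at random
cv_lasso <- cv.glmnet(x = X, y = y, alpha = 1)

# Sparse (dgCMatrix) column of coefficients -> named numeric vector
b <- as.matrix(coef(cv_lasso, s = "lambda.min"))[, 1]

# Non-zero coefficients, excluding glmnet's intercept and the constant model.matrix column
selected <- names(b)[b != 0 & names(b) != "(Intercept)"]
selected
b[selected]
```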

We can also use the predict function to make predictions using the cv_lasso object. The predict function takes an argument called newx, a matrix of new predictor values with the same columns as X, and an argument called s, which selects the lambda value exactly as in coef (“lambda.min”, “lambda.1se”, or a numeric value within the range used by cv.glmnet). We can make predictions for the same data as before using the following code:

pred_lasso <- predict(cv_lasso, newx = X, s = "lambda.min")

The predict function returns a one-column matrix of predicted values for the target variable (mpg). We can compare these predictions with the actual values using performance metrics, such as mean squared error (MSE), root mean squared error (RMSE), or R-squared. For example, we can compute the MSE and R-squared for our predictions as follows:

mse_lasso <- mean((y - pred_lasso)^2)
rsq_lasso <- 1 - mse_lasso / mean((y - mean(y))^2) # R-squared = 1 - SSE/SST

Our lasso regression model has an MSE of 6.29 and an R-squared of 0.82 for the optimal lambda value. These are in-sample figures, computed on the same data used to fit the model, so they are optimistic; the cross-validated error (min(cv_lasso$cvm)) or a held-out test set gives a more honest estimate.
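Because the figures above are computed on the training data, a held-out test set gives a more honest picture. A minimal sketch assuming glmnet is installed; the 24/8 split, the seed, and nfolds = 5 (so each fold has at least three training cars) are arbitrary choices for illustration, and with only 32 cars the result varies noticeably from split to split.

```r
library(glmnet)
data(mtcars)

X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg

set.seed(42)                                  # arbitrary seed for the split and the CV folds
train <- sample(seq_len(nrow(X)), size = 24)  # 24 cars for training, 8 for testing

cv_train  <- cv.glmnet(x = X[train, ], y = y[train], alpha = 1, nfolds = 5)
pred_test <- predict(cv_train, newx = X[-train, ], s = "lambda.min")

y_test   <- y[-train]
mse_test <- mean((y_test - pred_test)^2)
rsq_test <- 1 - mse_test / mean((y_test - mean(y_test))^2)
c(MSE = mse_test, R2 = rsq_test)
```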

We can repeat the same steps for ridge regression and elastic net models using cv.glmnet with different alpha values. For example, we can perform cross-validation for ridge regression using alpha = 0 and elastic net using alpha = 0.5 as follows:

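set.seed(123) # cv.glmnet() assigns observations to folds at random
cv_ridge <- cv.glmnet(x = X, y = y, alpha = 0)
cv_enet <- cv.glmnet(x = X, y = y, alpha = 0.5)

Because each call draws its own random folds, a fairer comparison passes the same fold assignment to every cv.glmnet call through its foldid argument.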

We can compare the cross-validation results for all three models using print, summary, or plot functions. For example, we can plot all three models on the same graph using the following code:

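xlim <- range(log(c(cv_lasso$lambda, cv_ridge$lambda, cv_enet$lambda))) # common x-range for all three models
ylim <- range(c(cv_lasso$cvm, cv_ridge$cvm, cv_enet$cvm))
plot(log(cv_lasso$lambda), cv_lasso$cvm, type = "b", col = "blue", xlim = xlim, ylim = ylim, xlab = "Log(Lambda)", ylab = "Mean Squared Error", main = "Cross-Validation Results")
points(log(cv_ridge$lambda), cv_ridge$cvm, type = "b", col = "red")
points(log(cv_enet$lambda), cv_enet$cvm, type = "b", col = "green")
legend("topright", legend = c("Lasso", "Ridge", "Elastic Net"), col = c("blue", "red", "green"), pch = 1)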

Compare Lasso Regression with Ridge Regression and Elastic Net
The plot shows how the mean squared error changes as we vary log(lambda) for each model. In our run, lasso regression had the lowest mean squared error among the three models for most values of log(lambda).

Ridge regression had a higher mean squared error than lasso regression and elastic net for small values of log(lambda), but a lower mean squared error than elastic net for large values. Keep in mind that lambda does not mean the same thing for different alpha values, so the most useful single comparison is each model's minimum cross-validated error: min(cv_lasso$cvm), min(cv_ridge$cvm), and min(cv_enet$cvm).
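To compare the three penalties on an equal footing, every cv.glmnet call can be given the same folds through its foldid argument, and the minimum cross-validated errors can then be tabulated. A sketch assuming glmnet is installed; the seed and the 10-fold assignment are arbitrary, and with n = 32 the ranking can change with the seed.

```r
library(glmnet)
data(mtcars)

X <- model.matrix(mpg ~ ., data = mtcars)
y <- mtcars$mpg

set.seed(123)
foldid <- sample(rep(1:10, length.out = nrow(X)))  # one fold label per car, reused below

alphas  <- c(lasso = 1, elastic_net = 0.5, ridge = 0)
cv_fits <- lapply(alphas, function(a) cv.glmnet(x = X, y = y, alpha = a, foldid = foldid))

# Minimum cross-validated MSE and number of non-zero coefficients at lambda.min
comparison <- data.frame(
  alpha   = alphas,
  min_cvm = sapply(cv_fits, function(f) min(f$cvm)),
  nonzero = sapply(cv_fits, function(f) f$nzero[which.min(f$cvm)])
)
comparison[order(comparison$min_cvm), ]
```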

Conclusion

In this blog post, you learned how to perform lasso regression in R using the glmnet package. You also learned how to compare lasso regression with ridge regression and elastic net, and how to select the optimal tuning parameter using cross-validation.

If you are interested in learning more about data science and machine learning, or if you need help with your data analysis projects, you can contact us at info@rstudiodatalab.com or visit our website at https://www.rstudiodatalab.com/p/order-now.html. 

We are a team of experienced and professional data scientists who can provide you with high-quality and customized solutions for your data needs. We can help you with data collection, data cleaning, data visualization, data modeling, data interpretation, and data communication. 

We can also help you write, rewrite, improve, or optimize your content. Whether you need a blog post, a report, a presentation, or code, we can deliver it to you promptly and efficiently. We look forward to hearing from you and working with you on your data science projects.

Frequently Asked Questions (FAQs)

What is Lasso Regression?

Lasso Regression is a method used in statistics and machine learning for variable selection and regularization. It is a form of linear regression that adds a penalty term to the ordinary least squares (OLS) objective function, resulting in sparse coefficient estimates.

How does Lasso Regression differ from Linear Regression?

Lasso Regression differs from Linear Regression by including a regularization term that shrinks the coefficient estimates towards zero. This helps with feature selection and reduces the risk of overfitting by penalizing the model for including unnecessary variables.

What is the purpose of regularization in Lasso Regression?

Regularization in Lasso Regression aims to prevent overfitting and improve the model's predictive accuracy on new data. It adds a penalty term to the OLS objective function, which encourages the model to keep only the most relevant features and reduces the impact of irrelevant or noisy variables.

What is the difference between Lasso Regression and Ridge Regression?

Lasso Regression and Ridge Regression are both regularization techniques used in linear regression. The main difference is in the penalty term used: Lasso adds the absolute value of the coefficients, while Ridge adds the square of the coefficients. This leads to different selection behaviors, with Lasso tending to produce sparse solutions by setting some coefficients to zero.

How can I perform Lasso Regression in R?

To perform Lasso Regression in R, you can use the "glmnet" package. This package provides functions for fitting the Lasso model on the training data (glmnet), choosing the tuning parameter lambda by cross-validation (cv.glmnet), and making predictions on a test set (predict).

What is the significance of the lambda parameter in Lasso Regression?

The lambda parameter in Lasso Regression controls the amount of regularization applied to the model. A smaller lambda value will result in less regularization, allowing more variables to be included in the model. A larger lambda value will increase the amount of regularization, leading to sparser solutions with fewer variables.

How do I select the optimal lambda value in Lasso Regression?

The optimal lambda value in Lasso Regression can be selected using cross-validation. By fitting the Lasso model with different lambda values and evaluating the performance on a validation set, you can choose the lambda value that minimizes the mean squared error or another appropriate metric.

What are the advantages of using Lasso Regression?

Lasso Regression has several advantages:

  • It performs feature selection by automatically setting some coefficients to zero.
  • It can handle high-dimensional data with a large number of features.
  • It reduces the risk of overfitting by penalizing unnecessary variables.
  • It still produces estimates when predictors are collinear, although among highly correlated predictors it tends to keep one and drop the others somewhat arbitrarily.

Can Lasso Regression be used for non-linear regression?

Lasso Regression is primarily designed for linear regression problems. However, it can be extended to handle non-linear regression by including appropriate non-linear transformations of the features in the model.
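With glmnet, this usually means adding transformed columns to the predictor matrix before fitting. A base-R sketch for the mtcars example; the particular terms (a quadratic in hp via poly() and log(disp)) are arbitrary illustrations, not a recommendation.

```r
data(mtcars)

# Quadratic term in horsepower (orthogonal polynomial) plus log-displacement and weight
X_nl <- model.matrix(mpg ~ poly(hp, 2) + log(disp) + wt, data = mtcars)[, -1]
colnames(X_nl)  # "poly(hp, 2)1" "poly(hp, 2)2" "log(disp)" "wt"
dim(X_nl)       # 32 rows, 4 columns
```

The resulting matrix can be passed to glmnet() as x exactly like X in the main example; the model stays linear in its coefficients.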

How can I interpret the coefficient estimates in Lasso Regression?

The coefficient estimates in Lasso Regression represent the relationship between each predictor variable and the response variable, holding the other predictors in the model fixed. A positive coefficient indicates a positive relationship, while a negative coefficient indicates a negative relationship. The magnitude is expressed in the units of each predictor (glmnet reports coefficients on the original scale), so magnitudes are directly comparable only for predictors on the same scale, and the penalty shrinks all estimates towards zero. Note that some coefficients may be set to zero due to the regularization, indicating that the corresponding features have been excluded from the final model.

What is the difference between lasso and ridge regression?

Lasso and ridge regression are both types of regularized linear regression that add a penalty term to the loss function. The difference is that lasso uses the L1 norm of the coefficients as the penalty term, which shrinks some of the coefficients to exactly zero, thus performing variable selection. Ridge uses the squared L2 norm of the coefficients as the penalty term, which shrinks all of the coefficients towards zero but does not set any of them exactly to zero.

What is the advantage of elastic net over lasso and ridge regression?

Elastic net is a regularized linear regression that combines lasso and ridge penalties. Its advantage is that it handles correlated predictors better than lasso: like ridge, it tends to keep groups of correlated predictors together and share the weight among them. Unlike ridge, it can still set coefficients exactly to zero, so it performs variable selection like lasso.

How to choose the optimal value of lambda for lasso regression?

One way to choose the optimal value of lambda for lasso regression is to use cross-validation, which is a technique that splits the data into several subsets (folds), trains the model on some of the subsets (training set), and evaluates the model on the remaining subsets (validation set). The optimal value of lambda is then chosen as the one that minimizes the average prediction error across all folds.

How to interpret the coefficients of lasso regression?

The coefficients of lasso regression represent the effect of each predictor variable on the target variable, holding all other variables in the model constant. The sign of the coefficient indicates whether the effect is positive or negative, and the magnitude (in the units of the predictor) indicates how strong the effect is, keeping in mind that the penalty shrinks all estimates towards zero. Coefficients shrunk to exactly zero indicate that the corresponding variables were not selected by lasso regression and do not contribute to the model's predictions; this does not prove that those variables are unrelated to the target variable.

How to check the assumptions of lasso regression?

The assumptions of lasso regression are similar to those of ordinary linear regression, such as linearity, independence, homoscedasticity, and normality. To check these assumptions, we can use various diagnostic tools, such as residual plots, Q-Q plots, VIFs, and tests for autocorrelation and heteroscedasticity.

How do we compare lasso regression with other machine learning models?

To compare lasso regression with other machine learning models, we can use various performance metrics, such as mean squared error (MSE), root mean squared error (RMSE), R-squared, mean absolute error (MAE), or mean absolute percentage error (MAPE). We can also use cross-validation or hold-out validation to estimate the generalization error of each model on new data.
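These metrics are one-liners in base R. A sketch with a hypothetical helper, regression_metrics, and made-up numbers; MAPE assumes that no observed value is zero.

```r
regression_metrics <- function(actual, predicted) {
  err <- actual - predicted
  c(
    MSE  = mean(err^2),
    RMSE = sqrt(mean(err^2)),
    MAE  = mean(abs(err)),
    MAPE = 100 * mean(abs(err / actual)),                   # in percent
    R2   = 1 - sum(err^2) / sum((actual - mean(actual))^2)  # 1 - SSE/SST
  )
}

actual    <- c(20, 25, 30, 35)
predicted <- c(22, 24, 29, 37)
regression_metrics(actual, predicted)  # MSE 2.5, MAE 1.5, R2 0.92
```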

How to handle categorical variables in lasso regression?

To handle categorical variables in lasso regression, we can use dummy coding or one-hot encoding to convert them into binary variables. For example, if a categorical variable has k levels, dummy coding creates k-1 binary variables, each indicating whether an observation belongs to one of the non-reference levels, while one-hot encoding creates one binary variable for each of the k levels. Alternatively, we can use other contrast schemes such as effect coding, which creates k-1 variables taking the values 1, 0, and -1 that compare each level with the overall mean rather than with a reference level.
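In R, model.matrix() performs this coding automatically once a variable is stored as a factor. A base-R sketch that treats cyl in mtcars as categorical (an illustrative choice; the main example keeps cyl numeric):

```r
data(mtcars)
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # levels "4", "6", "8"

# Default treatment contrasts: k - 1 = 2 indicator columns, level "4" is the reference
X_dummy <- model.matrix(mpg ~ cyl + wt, data = cars)[, -1]
colnames(X_dummy)   # "cyl6" "cyl8" "wt"

# Effect (sum-to-zero) coding instead: the cyl columns take values 1, 0 or -1
X_effect <- model.matrix(mpg ~ cyl + wt, data = cars,
                         contrasts.arg = list(cyl = "contr.sum"))[, -1]
colnames(X_effect)  # "cyl1" "cyl2" "wt"
```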

How do we handle missing values in lasso regression?

We can use various imputation methods to handle missing values in lasso regression, such as mean imputation, median imputation, mode imputation, k-nearest neighbors imputation, or multiple imputation. Imputation methods replace missing values with plausible ones based on criteria or algorithms. Alternatively, we can remove the observations that contain missing values (listwise deletion) or drop variables with many missing values. Some form of handling is required, because glmnet does not accept missing values in the predictor matrix.
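One pitfall with listwise deletion: model.matrix() silently drops incomplete rows under R's default na.action (na.omit), which can leave X and y with different lengths. A base-R sketch that keeps them aligned; the missing values are inserted artificially for illustration.

```r
data(mtcars)
cars <- mtcars
cars$hp[c(3, 10)] <- NA  # artificial missing values

X_all <- model.matrix(mpg ~ ., data = cars)
nrow(X_all)              # 30: rows with NA were dropped
length(cars$mpg)         # 32: y would no longer match X

# Listwise deletion applied to the data frame first keeps X and y aligned
complete <- complete.cases(cars)
X <- model.matrix(mpg ~ ., data = cars[complete, ])
y <- cars$mpg[complete]
c(nrow(X), length(y))    # 30 30
```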

How do we handle outliers in lasso regression?

We can use various methods to handle outliers in lasso regression, such as winsorizing, trimming, robust regression, or transformation. Winsorizing and trimming methods replace or remove the extreme values beyond a certain threshold. Robust regression methods use loss functions or weighting schemes that are less sensitive to outliers. Transformation methods apply mathematical functions to reduce the skewness or variance of the data.
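Winsorizing is simple to sketch in base R. The helper winsorize and its 5th/95th percentile limits are illustrative assumptions, not a standard choice; trimming would drop the capped observations instead.

```r
winsorize <- function(x, lower = 0.05, upper = 0.95) {
  limits <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, limits[1]), limits[2])  # cap values outside the limits
}

x <- c(1:19, 100)  # one extreme value
w <- winsorize(x)
range(w)           # 1.95 23.05: the value 100 is capped at the 95th percentile
```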

How can we improve the performance of lasso regression?

To improve the performance of lasso regression, we can use various methods, such as feature engineering, feature selection, hyperparameter tuning, or ensemble methods. Feature engineering methods create new or transform existing features to improve their relevance or quality. Feature selection methods reduce the number of features by selecting the most important or relevant ones. Hyperparameter tuning methods optimize the values of the parameters that control the model behavior, such as alpha and lambda. Ensemble methods combine multiple models to improve the accuracy and robustness of the predictions.


About the author

Zubair Goraya
Ph.D. Scholar | Certified Data Analyst | Blogger | Completed 5000+ data projects | Passionate about unravelling insights through data.