Multiple Linear Regression Model
Multiple linear regression is a statistical technique used to examine the linear relationship between a response variable and several explanatory variables (also known as independent or predictor variables). It can be used to determine, for example, whether one or more of the explanatory variables has a significant influence on the response. When conducted correctly, multiple linear regression analysis can be a powerful tool for predicting how the response changes with the explanatory variables.
Multiple linear regression uses the same basic equation as simple linear regression, but with the addition of one or more explanatory variables. The equation for multiple linear regression is given below:
Y = b0 + b1x1 + b2x2 + b3x3 + … + bpxp,
where
Y is the response variable,
b0 is the intercept,
b1 through bp are the regression coefficients for the corresponding explanatory variables,
x1 through xp are the explanatory or independent variables.
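The equation above can be evaluated directly once the coefficients are known. As a minimal sketch, the coefficient and predictor values below are purely illustrative:

```python
# Evaluating Y = b0 + b1*x1 + b2*x2 + b3*x3 for hypothetical
# coefficients and predictor values (illustrative numbers only).
b0 = 1.5                       # intercept
b = [2.0, -0.5, 0.75]          # b1, b2, b3
x = [4.0, 10.0, 2.0]           # x1, x2, x3

# Multiply each predictor by its coefficient, sum, and add the intercept.
Y = b0 + sum(bi * xi for bi, xi in zip(b, x))
print(Y)  # 1.5 + 8.0 - 5.0 + 1.5 = 6.0
```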
The goal of multiple linear regression is to determine the regression coefficients that will best predict the response variable Y. This is done by first estimating the coefficient values and then assessing their significance. Estimation can be carried out by the least squares method, maximum likelihood, or other numerical optimization techniques.
The least squares method finds the regression coefficients that minimize the sum of the squares of the residuals (the errors) between the observed and predicted values of the response variable. Once significance is assessed by testing the null hypothesis that each coefficient equals zero, the multiple linear regression model can be used to identify which explanatory variables are the most important predictors of the response variable.
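A least squares fit can be sketched with NumPy by solving the normal equations on a design matrix that includes a column of ones for the intercept. The synthetic data and true coefficients below are illustrative, assuming NumPy is available:

```python
import numpy as np

# Generate synthetic data from a known model so the fit can be checked.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))              # two explanatory variables
true_b = np.array([3.0, 1.5, -2.0])      # intercept, b1, b2
y = true_b[0] + X @ true_b[1:] + rng.normal(scale=0.1, size=n)

# Design matrix: a leading column of ones carries the intercept term.
A = np.column_stack([np.ones(n), X])

# lstsq minimizes the sum of squared residuals ||y - A @ coef||^2.
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # close to [3.0, 1.5, -2.0]
```

Because the noise is small relative to the sample size, the estimated coefficients recover the true values closely.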
The assumptions for multiple linear regression include no multicollinearity (the predictors are not highly correlated) and no autocorrelation (the residuals are not correlated). If the model does not satisfy these assumptions, the results might not be reliable and must be interpreted carefully.
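Multicollinearity can be checked by examining pairwise correlations among the predictors, or by computing a variance inflation factor (VIF) for each one. The following sketch, assuming NumPy and using synthetic data in which one predictor nearly duplicates another, illustrates both checks:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Pairwise correlations among predictors; values near +/-1 flag trouble.
print(np.corrcoef(X, rowvar=False).round(2))

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing predictor j on all the other predictors.
vifs = []
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(n), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    vifs.append(1 / (1 - r2))
    print(f"VIF for x{j + 1}: {vifs[-1]:.1f}")
```

Here x1 and x2 produce very large VIFs (a common rule of thumb treats values above 5 or 10 as problematic), while the independent x3 has a VIF near 1.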
Once the multiple linear regression model is fitted, it can be used to predict the response from new values of the explanatory variables. The predicted response is called the fitted value and is simply the result of evaluating the linear equation with the estimated regression coefficients (the predictors multiplied by the corresponding coefficients and then added together, along with the intercept). The difference between the actual value of the response and the fitted value is called an error, or residual.
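Computing fitted values and errors is a matter of applying the fitted equation to new data. In this sketch, the coefficients and observations are hypothetical, and NumPy is assumed:

```python
import numpy as np

# Estimated coefficients from a hypothetical fit: intercept, b1, b2.
coef = np.array([3.0, 1.5, -2.0])

# New observations of the explanatory variables (one row per case).
X_new = np.array([[1.0, 0.5],
                  [0.0, 2.0]])

# Fitted values: prepend a column of ones for the intercept, then
# multiply predictors by coefficients and sum.
A_new = np.column_stack([np.ones(len(X_new)), X_new])
fitted = A_new @ coef
print(fitted)  # [3 + 1.5 - 1.0, 3 + 0 - 4.0] = [3.5, -1.0]

# If the actual responses are later observed, the errors (residuals) are:
y_actual = np.array([3.7, -0.8])
errors = y_actual - fitted
print(errors)  # both close to 0.2
```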
The multiple linear regression model can also be used to explore the relationships between variables, although care should be taken to ensure that the analysis is being used appropriately and that the results are valid. For example, correlations can be calculated among the predictors to check for multicollinearity, and between residuals at successive observations (lagged residuals) to check for autocorrelation.
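A simple diagnostic for autocorrelation is the lag-1 correlation of the residuals, i.e., the correlation between each residual and the next one in sequence. The residuals below are simulated as independent noise for illustration, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
resid = rng.normal(size=300)  # residuals from a hypothetical fit

# Lag-1 autocorrelation: correlate the series with itself shifted by one.
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(f"lag-1 autocorrelation: {lag1:.3f}")  # near 0 for independent residuals
```

A value far from zero would suggest that the residuals carry structure the model has not captured, violating the no-autocorrelation assumption.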
Multiple linear regression models can provide valuable insights into the relationships between explanatory and response variables, so long as the assumptions of the model are met. By correctly fitting the model, the regression coefficients can be used to determine which variables influence the response and how much influence they have. Additionally, the model can be used to predict the response to new values of the explanatory variables.