What is Collinearity?
Collinearity is a phenomenon in statistics where two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. In other words, it occurs when two or more predictor variables in a model are highly correlated, which can create unreliable and unstable estimates of regression coefficients.
Examples of Collinearity
Collinearity can occur in both categorical and continuous variables. Examples of collinearity between categorical variables include when two variables represent the same concept, such as gender (male vs. female) and sex (male vs. female). Examples of collinearity between continuous variables include when two variables measure the same thing, such as height and weight.
Effects of Collinearity
Collinearity can have several effects on regression models, including:
- Reducing the coefficient estimates of the variables in the model.
- Increasing the standard errors of the estimates, which can lead to incorrect inferences.
- Making it difficult to determine which of the predictor variables are contributing to the model.
In addition, collinearity can lead to multicollinearity, which occurs when three or more predictor variables in a model are highly correlated. Multicollinearity can often lead to inflated standard errors and incorrect inferences.
Conclusion
Collinearity can have a significant impact on the results of multiple regression models, and should be avoided when possible. In cases where collinearity is unavoidable, it is important to understand the effects that it can have on the model and to take appropriate steps to mitigate its effects. For more information, please see: