multicollinearity
A researcher examines a scatterplot matrix showing multicollinearity among variables.
Noun: A statistical phenomenon in which two or more independent variables (predictors) in a multiple regression model are highly linearly correlated with each other. This high intercorrelation makes it difficult to determine the individual effect of each predictor on the dependent variable, leading to unreliable and unstable estimates of the regression coefficients.
Multicollinearity is a technical term used primarily in the fields of statistics, econometrics, and data science. It describes a problem within a specific type of data analysis. * It is used to diagnose issues in a regression model. * It is something that researchers and analysts test for and try to mitigate.
- Noun:
- The high
multicollinearitybetween advertising spend and social media presence made it impossible to tell which one truly drove sales. - A variance inflation factor (VIF) is commonly used to detect
multicollinearity. - The model's results were questionable due to severe
multicollinearityamong the input variables.
- "Perfect multicollinearity": An extreme case where one independent variable is an exact linear combination of others (e.g., including both "height in inches" and "height in centimeters"). This prevents the regression calculation from being performed at all.
- "Multicollinearity inflates the standard errors": A key consequence. High multicollinearity does not bias the coefficient estimates but makes their estimated standard errors very large, reducing the statistical significance of the predictors.
- Collinearity (noun): A simpler form, often used when discussing the linear relationship between just two predictor variables. Multicollinearity is the more general term for multiple variables.
- Multicollinear (adjective): Describing the state of the variables.
- The predictors were highly
multicollinear.
- Intercorrelation (noun): A more general term for correlation among variables, not specific to regression analysis.
- Linearly dependent variables (noun phrase): A mathematical description of the condition causing multicollinearity.
Given the technical nature of this term, it is associated with specific diagnostic and remedial concepts rather than phrasal verbs or idioms. * Variance Inflation Factor (VIF): A measure used to quantify the severity of multicollinearity. * Condition Index / Condition Number: Another statistical measure for detecting multicollinearity. * Remedial measures: Actions to address multicollinearity, such as removing correlated variables, combining them into an index, or using regularization techniques like Ridge Regression.
A researcher examines a scatterplot matrix showing multicollinearity among variables.
- a case of multiple regression in which the predictor variables are themselves highly correlated