{"id":3731,"date":"2019-12-31T14:42:41","date_gmt":"2019-12-31T14:42:41","guid":{"rendered":"https:\/\/analystprep.com\/study-notes\/?p=3731"},"modified":"2026-02-27T16:01:27","modified_gmt":"2026-02-27T16:01:27","slug":"regression-diagnostics","status":"publish","type":"post","link":"https:\/\/analystprep.com\/study-notes\/frm\/regression-diagnostics\/","title":{"rendered":"Regression Diagnostics"},"content":{"rendered":"<p><iframe loading=\"lazy\" src=\"\/\/www.youtube.com\/embed\/C1tgpoyD7Yw\" width=\"611\" height=\"343\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><b>After completing this reading, you should be able to:<\/b><\/p>\n<p><script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"QAPage\",\n  \"mainEntity\": {\n    \"@type\": \"Question\",\n    \"name\": \"Which of the following statements is\/are correct?\",\n    \"acceptedAnswer\": {\n      \"@type\": \"Answer\",\n      \"text\": \"The correct answer is C.\\n\\nAll statements are correct. \\n\\nIf the variance of the residuals is constant across all observations in the sample, the regression is said to be homoskedastic. When the opposite is true, the regression is said to exhibit heteroskedasticity, i.e., the variance of the residuals is not the same across all observations in the sample. The presence of conditional heteroskedasticity poses a significant problem: it introduces a bias into the estimators of the standard error of the regression coefficients. As such, it understates the standard error.\"\n    },\n    \"suggestedAnswer\": [\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"A. Only I\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"B. II and III\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"C. All statements are correct\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"D. 
None of the statements are correct\"\n      }\n    ],\n    \"answerCount\": 4\n  }\n}\n<\/script> <script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"QAPage\",\n  \"mainEntity\": {\n    \"@type\": \"Question\",\n    \"name\": \"A financial analyst fails to include a variable which inherently has a non-zero coefficient in his regression analysis. Moreover, the ignored variable is highly correlated with the remaining variables. What is the most likely deficiency of the analyst\u2019s model?\",\n    \"acceptedAnswer\": {\n      \"@type\": \"Answer\",\n      \"text\": \"The correct answer is A.\\n\\nOmitted variable bias occurs under two conditions:\\n\\nI. A variable with a non-zero coefficient is omitted.\\n\\nII. A variable that is omitted is correlated with remaining (included) variables.\\n\\nThese conditions are met in the description of the analyst\u2019s model.\"\n    },\n    \"suggestedAnswer\": [\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"A. Omitted variable bias.\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"B. Bias due to inclusion of extraneous variables.\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"C. Presence of heteroskedasticity.\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"D. 
None of the above.\"\n      }\n    ],\n    \"answerCount\": 4\n  }\n}\n<\/script> <script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"ImageObject\",\n  \"url\": \"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182.jpg\",\n  \"caption\": \"Outliers\",\n  \"width\": 1373,\n  \"height\": 1041,\n  \"copyrightNotice\": \"\u00a9 2024 AnalystPrep\",\n  \"acquireLicensePage\": \"https:\/\/analystprep.com\/license-info\",\n  \"creditText\": \"AnalystPrep Design Team\",\n  \"creator\": {\n    \"@type\": \"Organization\",\n    \"name\": \"AnalystPrep\"\n  }\n}\n<\/script> <script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"ImageObject\",\n  \"url\": \"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_12.jpg\",\n  \"caption\": \"Multicollinearity\",\n  \"width\": 1373,\n  \"height\": 1041,\n  \"copyrightNotice\": \"\u00a9 2024 AnalystPrep\",\n  \"acquireLicensePage\": \"https:\/\/analystprep.com\/license-info\",\n  \"creditText\": \"AnalystPrep Design Team\",\n  \"creator\": {\n    \"@type\": \"Organization\",\n    \"name\": \"AnalystPrep\"\n  }\n}\n<\/script> <script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"ImageObject\",\n  \"url\": \"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/02\/Img_1-1536x782.jpg\",\n  \"caption\": \"Homoskedasticity Vs.Heteroskedasticity\",\n  \"width\": 1536,\n  \"height\": 792,\n  \"copyrightNotice\": \"\u00a9 2024 AnalystPrep\",\n  \"acquireLicensePage\": \"https:\/\/analystprep.com\/license-info\",\n  \"creditText\": \"AnalystPrep Design Team\",\n  \"creator\": {\n    \"@type\": \"Organization\",\n    \"name\": \"AnalystPrep\"\n  }\n}\n<\/script><\/p>\n<ul>\n<li>Explain how to test whether regression is affected by heteroskedasticity.<\/li>\n<li>Describe approaches to using heteroskedastic data.<\/li>\n<li>Characterize 
multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity.<\/li>\n<li>Describe the consequences of excluding a relevant explanatory variable from a model and contrast those with the consequences of including an irrelevant regressor.<\/li>\n<li>Explain two model selection procedures and how these relate to the bias-variance tradeoff.<\/li>\n<li>Describe the various methods of visualizing residuals and their relative strengths.<\/li>\n<li>Describe methods for identifying outliers and their impact.<\/li>\n<li>Determine the conditions under which OLS is the best linear unbiased estimator.<\/li>\n<\/ul>\n<h2>Regression Model Specifications<\/h2>\n<p>Model specification is the process of determining which independent variables should be included in or excluded from a regression model.<\/p>\n<p>That is, an ideal regression model should contain all the variables that explain the dependent variable and exclude those that do not.<\/p>\n<p>Model specification includes the residual diagnostics and the statistical tests on the assumptions of OLS estimators. Basically, the choice of variables to be included in a model depends on the bias-variance tradeoff. For instance, large models that include all the relevant variables are likely to have unbiased coefficients. 
On the other side, smaller models lead to accurate estimates of the impact of removing some variables.<\/p>\n<p>The conventional specification makes sure that the functional form of the model is adequate, the parameters are constant, and the homoscedasticity assumption is met.<\/p>\n<div style=\"background: #f3f4f6; padding: 16px 14px; border-radius: 12px; margin: 20px 0; text-align: center;\"><a style=\"display: inline-flex; align-items: center; justify-content: center; padding: 12px 18px; border: 2px solid #1d4ed8; border-radius: 999px; color: #1d4ed8; text-decoration: none; font-weight: 600; font-size: 16px; line-height: 1; background: #ffffff; white-space: nowrap;\" href=\"https:\/\/analystprep.com\/free-trial\/\" target=\"_blank\" rel=\"noopener noreferrer\"> Evaluate regression assumptions with FRM-level practice <\/a><\/div>\n<h2>The Omitted Variables<\/h2>\n<p>An omitted variable is one with a non-zero coefficient, but they are excluded in the regression model.<\/p>\n<h3>Effects of Omitting Variables<\/h3>\n<ol type=\"I\">\n<li>The remaining variables sustain the impact of the excluded variables in terms of the common variation. Thus, they do not consistently approximate the change in the independent variable on the dependent variable while keeping all other things constant.<\/li>\n<li>The magnitude of the estimated residuals is larger than the true value. 
This is so because the estimated residuals contain both the true error and the effect of the omitted variable, which cannot be captured by the included variables.<\/li>\n<\/ol>\n<h3>Illustration of the Omitted Variables<\/h3>\n<p>Suppose that the regression model is stated as:<\/p>\n<p>$$ {\\text Y}_{\\text i}=\\alpha+\\beta_1 {\\text X}_{1{\\text i}}+\\beta_2 {\\text X}_{2{\\text i}}+\\epsilon_{\\text i} $$<\/p>\n<p>If we omit \\({\\text X}_{2}\\) from the estimated model, then the model is given by:<\/p>\n<p>$$ {\\text Y}_{\\text i}=\\alpha+\\beta_1 {\\text X}_{1{\\text i}}+\\epsilon_{\\text i} $$<\/p>\n<p>Now, in large samples, the OLS estimator \\({\\hat \\beta}_1\\) converges to:<\/p>\n<p>$$ \\beta_1+\\beta_2 \\delta $$<\/p>\n<p>Where:<\/p>\n<p>$$ \\delta=\\cfrac {{\\text {Cov}}({\\text X}_1,{\\text X}_2)}{{\\text {Var}}({\\text X}_1)} $$<\/p>\n<p>\\(\\delta\\) is the population slope coefficient in a regression of \\({\\text X}_2\\) on \\({\\text X}_1\\).<\/p>\n<p>It is clear that the bias \u2013 due to the omitted variable \u2013 depends on the population coefficient of the excluded variable, \\(\\beta_2\\), and the strength of the relationship between \\({\\text X}_2\\) and \\({\\text X}_1\\), represented by \\(\\delta\\).<\/p>\n<p>When the correlation between \\({\\text X}_1\\) and \\({\\text X}_2\\) is high, \\({\\text X}_1\\) can explain a significant proportion of the variation in \\({\\text X}_2\\), and hence the bias is high. On the other hand, if the independent variables are uncorrelated, that is, \\(\\delta=0\\), then \\(\\hat \\beta_1\\) is a consistent estimator of \\(\\beta_1\\).<\/p>\n<p>In conclusion, an omitted variable biases the coefficients on the included variables that are correlated with it.<\/p>\n<h2>Inclusion of Extraneous Variables<\/h2>\n<p>An extraneous variable is one that is unnecessarily included in the model, whose actual coefficient is 0 and is consistently estimated to be 0 in large samples. 
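The omitted-variable bias formula \(\beta_1+\beta_2 \delta\) above can be checked with a short simulation. Below is a minimal sketch in plain Python; the coefficients and synthetic data are made-up choices for illustration:

```python
import random

# Sketch with synthetic data: Y = 1 + 2*X1 + 3*X2 + eps, where
# X2 = 0.5*X1 + noise, so delta = Cov(X1, X2)/Var(X1) = 0.5.
# Omitting X2, the short-regression slope should converge to
# beta1 + beta2*delta = 2 + 3*0.5 = 3.5, not the true beta1 = 2.
random.seed(42)
n = 20000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + 0.3 * random.gauss(0, 1) for a in x1]
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# OLS slope of Y on X1 alone: Cov(X1, Y) / Var(X1)
m1, my = sum(x1) / n, sum(y) / n
cov = sum((a - m1) * (b - my) for a, b in zip(x1, y))
var = sum((a - m1) ** 2 for a in x1)
b1_short = cov / var
print(b1_short)  # close to 3.5: the omitted variable biases the slope upward
```

With a large sample, the estimated slope lands near 3.5 rather than the true 2, illustrating the bias.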
Including such variables is nevertheless costly.<\/p>\n<h4>Illustration of Effect of Inclusion of Extraneous Random Variables<\/h4>\n<p>Recall that the adjusted \\({\\text R}^2\\) is given by:<\/p>\n<p>$$ {{\\bar {\\text R}}^2}=1-\\xi {\\cfrac {\\text {RSS}}{\\text {TSS}}} $$<\/p>\n<p>Where:<\/p>\n<p>$$ \\xi=\\cfrac {({\\text n}-1)}{({\\text n}-{\\text k}-1)} $$<\/p>\n<p>Looking at the formula above, adding more variables increases the value of k, which in turn increases the value of \\(\\xi\\) and hence reduces the value of \\({{\\bar {\\text R}}^2}\\). However, if the added variables are relevant, RSS is smaller, which offsets the effect of \\(\\xi\\) and produces a larger \\({{\\bar {\\text R}}^2}\\).<\/p>\n<p>In contrast, when the true coefficient of an added variable is equal to 0, RSS remains essentially constant while \\(\\xi\\) increases, leading to a smaller \\({{\\bar {\\text R}}^2}\\) and larger standard errors.<\/p>\n<p>Lastly, as the correlation between \\({\\text X}_1\\) and \\({\\text X}_2\\) increases, the standard errors rise.<\/p>\n<h2>The Bias-Variance Tradeoff<\/h2>\n<p>The bias-variance tradeoff amounts to choosing between including irrelevant variables and excluding relevant ones. Bigger models tend to have low bias because they include more of the relevant variables. However, they are less accurate in approximating the regression parameters due to the possibility of involving extraneous variables.<\/p>\n<p>Moreover, regression models with fewer independent variables are characterized by low estimation error but are more prone to biased parameter estimates.<\/p>\n<h2>Methods of Choosing a Model from a Set of Independent Variables<\/h2>\n<ol type=\"1\">\n<li><b>General-to-Specific Model Selection<\/b>\n<p>In the general-to-specific method, we start with a large general model that incorporates all the relevant variables. Then, the reduction of the general model starts. 
We use hypothesis tests to establish if there are any statistically insignificant coefficients in the estimated model. When such coefficients are found, the variable whose coefficient has the smallest absolute t-statistic is removed. The model is then re-estimated using the remaining set of independent variables. Once more, hypothesis tests are carried out to establish if statistically insignificant coefficients are present. These two steps (remove and re-estimate) are repeated until all coefficients that are statistically insignificant have been removed.<\/p>\n<\/li>\n<li><b>m-fold Cross-Validation<\/b>\n<p>The m-fold cross-validation model-selection method aims at choosing the model that\u2019s best at fitting observations not used to estimate parameters.<\/p>\n<p><strong>How is this method executed?<\/strong><\/p>\n<p>As a first step, the number of models has to be decided, and this is determined in part by the number of explanatory variables. When this number is small, the researcher can consider all the possible combinations. With 10 variables, for example, 1,024 (\\(=2^{10}\\)) distinct models can be constructed.<\/p>\n<p>The cross-validation process proceeds as follows:<\/p>\n<ol>\n<li>Shuffle the dataset randomly.<\/li>\n<li>Split the dataset into <em>m<\/em> groups.<\/li>\n<li>Estimate parameters using m-1 of the groups; these groups make up what we call the <strong>training block<\/strong>. The excluded group is referred to as the <strong>validation block.<\/strong><\/li>\n<li>Use the estimated parameters and the data in the excluded block (validation block) to compute residual values. 
These residuals are referred to as out-of-sample residuals since they are arrived at using data not included in the sample used to come up with the parameter estimates.<\/li>\n<li>Repeat parameter estimation and residual computation a total of <em>m<\/em> times; each group has to serve as the validation block and be used to compute residuals.<\/li>\n<li>Compute the sum of squared errors using the residuals estimated from the out-of-sample data.<\/li>\n<li>Select the model with the smallest out-of-sample sum of squared residuals.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2>Heteroskedasticity<\/h2>\n<p>Recall that homoskedasticity is one of the critical assumptions in the determination of the distribution of the OLS estimator. That is, the variance of \\(\\epsilon_i\\) is constant and does not vary with any of the independent variables; formally stated as \\(\\text{Var}(\\epsilon_i\u2502{\\text X}_{1{\\text i}},{\\text X}_{2{\\text i}},\u2026,{\\text X}_{k{\\text i}} )=\\sigma^2\\). <strong>Heteroskedasticity<\/strong> is a <strong>systematic<\/strong> pattern in the residuals where the variances of the residuals are <strong>not constant<\/strong>.<\/p>\n<h3><img decoding=\"async\" class=\"aligncenter wp-image-22797 size-full\" style=\"max-width: 100%;\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/02\/Img_1-1536x782.jpg\" alt=\"\" \/><\/h3>\n<h3>Test for Heteroskedasticity<\/h3>\n<p>Halbert White proposed a simple test, with the following two-step procedure:<\/p>\n<ol type=\"I\">\n<li>Estimate the model and calculate the residuals, \\(\\epsilon_{\\text i}\\).<\/li>\n<li>Regress the <b>squared<\/b> residuals on:\n<ol type=\"1\">\n<li>A constant<\/li>\n<li>All explanatory variables<\/li>\n<li>The cross products of all the independent variables, including the product of each variable with itself.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>Consider an original model with two independent variables:<\/p>\n<p>$$ {\\text Y}_{\\text i}=\\alpha+\\beta_{\\text 
1} {\\text X}_{1{\\text i}}+\\beta_2 {\\text X}_{2{\\text i}}+\\epsilon_{\\text i} $$<\/p>\n<p>The first step is to calculate the residuals by utilizing the OLS parameter estimators:<\/p>\n<p>$$ {\\hat \\epsilon}_{\\text i}={\\text Y}_{\\text i}-{\\hat {\\alpha} }-{\\hat {\\beta}}_1 {\\text X}_{1{\\text i}}-{\\hat \\beta}_2 {\\text X}_{2{\\text i}} $$<\/p>\n<p>Now, we need to regress the squared residuals on a constant, \\({\\text X}_1,{\\text X}_2,{\\text X}_1^2,{\\text X}_2^2\\), and \\({\\text X}_1 {\\text X}_2\\):<\/p>\n<p>$$ {\\hat \\epsilon}_{\\text i}^2={\\Upsilon}_0+{\\Upsilon}_1 {\\text X}_{1{\\text i}}+{\\Upsilon}_2 {\\text X}_{2{\\text i}}+{\\Upsilon}_3 {\\text X}_{1{\\text i}}^2+{\\Upsilon}_4 {\\text X}_{2{\\text i}}^2+{\\Upsilon}_5 {\\text X}_{1{\\text i}} {\\text X}_{2{\\text i}} $$<\/p>\n<p>If the data is homoscedastic, then \\({\\hat \\epsilon}_{\\text i}^2\\) must not be explained by any of the variables, and the null hypothesis is: \\({\\text H}_0:{\\Upsilon}_1=\u22ef={\\Upsilon}_5=0\\)<\/p>\n<p>The test statistic is calculated as \\(\\text {nR}^2\\), where \\({\\text R}^2\\) is computed from the second regression. The test statistic has a \\(\\chi_{ \\frac{{\\text k}({\\text k}+3)}{2} }^2\\) (chi-square) distribution, where k is the number of explanatory variables in the first-step model.<\/p>\n<p>For instance, if the number of explanatory variables is two (k=2), then the test statistic has a \\(\\chi_5^{2}\\) distribution.<\/p>\n<h3>Modeling Heteroskedastic Data<\/h3>\n<p>The three common methods of handling data with heteroskedastic shocks include:<\/p>\n<ol type=\"1\">\n<li><b>Ignoring the heteroskedasticity when approximating the parameters and then utilizing the White covariance estimator in hypothesis tests.<\/b>\n<p>However simple, this method leads to less accurate model parameter estimates compared to other methods that address the heteroskedasticity.<\/p>\n<\/li>\n<li><b>Transformation of data.<\/b>\n<p>For instance, positive 
data can be log-transformed to try and remove heteroskedasticity and give a better view of the data. Another transformation can be in the form of dividing the dependent variable by another positive variable.<\/p>\n<\/li>\n<li><b>Use of weighted least squares (WLS).<\/b>\n<p>This is a more involved method that applies weights to the data before approximating the parameters. That is, if we know that \\(\\text{Var}(\\epsilon_{\\text i} )={\\text w}_{\\text i}^2 \\sigma^2\\), where \\({\\text w}_{\\text i}\\) is known, then we can transform the data by dividing by \\({\\text w}_{\\text i}\\) to remove the heteroskedasticity from the errors. In other words, WLS regresses \\(\\frac {{\\text Y}_{\\text i}}{{\\text w}_{\\text i}}\\) on \\(\\frac {{\\text X}_{\\text i}}{{\\text w}_{\\text i}}\\) such that:<\/p>\n<p>$$ \\begin{align*} \\cfrac {{\\text Y}_{\\text i}}{{\\text w}_{\\text i}} &amp; =\\alpha \\cfrac {1}{{\\text w}_{\\text i}} +\\beta \\cfrac {{\\text X}_{\\text i}}{{\\text w}_{\\text i}} +\\cfrac {\\epsilon_{\\text i}}{{\\text w}_{\\text i}} \\\\ {\\bar {\\text Y}}_{\\text i} &amp; =\\alpha {\\bar {\\text C} }_{\\text i}+\\beta {\\bar {\\text X} }_{\\text i}+{\\bar {\\epsilon}}_{\\text i} \\\\ \\end{align*} $$<\/p>\n<\/li>\n<\/ol>\n<p>Note that the parameters of the model above are estimated using OLS on the transformed data. That is, the weighted version of \\({\\text Y}_{\\text i}\\), which is \\({\\bar {\\text Y}}_{\\text i} \\), is regressed on two weighted explanatory variables \\({\\bar {\\text C}}_{\\text i}=\\cfrac {1}{{\\text w}_{\\text i}}\\) and \\({\\bar {\\text X}}_{\\text i}=\\frac {{\\text X}_{\\text i}}{{\\text w}_{\\text i}}\\). Note that the WLS model does not explicitly include the intercept \\(\\alpha\\); it appears instead as the coefficient on \\({\\bar {\\text C}}_{\\text i}\\), and its interpretation as the intercept is unchanged.<\/p>\n<h2>Multicollinearity<\/h2>\n<p>Multicollinearity occurs when one or more independent variables can be largely explained by the others. 
For instance, in the case of two independent variables, there is evidence of multicollinearity if the \\({\\text R}^2\\) from regressing one variable on the other is very high.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1373\" height=\"1041\" class=\"aligncenter size-full wp-image-6978\" style=\"max-width: 100%;\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180.jpg\" alt=\"Multicollinearity\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180.jpg 1373w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180-300x227.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180-768x582.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180-1024x776.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180-400x303.jpg 400w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-180-600x455.jpg 600w\" sizes=\"auto, (max-width: 1373px) 100vw, 1373px\" \/>In contrast with multicollinearity, perfect collinearity is where one of the variables is perfectly explained by the others, such that the \\({\\text R}^2\\) of a regression of \\({\\text X}_{\\text j}\\) on the remaining independent variables is precisely 1.<\/p>\n<p>Conventionally, an \\({\\text R}^2\\) above 90% causes problems in moderate sample sizes, such as 100. 
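To illustrate the diagnostic just described (regressing one included variable on another and checking the \(R^2\)), here is a minimal sketch in plain Python; the data is synthetic, and the 0.15 noise scale is an arbitrary choice:

```python
import random

# Sketch: two near-collinear regressors. The R^2 from regressing one on
# the other is close to 1, signalling multicollinearity.
random.seed(7)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + 0.15 * random.gauss(0, 1) for a in x1]  # x2 is almost x1

# For a simple regression, R^2 equals the squared sample correlation.
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
v1 = sum((a - m1) ** 2 for a in x1)
v2 = sum((b - m2) ** 2 for b in x2)
r2 = cov * cov / (v1 * v2)
vif = 1 / (1 - r2)  # variance inflation factor implied by this R^2
print(r2, vif)      # R^2 well above 0.9, so the VIF is far above 10
```

Here the \(R^2\) is well above the conventional 90% threshold, so this pair of regressors would be flagged.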
Multicollinearity does not pose an issue in parameter approximation, but rather, it brings some difficulties in interpreting and modeling the data.<\/p>\n<p>When multicollinearity is present, some of the coefficients in a regression model are jointly statistically significant (the F-statistic is substantial), but the individual t-statistics are very small (less than 1.96), since the correlated variables share explanatory power: the regression can measure their collective effect but not their individual effects.<\/p>\n<h3>Addressing Multicollinearity<\/h3>\n<p>There are two ways of dealing with multicollinearity:<\/p>\n<ol type=\"I\">\n<li>Ignoring multicollinearity altogether, since it is technically not a problem.<\/li>\n<li>Identifying the multicollinear variables and excluding them from the model. Multicollinear variables can be identified using the variance inflation factor, which compares the variance of the coefficient on independent variable \\({\\text X}_{\\text j}\\) in two models: one that includes only \\({\\text X}_{\\text j}\\) and one that includes all k independent variables. It is based on the regression of \\({\\text X}_{\\text j}\\) on the other variables: $$ {\\text X}_{\\text {ji}}=\\Upsilon_0+\\Upsilon_1 {\\text X}_{1{\\text i}}+\u22ef+\\Upsilon_{{\\text j}-1} X_{{\\text j}-1{\\text i}}+\\Upsilon_{{\\text j}+1} {\\text X}_{{\\text j}+1{\\text i}}+\u22ef+\\Upsilon_{\\text k} {\\text X}_{{\\text k}{\\text i}}+\\eta_{\\text i} $$\n<p>The variance inflation factor (VIF) for the variable \\({\\text X}_{\\text j}\\) is given by:<\/p>\n<p>$$ \\text{VIF}_{\\text j}=\\cfrac {1}{1-{\\text R}_{\\text j}^2 } $$<\/p>\n<p>Where \\({\\text R}_{\\text j}^2\\) originates from regressing \\({\\text X}_{\\text j}\\) on the other variables in the model. When the value of the VIF is above 10, it is considered excessive, and the variable should be excluded from the model.<\/p>\n<\/li>\n<\/ol>\n<h2>Residual Plots<\/h2>\n<p>Residual plots are utilized to identify the deficiencies in a model specification. 
When the residuals are not systematically related to any of the included independent (explanatory) variables and are relatively small in magnitude (within \\(\\pm\\)4s, where s is the estimated standard deviation of the model\u2019s shocks), then the model is ideally good.<\/p>\n<p>A residual plot is a graph of \\(\\hat \\epsilon_{\\text i}\\) (vertical axis) against the independent variables \\({\\text x}_{\\text i}\\). Alternatively, we could use the standardized residuals \\(\\frac {\\hat \\epsilon_{\\text i}}{\\text s}\\), which makes large deviations easier to spot.<\/p>\n<h2>Outliers<\/h2>\n<p>Outliers are values that, if removed from the sample, produce large changes in the estimated coefficients. They can also be viewed as data points that <strong>deviate significantly<\/strong> from the normal observations as if they were <strong>generated by a different mechanism.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1373\" height=\"1041\" class=\"aligncenter size-full wp-image-6979\" style=\"max-width: 100%;\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182.jpg\" alt=\"Outliers\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182.jpg 1373w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182-300x227.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182-768x582.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182-1024x776.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182-400x303.jpg 400w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2019\/12\/Page-182-600x455.jpg 600w\" sizes=\"auto, (max-width: 1373px) 100vw, 1373px\" \/>Cook\u2019s distance helps us measure the impact of dropping a single observation j on a regression (and the line of best fit).<\/p>\n<p>Cook\u2019s distance is given by:<\/p>\n<p>$$ {\\text D}_{\\text 
j}=\\cfrac {\\sum_{\\text i=1}^{\\text n} \\left( {{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}-{\\hat {\\text Y} }_{\\text i} \\right)^2 }{\\text{ks}^2 } $$<\/p>\n<p>Where:<\/p>\n<p>\\({{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}\\)=fitted value of \\({{\\bar {\\text Y}}_{\\text i}}\\) when observation j is excluded, and the model is estimated using the remaining n-1 observations.<\/p>\n<p>k=number of coefficients in the regression model.<\/p>\n<p>\\(\\text s^2\\)=estimated error variance from the model using all observations.<\/p>\n<p>When an observation is an inlier (excluding it does not materially affect the coefficient estimates), the value of its Cook\u2019s distance (\\({\\text D}_{\\text j}\\)) is small. On the other hand, \\({\\text D}_{\\text j}\\) is higher than 1 if the observation is an outlier.<\/p>\n<h4>Example: Calculating Cook\u2019s Distance<\/h4>\n<p>Consider the following data set:<\/p>\n<p>$$ \\begin{array}{c|c|c} \\textbf{Observation} &amp; \\textbf{Y} &amp; \\textbf{X} \\\\ \\hline {1} &amp; {3.67} &amp; {1.85} \\\\ \\hline {2} &amp; {1.88} &amp; {0.65} \\\\ \\hline {3} &amp; {1.35} &amp; {-0.63} \\\\ \\hline {4} &amp; {0.34} &amp; {1.24} \\\\ \\hline {5} &amp; {-0.89} &amp; {-2.45} \\\\ \\hline {6} &amp; {1.95} &amp; {0.76} \\\\ \\hline {7} &amp; {2.98} &amp; {0.85} \\\\ \\hline {8} &amp; {1.65} &amp; {0.28} \\\\ \\hline {9} &amp; {1.47} &amp; {0.75} \\\\ \\hline {10} &amp; {1.58} &amp; {-0.43} \\\\ \\hline {11} &amp; {0.66} &amp; {1.14} \\\\ \\hline {12} &amp; {0.05} &amp; {-1.79} \\\\ \\hline {13} &amp; {1.67} &amp; {1.49} \\\\ \\hline {14} &amp; {-0.14} &amp; {-0.64} \\\\ \\hline {15} &amp; {9.05} &amp; {1.87} \\\\ \\end{array} $$<\/p>\n<p>If you look at the data set above, it is easy to see that observation 15 is much larger than the rest of the observations and may well be an outlier. 
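One way to check is to script the calculation. Below is a minimal sketch in plain Python, using the (Y, X) pairs from the table above and closed-form simple-regression OLS; it assumes the suspect observation is the last one listed:

```python
# Sketch: Cook's distance for observation 15, refitting the simple
# regression without it and comparing fitted values.
y = [3.67, 1.88, 1.35, 0.34, -0.89, 1.95, 2.98, 1.65, 1.47,
     1.58, 0.66, 0.05, 1.67, -0.14, 9.05]
x = [1.85, 0.65, -0.63, 1.24, -2.45, 0.76, 0.85, 0.28, 0.75,
     -0.43, 1.14, -1.79, 1.49, -0.64, 1.87]

def ols(xs, ys):
    # Closed-form simple regression: slope = Cov(x, y)/Var(x).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return my - slope * mx, slope  # (intercept, slope)

a_full, b_full = ols(x, y)
a_red, b_red = ols(x[:-1], y[:-1])  # refit without observation 15

# s^2 = RSS/(n - k) from the full model, with k = 2 coefficients.
rss = sum((yi - a_full - b_full * xi) ** 2 for xi, yi in zip(x, y))
s2 = rss / (len(y) - 2)

# Cook's distance: squared changes in fitted values, scaled by k*s^2.
num = sum(((a_red + b_red * xi) - (a_full + b_full * xi)) ** 2 for xi in x)
d15 = num / (2 * s2)
print(round(d15, 2))  # about 1.05, i.e., above the threshold of 1
```

The same quantity is computed by hand in what follows.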
However, we need to ascertain this.<\/p>\n<p>We begin by fitting the whole dataset (\\({{\\bar {\\text Y}}_{\\text i}}\\)) and then the 14 observations which remain after excluding the observation that we believe is an outlier.<\/p>\n<p>If we fit the whole dataset, we get the following regression equation:<\/p>\n<p>$$ {{\\bar {\\text Y}}_{\\text i}}=1.4465+1.1281{\\text X_{\\text i}} $$<\/p>\n<p>And if we exclude the observation that we believe is an outlier, we get:<\/p>\n<p>$$ {{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}=1.1516+0.6828{\\text X_{\\text i}} $$<\/p>\n<p>Now the fitted values are as shown below:<\/p>\n<p>$$ \\begin{array}{c|c|c|c|c|c} \\textbf{Observation} &amp; \\textbf{Y} &amp; \\textbf{X} &amp; \\bf{{{\\bar {\\text Y}}_{\\text i}}} &amp; \\bf{{{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}} &amp; \\bf{\\left({{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}-{{{\\bar {\\text Y}}_{\\text i}}} \\right)^2} \\\\ \\hline {1} &amp; {3.67} &amp; {1.85} &amp; {3.533} &amp; {2.4148} &amp; {1.2504} \\\\ \\hline {2} &amp; {1.88} &amp; {0.65} &amp; {2.179} &amp; {1.5954} &amp; {0.3406} \\\\ \\hline {3} &amp; {1.35} &amp; {-0.63} &amp; {0.7358} &amp; {0.7214} &amp; {0.0002} \\\\ \\hline {4} &amp; {0.34} &amp; {1.24} &amp; {2.8453} &amp; {1.9983} &amp; {0.7174} \\\\ \\hline {5} &amp; {-0.89} &amp; {-2.45} &amp; {-1.3174} &amp; {-0.5213} &amp; {0.6338} \\\\ \\hline {6} &amp; {1.95} &amp; {0.76} &amp; {2.3039} &amp; {1.6705} &amp; {0.4012} \\\\ \\hline {7} &amp; {2.98} &amp; {0.85} &amp; {2.4053} &amp; {1.732} &amp; {0.4533} \\\\ \\hline {8} &amp; {1.65} &amp; {0.28} &amp; {1.7624} &amp; {1.3428} &amp; {0.1761} \\\\ \\hline {9} &amp; {1.47} &amp; {0.75} &amp; {2.2926} &amp; {1.6637} &amp; {0.3955} \\\\ \\hline {10} &amp; {1.58} &amp; {-0.43} &amp; {0.9614} &amp; {0.858} &amp; {0.0107} \\\\ \\hline {11} &amp; {0.66} &amp; {1.14} &amp; {2.7325} &amp; {1.921} &amp; {0.6585} \\\\ \\hline {12} &amp; {0.05} &amp; {-1.79} &amp; {-0.5728} &amp; {-0.07061} &amp; {0.2522} \\\\ 
\\hline {13} &amp; {1.67} &amp; {1.49} &amp; {3.1274} &amp; {2.169} &amp; {0.9185} \\\\ \\hline {14} &amp; {-0.14} &amp; {-0.64} &amp; {0.7245} &amp; {0.7146} &amp; {0.0001} \\\\ \\hline {15} &amp; {9.05} &amp; {1.87} &amp; {3.556} &amp; {2.4284} &amp; {1.2715} \\\\ \\hline {} &amp; {} &amp; {} &amp; {} &amp; \\textbf{Sum} &amp; \\bf{7.4800} \\\\ \\end{array} $$<\/p>\n<p>Given \\(\\text s^2=3.554\\), the Cook\u2019s distance is:<\/p>\n<p>$$ {\\text D}_{\\text j}=\\cfrac {\\sum_{\\text i=1}^{\\text n} \\left( {{\\bar {\\text Y}}_{\\text i}}^{(-{\\text j})}-{\\hat {\\text Y} }_{\\text i} \\right)^2 }{\\text{ks}^2 } =\\cfrac {7.4800}{2\u00d73.554}=1.0523 $$<\/p>\n<p>Since \\({\\text D}_{\\text j} &gt; 1\\), observation 15 can be considered an outlier.<\/p>\n<h3>Strengths of Ordinary Least Squares (OLS)<\/h3>\n<p>OLS is the Best Linear Unbiased Estimator (BLUE) when some key assumptions are met, which implies that it attains the smallest possible variance among all estimators that are linear and unbiased:<\/p>\n<ul>\n<li><strong>Linearity<\/strong>: the model must be linear in the parameters being estimated.<\/li>\n<li><strong>Random<\/strong>: the data must have been randomly sampled from the population.<\/li>\n<li><strong>Non-Collinearity<\/strong>: the regressors must not be perfectly correlated with each other.<\/li>\n<li><strong>Exogeneity<\/strong>: the regressors are not correlated with the error term.<\/li>\n<li><strong>Homoscedasticity<\/strong>: the variance of the error term is constant.<\/li>\n<\/ul>\n<p>However, the BLUE property comes with the following limitations:<\/p>\n<ol type=\"I\">\n<li>Many estimators are not linear, such as maximum likelihood estimators (which may also be biased), so the BLUE property says nothing about them.<\/li>\n<li>The BLUE property is heavily dependent on the residuals being homoskedastic. 
If the variances of the residuals vary with the independent variables, it is still possible to construct linear unbiased estimators (LUE) of the coefficients \u03b1 and \u03b2 using WLS, but this requires extra assumptions.<\/li>\n<\/ol>\n<p>When the residuals are iid and normally distributed with a mean of 0 and variance of \\(\\sigma^2\\), formally stated as \\(\\epsilon_{i} \\sim^{iid} {\\text N}(0,\\sigma^2)\\), OLS upgrades from BLUE to BUE (Best Unbiased Estimator) by virtue of having the smallest variance among all linear and non-linear estimators. However, errors being normally distributed is not a requirement for accurate estimates of the model coefficients or a necessity for desirable properties of estimators.<\/p>\n<blockquote>\n<h2>Practice Question 1<\/h2>\n<p>Which of the following statements is\/are correct?<\/p>\n<p>I. Homoskedasticity means that the variance of the error terms is constant for all independent variables.<\/p>\n<p>II. Heteroskedasticity means that the variance of error terms varies over the sample.<\/p>\n<p>III. The presence of conditional heteroskedasticity reduces the standard error.<\/p>\n<p>A. Only I<\/p>\n<p>B. II and III<\/p>\n<p>C. All statements are correct<\/p>\n<p>D. None of the statements are correct<\/p>\n<p><strong>Solution<\/strong><\/p>\n<p>The correct answer is <strong>C<\/strong>.<\/p>\n<p>All statements are correct.<\/p>\n<p>If the variance of the residuals is constant across all observations in the sample, the regression is said to be homoskedastic. When the opposite is true, the regression is said to exhibit heteroskedasticity, i.e., the variance of the residuals is not the same across all observations in the sample. The presence of conditional heteroskedasticity poses a significant problem: it introduces a bias into the estimators of the standard error of the regression coefficients. 
As such, it understates the standard error.<\/p>\n<h2>Practice Question 2<\/h2>\n<p>A financial analyst fails to include a variable which inherently has a non-zero coefficient in his regression analysis. Moreover, the ignored variable is highly correlated with the remaining variables.<\/p>\n<p>What is the most likely deficiency of the analyst\u2019s model?<\/p>\n<p>A. Omitted variable bias.<\/p>\n<p>B. Bias due to inclusion of extraneous variables.<\/p>\n<p>C. Presence of heteroskedasticity.<\/p>\n<p>D. None of the above.<\/p>\n<p><strong>Solution<\/strong><\/p>\n<p>The correct answer is <strong>A<\/strong>.<\/p>\n<p>Omitted variable bias occurs under two conditions:<\/p>\n<p>I. A variable with a non-zero coefficient is omitted.<\/p>\n<p>II. A variable that is omitted is correlated with the remaining (included) variables.<\/p>\n<p>These conditions are met in the description of the analyst\u2019s model.<\/p>\n<p>Option B is incorrect since an extraneous variable is one that is unnecessarily included in the model; its true coefficient is 0, and its estimated value converges to 0 in large samples.<\/p>\n<p>Option C is incorrect because heteroskedasticity is a condition where the variance of the errors varies systematically with the independent variables of the model.<\/p>\n<\/blockquote>\n<div style=\"background: #f3f4f6; padding: 22px 18px; border-radius: 14px; margin: 30px 0; text-align: center;\">\n<div style=\"font-size: 17px; line-height: 1.4; margin-bottom: 14px; color: #111827; font-weight: 600;\">Ready to test model specification, detect bias, and interpret regression diagnostics under exam conditions?<\/div>\n<p><a style=\"display: inline-flex; align-items: center; justify-content: center; padding: 14px 24px; border-radius: 999px; background: #1d4ed8; color: #ffffff; text-decoration: none; font-weight: bold; font-size: 17px; line-height: 1;\" href=\"https:\/\/analystprep.com\/free-trial\/\" target=\"_blank\" rel=\"noopener noreferrer\"> Start Free Trial \u2192 
<\/a><\/div>\n","protected":false},"excerpt":{"rendered":"<p>After completing this reading, you should be able to: Explain how to test whether regression is affected by heteroskedasticity. Describe approaches to using heteroskedastic data. Characterize multicollinearity and its consequences; distinguish between multicollinearity and perfect collinearity. Describe the consequences of&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[6,16],"tags":[],"class_list":["post-3731","post","type-post","status-publish","format-standard","hentry","category-frm","category-quantitative-analysis","blog-post","no-post-thumbnail","animate"],"acf":[]}