Dummy Variables in Regression Analysis

Dummy variables are binary variables used to quantify the effect of qualitative independent variables. A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise. A qualitative variable with n classes requires n − 1 dummy variables; the omitted class serves as the reference. Including all n dummies alongside an intercept would make the regressors perfectly collinear.
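The n − 1 rule can be illustrated with a minimal Python sketch. The function name `encode_dummies` and the three-region sample are hypothetical and chosen only for illustration; they are not part of the example that follows.

```python
def encode_dummies(values, reference):
    """Return n - 1 dummy columns for a categorical variable with n classes.

    `reference` is the omitted class; it is represented by zeros in every column.
    """
    classes = sorted(set(values))
    if reference not in classes:
        raise ValueError("reference class must appear in the data")
    kept = [c for c in classes if c != reference]
    # One 0/1 column per non-reference class.
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical sample with three classes, so two dummies are produced.
regions = ["North", "South", "East", "North", "East"]
dummies = encode_dummies(regions, reference="South")
print(dummies)
```

A row belonging to the reference class ("South" here) has a 0 in every dummy column, which is how the omitted class is identified.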

The intercept term measures the average value of the dependent variable for the omitted class (with any other regressors held at zero). The estimated coefficient on each dummy variable measures the average difference in the dependent variable between that dummy's class and the omitted class, holding the other regressors constant.
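This interpretation is easiest to check in a regression with a single dummy and no other regressors. The sketch below uses made-up ROC values and a hypothetical helper `simple_ols`; under those assumptions, the intercept should equal the mean of the omitted class and the slope should equal the difference between the two class means.

```python
from statistics import mean


def simple_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: dummy = 0 marks the omitted class, dummy = 1 the other class.
dummy = [0, 0, 0, 1, 1, 1]
roc = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]

b0, b1 = simple_ols(dummy, roc)
mean_omitted = mean(r for d, r in zip(dummy, roc) if d == 0)
mean_flagged = mean(r for d, r in zip(dummy, roc) if d == 1)

print(b0, mean_omitted)                 # intercept vs. mean of the omitted class
print(b1, mean_flagged - mean_omitted)  # slope vs. difference in class means
```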

Example: Incorporating Dummy Variables in a Multiple Regression Model

Adil Suleman, CFA, wishes to identify possible drivers of a company’s percentage return on capital (ROC). As candidate drivers, Suleman identifies three performance measures (profit margin (%), sales, and debt ratio) and two demographic measures (the region and the economic sector).

The dummy variable “region” is coded 1 when the company is located in the Northern region and 0 when it is located in the Southern region. Similarly, the dummy variable “economic sector” is coded 1 when a company belongs to the banking sector and 0 when it belongs to the technology sector.

Suleman regresses ROC against sales, debt ratio, profit margin, region, and sector to obtain the following regression output.

$$\small{\begin{array}{lr}\hline{}\textbf{Regression Statistics}\\ \hline\text{Multiple R} & 0.8851\\ \text{R Square} & 0.7833\\ \text{Adjusted R Square} & 0.7263\\ \text{Standard Error} & 0.9561\\ \text{Observations} & 25\\ \hline\end{array}}$$

$$ \textbf{ANOVA} $$

$$\small{\begin{array}{lccccc}\hline{}& \textbf{Df} & \textbf{SS} & \textbf{MS} & \textbf{F} & \textbf{Significance F}\\ \hline\text{Regression} & 5 & 62.7924 & 12.5585 & 13.7389 & 0.0000\\ \text{Residual} & 19 & 17.3676 & 0.9141 & &{}\\ \text{Total} & 24 & 80.1600 & & &{}\\ \hline\end{array}}$$

$$\small{\begin{array}{lccccc}\hline{}& \textbf{Coefficients} & \textbf{Standard Error} & \textbf{t Stat} & \textbf{P-value}\\ \hline\text{Intercept} & 10.1241 & 0.8503 & 11.9069 & 0.0000\\ \text{Sales} & 0.0010 & 0.0004 & 2.4003 & 0.0268\\ \text{Debt ratio} & 0.0166 & 0.0138 & 1.2017 & 0.2443\\ \text{Profit Margin} & 0.1807 & 0.0552 & 3.2713 & 0.0040\\ \text{Region} & 2.1755 & 0.6061 & 3.5896 & 0.0020\\ \text{Sector} & -0.8703 & 0.4202 & -2.0709 & 0.0522\\ \hline\end{array}}$$

Interpreting the Results

From the above results, the multiple regression equation can be expressed as:

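$$\widehat{\text{ROC}}=10.1241+0.0010\,\text{SAL}+0.0166\,\text{DR}+0.1807\,\text{PM}+2.1755\,\text{REG}-0.8703\,\text{SEC}$$

where SAL is sales, DR is the debt ratio, PM is the profit margin, REG is the region dummy (1 = Northern), and SEC is the sector dummy (1 = banking).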
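To see how the dummy coefficients shift the prediction, the fitted equation can be evaluated directly. In the sketch below the coefficients come from the regression output, while the function name `predict_roc` and the input values (sales of 1,000, a debt ratio of 20, a profit margin of 10) are hypothetical, since the source does not state the units of the regressors.

```python
# Estimated coefficients taken from the regression output above.
COEFFS = {
    "intercept": 10.1241,
    "sales": 0.0010,
    "debt_ratio": 0.0166,
    "profit_margin": 0.1807,
    "region": 2.1755,   # dummy: 1 = Northern, 0 = Southern
    "sector": -0.8703,  # dummy: 1 = banking, 0 = technology
}


def predict_roc(sales, debt_ratio, profit_margin, region, sector):
    """Evaluate the fitted regression equation for one company."""
    return (
        COEFFS["intercept"]
        + COEFFS["sales"] * sales
        + COEFFS["debt_ratio"] * debt_ratio
        + COEFFS["profit_margin"] * profit_margin
        + COEFFS["region"] * region
        + COEFFS["sector"] * sector
    )


# Hypothetical technology company; only the region dummy differs between the two calls.
north = predict_roc(sales=1000, debt_ratio=20, profit_margin=10, region=1, sector=0)
south = predict_roc(sales=1000, debt_ratio=20, profit_margin=10, region=0, sector=0)

print(round(north, 4), round(south, 4), round(north - south, 4))
```

Whatever values are chosen for the quantitative inputs, the gap between the two predictions equals the region coefficient, 2.1755.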

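Adjusted \(R^{2}\)

The adjusted \(R^{2}\) is 0.7263. After adjusting for the number of regressors, about 72.63% of the variation in the return on capital is explained by the three quantitative regressors (sales, debt ratio, and profit margin) and the two qualitative regressors (region and sector). The unadjusted \(R^{2}\) is 78.33%.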
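Both goodness-of-fit figures can be reproduced from the ANOVA table using the standard formulas \(R^{2}=1-\text{SSE}/\text{SST}\) and adjusted \(R^{2}=1-(1-R^{2})(n-1)/(n-k-1)\). The variable names in this short sketch are illustrative.

```python
# Values from the ANOVA table and regression statistics above.
ss_residual = 17.3676
ss_total = 80.1600
n_obs = 25        # observations
k_regressors = 5  # sales, debt ratio, profit margin, region, sector

r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k_regressors - 1)

print(round(r_squared, 4), round(adj_r_squared, 4))
```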

Test Significance of Overall Model

\(H_{0}:b_{1}=b_{2}=b_{3}=b_{4}=b_{5}=0\) versus \(H_{a}:\) at least one \(b_{j}\neq 0\)

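The “Significance F” column of the ANOVA table reports a p-value of 0.0000 (to four decimal places), which is below the 5% significance level.

Thus, we reject \(H_{0}\) in favor of \(H_{a}\) and conclude that at least one slope coefficient is different from zero, i.e., the model as a whole is statistically significant.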
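The F-statistic behind this test is the ratio of the mean regression sum of squares to the mean squared error. A minimal sketch recomputing it from the ANOVA table (variable names are illustrative; the p-value itself is taken from the table, not recomputed here):

```python
# Sums of squares and degrees of freedom from the ANOVA table above.
ss_regression, df_regression = 62.7924, 5
ss_residual, df_residual = 17.3676, 19

ms_regression = ss_regression / df_regression  # mean square, regression
ms_residual = ss_residual / df_residual        # mean squared error

f_stat = ms_regression / ms_residual
print(round(ms_regression, 4), round(ms_residual, 4), round(f_stat, 4))
```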

Test Significance of Dummy Variables

The coefficient on the region dummy is positive (2.1755) and statistically significant at the 5% level, since its p-value (0.0020) is less than 0.05. We can therefore conclude that, holding the other regressors constant, companies in the Northern region have an ROC that is on average about 2.18 percentage points higher than that of companies in the Southern region.

In contrast, the coefficient on the sector dummy is negative (−0.8703) but not statistically significant at the 5% level, since its p-value (0.0522) is slightly greater than 0.05. We therefore cannot conclude that the ROC of banking companies differs from that of technology companies.
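The same decision rule, comparing each p-value with the chosen significance level, can be applied to every term in the coefficient table. The helper name `significant_terms` in this sketch is hypothetical; the p-values are those reported in the regression output.

```python
# P-values as reported in the coefficient table above.
p_values = {
    "Intercept": 0.0000,
    "Sales": 0.0268,
    "Debt ratio": 0.2443,
    "Profit Margin": 0.0040,
    "Region": 0.0020,
    "Sector": 0.0522,
}


def significant_terms(p_values, alpha):
    """Return the terms whose p-value is below the significance level alpha."""
    return [name for name, p in p_values.items() if p < alpha]


sig_5 = significant_terms(p_values, alpha=0.05)
sig_10 = significant_terms(p_values, alpha=0.10)

print(sig_5)
print(sig_10)
```

At the 5% level the sector dummy is excluded, whereas it would be included at the 10% level, which shows how close its p-value is to the threshold.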

Question

Ninel Khan, a technical trader, is analyzing seasonality in the price of GBP/USD. Khan believes that the price in the first quarter is significantly different from the price in the other three quarters. She uses the first quarter as the reference category in the regression. The number of dummy variables that Khan will need in her regression equation is most likely:

     A. 2.

     B. 3.

     C. 4.

Solution

The correct answer is B.

If we need to differentiate among n categories, the regression should include n − 1 dummy variables. In this case, we have four quarters. Thus, three dummy variables are needed.
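As a sketch of what such a regression could look like (the symbols are illustrative and do not appear in the question), let \(Q_{j,t}\) equal 1 if observation \(t\) falls in quarter \(j\) and 0 otherwise:

$$\text{Price}_{t}=b_{0}+b_{1}Q_{2,t}+b_{2}Q_{3,t}+b_{3}Q_{4,t}+\varepsilon_{t}$$

Here \(b_{0}\) is the average price in the first quarter (the reference category), and each \(b_{j}\) measures the average difference between the corresponding quarter and the first quarter.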

Reading 2: Multiple Regression

LOS 2 (j) Formulate and interpret a multiple regression, including qualitative independent variables.
