# Formulate and Interpret a Multiple Regression Model That Includes Qualitative Independent Variables

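Dummy variables are binary variables used to quantify the effect of qualitative independent variables. A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise. The number of dummy variables for $$n$$ different classes must equal $$n-1$$. Including all $$n$$ dummies together with an intercept would make the regressors perfectly collinear, so one class is omitted and serves as the reference.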

The intercept term measures the average value of the dependent variable for the omitted (reference) class, with any quantitative regressors held at zero. The estimated coefficient on each dummy variable measures the average difference in the dependent variable between that class and the omitted class, holding the other independent variables constant.
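The $$n-1$$ rule can be illustrated with a minimal Python sketch. The helper `encode_dummies` and the three-class `sizes` list are hypothetical names introduced only for this illustration; the region coding mirrors the example that follows.

```python
def encode_dummies(value, categories, reference):
    """Return a dict of n-1 dummy variables for one observation.

    `categories` lists all n classes; `reference` is the omitted class,
    which is represented by every dummy being 0.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # One dummy per class, except the omitted reference class.
    return {c: int(value == c) for c in categories if c != reference}


# Two classes -> one dummy (the southern region is the omitted class).
regions = ["north", "south"]
print(encode_dummies("north", regions, reference="south"))  # {'north': 1}
print(encode_dummies("south", regions, reference="south"))  # {'north': 0}

# Three hypothetical classes -> two dummies.
sizes = ["small", "mid", "large"]
print(encode_dummies("large", sizes, reference="small"))  # {'mid': 0, 'large': 1}
```

An observation in the reference class has all dummies equal to 0, which is why its average is captured by the intercept.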

#### Example: Incorporating Dummy Variables in a Multiple Regression Model

Adil Suleman, CFA, wishes to identify possible drivers of a company’s percentage return on capital (ROC). He considers performance measures (profit margin in percent, sales, and debt ratio) and demographic measures (region and economic sector) as possible drivers of ROC.

The dummy variable “region” is coded 1 when a company is located in the northern region and 0 when it is located in the southern region. Similarly, the dummy variable “economic sector” is coded 1 when a company belongs to the banking sector and 0 when it belongs to the technology sector.

Suleman regresses ROC against sales, debt ratio, profit margin, region, and sector to obtain the following regression output.

$$\begin{array}{l|c} \text{Regression Statistics} & \\ \hline \text{Multiple R} & 0.8851 \\ \hline \text{R Square} & 0.7833 \\ \hline \text{Adjusted R Square} & 0.7263 \\ \hline \text{Standard Error} & 0.9561 \\ \hline \text{Observations} & 25 \\ \end{array}$$

$$\textbf{ANOVA} \\ \begin{array}{l|c|c|c|c|c} & \text{df} & \text{SS} & \text{MS} & \text{F} & \text{Significance F} \\ \hline \text{Regression} & 5 & 62.7924 & 12.5585 & 13.7389 & 0.0000 \\ \hline \text{Residual} & 19 & 17.3676 & 0.9141 & & \\ \hline \text{Total} & 24 & 80.1600 & & & \\ \end{array}$$

$$\begin{array}{l|c|c|c|c} & \text{Coefficients} & \text{Standard Error} & \text{t Stat} & \text{P-value} \\ \hline \text{Intercept} & 10.1241 & 0.8503 & 11.9069 & 0.0000 \\ \hline \text{Sales} & 0.0010 & 0.0004 & 2.4003 & 0.0268 \\ \hline \text{Debt ratio} & 0.0166 & 0.0138 & 1.2017 & 0.2443 \\ \hline \text{Profit Margin} & 0.1807 & 0.0552 & 3.2713 & 0.0040 \\ \hline \text{Region} & 2.1755 & 0.6061 & 3.5896 & 0.0020 \\ \hline \text{Sector} & -0.8703 & 0.4202 & -2.0709 & 0.0522 \\ \end{array}$$

#### Interpreting the Results

From the above results, the multiple regression equation can be expressed as follows:

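\begin{align*} ROC & = 10.1241 + 0.0010SAL + 0.0166DR \\ & + 0.1807PM + 2.1755REG - 0.8703SEC \end{align*}

Here, SAL is sales, DR is the debt ratio, PM is the profit margin, REG is the region dummy (1 = northern), and SEC is the sector dummy (1 = banking).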
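To see how the dummy coefficients shift the prediction, the fitted equation can be evaluated directly. The sketch below copies the coefficients from the regression output; the function name `predict_roc` and the input values (sales of 1,000, debt ratio of 40, profit margin of 10) are made up for illustration and do not come from Suleman's data.

```python
# Coefficients copied from the regression output.
COEF = {
    "intercept": 10.1241,
    "sales": 0.0010,
    "debt_ratio": 0.0166,
    "profit_margin": 0.1807,
    "region": 2.1755,   # dummy: 1 = northern, 0 = southern
    "sector": -0.8703,  # dummy: 1 = banking, 0 = technology
}


def predict_roc(sales, debt_ratio, profit_margin, region, sector):
    """Predicted ROC (%) from the fitted regression equation."""
    if region not in (0, 1) or sector not in (0, 1):
        raise ValueError("region and sector must be coded 0 or 1")
    return (
        COEF["intercept"]
        + COEF["sales"] * sales
        + COEF["debt_ratio"] * debt_ratio
        + COEF["profit_margin"] * profit_margin
        + COEF["region"] * region
        + COEF["sector"] * sector
    )


# Hypothetical technology company, identical except for its region.
south = predict_roc(1000, 40, 10, region=0, sector=0)
north = predict_roc(1000, 40, 10, region=1, sector=0)
print(round(north - south, 4))  # 2.1755, the region coefficient
```

Whatever values are chosen for the quantitative inputs, switching a dummy from 0 to 1 changes the predicted ROC by exactly that dummy's coefficient.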

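78.33% of the variation in the return on capital is explained by three quantitative regressors (sales, debt ratio, and profit margin) and two qualitative regressors (region and sector), as shown by the R Square of 0.7833. After adjusting for the number of regressors, the adjusted R Square is 0.7263.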
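As an arithmetic check, both R Square and Adjusted R Square in the regression statistics can be reproduced from the ANOVA sums of squares. The following is a minimal Python sketch; the variable names are illustrative.

```python
# Values taken from the ANOVA table.
ss_regression = 62.7924
ss_total = 80.1600
n_obs = 25  # observations
k = 5       # independent variables (3 quantitative + 2 dummies)

# R-squared: share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes for the number of regressors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(round(r_squared, 4))      # 0.7833
print(round(adj_r_squared, 4))  # 0.7263
```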

### Testing the Significance of the Overall Model

$$H_0 : b_1 = b_2 = b_3 = b_4 = b_5 = 0$$ versus $$H_a$$: at least one $$b_j \neq 0$$

In the “Significance F” column of the ANOVA table, the p-value is approximately 0.0000, which is less than 5%. We therefore reject $$H_0$$ in favor of $$H_a$$ and conclude that the model as a whole is statistically significant.
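The F-statistic behind this test is the ratio of the mean square regression to the mean square residual. The sketch below recomputes it from the ANOVA table and compares it with a 5% critical value of roughly 2.74 for 5 and 19 degrees of freedom. That critical value is an assumption taken from a standard F-table, not from the regression output.

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 62.7924, 5
ss_residual, df_residual = 17.3676, 19

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Approximate 5% critical value for F(5, 19), read from an F-table.
F_CRITICAL = 2.74

print(round(f_stat, 2))     # 13.74
print(f_stat > F_CRITICAL)  # True -> reject H0
```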

### Testing the Significance of Dummy Variables

The coefficient on the region dummy is positive and statistically significant at the 5% level, since its p-value (0.0020) is less than 0.05. We can therefore conclude that ROC in the northern region differs significantly from ROC in the southern region: holding the other variables constant, northern companies earn an ROC that is, on average, about 2.18 percentage points higher.

In contrast, the coefficient on the sector dummy is negative but not statistically significant at the 5% level, since its p-value (0.0522) is greater than 0.05. We therefore cannot conclude that ROC in the banking sector differs from ROC in the technology sector.
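The same conclusions follow from comparing each t-statistic with the critical t-value. With 25 observations and 5 regressors there are 19 residual degrees of freedom, and the two-tailed 5% critical value is about 2.093 (taken from a standard t-table, so treat it as an assumption). A minimal Python sketch:

```python
# t-statistics copied from the coefficient table.
t_stats = {
    "Sales": 2.4003,
    "Debt ratio": 1.2017,
    "Profit Margin": 3.2713,
    "Region": 3.5896,
    "Sector": -2.0709,
}

# Two-tailed critical value at the 5% level with 19 degrees of freedom.
T_CRITICAL = 2.093

# A coefficient is significant when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, flag in significant.items():
    print(f"{name}: {'significant' if flag else 'not significant'}")
```

The sector dummy falls just short of the cutoff (2.0709 versus 2.093), which matches its p-value of 0.0522 sitting slightly above 0.05.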

## Question

Ninel Khan, a technical trader, is analyzing the seasonality of the price of the GBP/USD. Khan believes the price is significantly different in the first quarter compared to the other three quarters. She uses the first quarter as the reference point in the regression. The number of dummy variables that Khan will need in her regression equation is:

1. 2.
2. 3.
3. 4.

#### Solution

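If we need to differentiate among $$n$$ categories, the regression should include $$n - 1$$ dummy variables. In this case, we have four quarters, and the first quarter is the omitted reference category. Thus, three dummy variables are needed, one each for the second, third, and fourth quarters.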
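One way to write Khan's regression, as a sketch under assumed notation (the symbols $$P_t$$, $$Q2_t$$, $$Q3_t$$, $$Q4_t$$, and $$\varepsilon_t$$ are not from the question), is:

$$P_t = b_0 + b_1 Q2_t + b_2 Q3_t + b_3 Q4_t + \varepsilon_t$$

Here, $$P_t$$ is the GBP/USD price in period $$t$$, and each dummy equals 1 if period $$t$$ falls in that quarter and 0 otherwise. The intercept $$b_0$$ is the average price in the first quarter, and each $$b_j$$ measures the average difference between that quarter and the first quarter.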

Reading 4: Extensions of Multiple Regression

LOS 4 (b): Formulate and interpret a multiple regression model that includes qualitative independent variables.
