Formulate and Interpret a Multiple Regression Model That Includes Qualitative Independent Variables

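Dummy variables are binary variables used to quantify the effect of qualitative independent variables. A dummy variable is assigned a value of 1 if a particular condition is met and a value of 0 otherwise. To distinguish among \(n\) different classes, the regression must include \(n-1\) dummy variables. Including all \(n\) dummies alongside an intercept would make the regressors perfectly collinear, so one class is omitted and serves as the reference class.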
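The \(n-1\) rule can be illustrated with a minimal sketch. The helper `encode_dummies` and the three-class `regions` variable are hypothetical names introduced only for this illustration; they are not part of the example that follows.

```python
def encode_dummies(value, categories, reference):
    """Return n-1 dummy variables for `value`, omitting the reference class."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # One 0/1 indicator for every category except the reference class
    return {f"is_{c}": int(value == c) for c in categories if c != reference}


# Hypothetical qualitative variable with three classes (assumption for illustration)
regions = ["north", "south", "east"]

print(encode_dummies("north", regions, reference="south"))
print(encode_dummies("south", regions, reference="south"))
```

An observation in the reference class has every dummy equal to 0, which is why its effect is absorbed by the intercept.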

The intercept term measures the average value of the dependent variable for the omitted (reference) class when the other independent variables equal zero. The estimated coefficient on each dummy variable measures the average difference in the dependent variable between that dummy's class and the omitted class, holding the other independent variables constant.
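As a worked illustration, consider a regression with one quantitative variable \(X\) and one dummy variable \(D\). The symbols \(b_0\), \(b_1\), and \(d\) are notation assumed here for the intercept, the slope, and the dummy coefficient:

$$ \begin{aligned} Y & = b_0 + b_1 X + dD + \varepsilon \\ E(Y \mid X, D = 0) & = b_0 + b_1 X \\ E(Y \mid X, D = 1) & = (b_0 + d) + b_1 X \end{aligned} $$

The two classes share the same slope, and the dummy coefficient \(d\) shifts the intercept from \(b_0\) to \(b_0 + d\).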

Example: Incorporating Dummy Variables in a Multiple Regression Model

Adil Suleman, CFA, wishes to identify possible drivers of a company’s percentage return on capital (ROC). He identifies performance measures, including margin (%), sales and debt ratios, and demographic measures, such as the region and the economic sector, as possible drivers of ROC.

The dummy variable “region” is coded 1 when a company is located in the northern region and 0 if it’s in the southern region. On the other hand, the dummy variable “economic sector” is coded 1 when a company belongs to the banking sector and 0 when it belongs to the technology sector.

Suleman regresses ROC against sales, debt ratio, profit margin, region, and sector to obtain the following regression output.

$$ \begin{array}{l|c} \text{Regression Statistics} & \\ \hline \text{Multiple R} & 0.8851 \\ \hline \text{R Square} & 0.7833 \\ \hline \text{Adjusted R Square} & 0.7263 \\ \hline \text{Standard Error} & 0.9561 \\ \hline \text{Observations} & 25 \\ \end{array} $$

$$ \textbf{ANOVA} \\ \begin{array}{c|c|c|c|c|c} & \text{df} & \text{SS} & \text{MS} & \text{F} & \text{Significance F} \\ \hline \text{Regression} & 5 & 62.7924 & 12.5585 & 13.7389 & 0.0000 \\ \hline \text{Residual} & 19 & 17.3676 & 0.9141 & & \\ \hline \text{Total} & 24 & 80.1600 & & & \\ \end{array} $$

$$ \begin{array}{c|c|c|c|c} & \text{Coefficients} & \text{Standard Error} & \text{t Stat} & \text{P-value} \\ \hline \text{Intercept} & 10.1241 & 0.8503 & 11.9069 & 0.0000 \\ \hline \text{Sales} & 0.0010 & 0.0004 & 2.4003 & 0.0268 \\ \hline \text{Debt ratio} & 0.0166 & 0.0138 & 1.2017 & 0.2443 \\ \hline \text{Profit Margin} & 0.1807 & 0.0552 & 3.2713 & 0.0040 \\ \hline \text{Region} & 2.1755 & 0.6061 & 3.5896 & 0.0020 \\ \hline \text{Sector} & -0.8703 & 0.4202 & -2.0709 & 0.0522 \\ \end{array} $$

Interpreting the Results

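From the above results, the multiple regression equation can be expressed as follows, where SAL is sales, DR is the debt ratio, PM is the profit margin, REG is the region dummy, and SEC is the sector dummy: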

$$ \begin{aligned} ROC & = 10.1241 + 0.0010SAL + 0.0166DR \\ & + 0.1807PM + 2.1755REG - 0.8703SEC \end{aligned} $$
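A minimal sketch of how the fitted equation could be used to compare two otherwise identical companies. The coefficients come from the regression output above; the function name `predict_roc` and the input values (sales of 1000, a debt ratio of 40, and a profit margin of 10) are hypothetical and chosen only for illustration, since the source does not state the units of each variable.

```python
# Estimated coefficients from the regression output
COEFFICIENTS = {
    "intercept": 10.1241,
    "sales": 0.0010,
    "debt_ratio": 0.0166,
    "profit_margin": 0.1807,
    "region": 2.1755,   # 1 = northern region, 0 = southern region
    "sector": -0.8703,  # 1 = banking sector, 0 = technology sector
}


def predict_roc(sales, debt_ratio, profit_margin, region, sector):
    """Return the fitted ROC (%) for one company."""
    if region not in (0, 1) or sector not in (0, 1):
        raise ValueError("region and sector must be coded 0 or 1")
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["sales"] * sales
        + COEFFICIENTS["debt_ratio"] * debt_ratio
        + COEFFICIENTS["profit_margin"] * profit_margin
        + COEFFICIENTS["region"] * region
        + COEFFICIENTS["sector"] * sector
    )


# Hypothetical technology companies that differ only by region
north = predict_roc(1000, 40, 10, region=1, sector=0)
south = predict_roc(1000, 40, 10, region=0, sector=0)
print(round(north - south, 4))  # equals the region coefficient, 2.1755
```

Whatever values are chosen for the quantitative inputs, the gap between the two fitted values equals the coefficient on the region dummy.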

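Adjusted \(R^2\)

The adjusted \(R^2\) is 0.7263. After adjusting for the number of regressors, the model explains about 72.63% of the variation in the return on capital using three quantitative regressors (sales, debt ratio, and profit margin) and two qualitative regressors (region and sector). The unadjusted \(R^2\) of 0.7833 is the raw proportion of variation explained.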
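Both goodness-of-fit figures can be recovered from the ANOVA table. The sketch below uses the standard formulas \(R^2 = SS_{regression}/SS_{total}\) and adjusted \(R^2 = 1 - (1 - R^2)(n-1)/(n-k-1)\); the variable names are chosen here for illustration.

```python
# Inputs taken from the regression output
ss_regression = 62.7924
ss_total = 80.1600
n_obs = 25       # number of observations
k = 5            # number of independent variables (3 quantitative + 2 dummies)

# Proportion of total variation explained by the regression
r_squared = ss_regression / ss_total

# Penalize R-squared for the number of regressors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - k - 1)

print(round(r_squared, 4), round(adj_r_squared, 4))  # 0.7833 0.7263
```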

Testing the Significance of the Overall Model

\(H_0 : b_1 = b_2 = b_3 = b_4 = b_5 = 0\) versus \(H_a\): At least one \(b_j \neq 0\)

From the ANOVA table, column “Significance F,” notice that the p-value is less than 5%. Thus, reject \(H_0\) in favor of \(H_a\). Conclude that the model is statistically significant.
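The F-statistic behind this p-value can be checked from the ANOVA table as the ratio of the mean square regression to the mean square error:

$$ F = \frac{MSR}{MSE} = \frac{62.7924/5}{17.3676/19} = \frac{12.5585}{0.9141} \approx 13.74 $$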

Testing the Significance of Dummy Variables

The coefficient on region is positive and statistically significant at the 5% level since its p-value (0.0020) is less than 0.05. We can, therefore, conclude that, holding the other variables constant, companies in the northern region have an ROC that is on average about 2.18 percentage points higher than that of companies in the southern region, and that this difference is statistically significant.

In contrast, the coefficient on sector is negative but not statistically significant at the 5% level since its p-value (0.0522) is slightly greater than 0.05. We therefore cannot conclude that the ROC of banking companies differs from that of technology companies.
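The same conclusions follow from comparing each t-statistic with a critical value. The sketch below assumes a two-tailed test at the 5% level with \(n - k - 1 = 25 - 5 - 1 = 19\) degrees of freedom, for which the critical t-value is approximately 2.093. The function names are hypothetical. Because the published coefficients and standard errors are rounded, the computed t-statistics differ slightly from those in the table.

```python
T_CRITICAL = 2.093  # two-tailed 5% critical value, t-distribution with 19 df

# (coefficient, standard error) for each dummy, from the regression output
estimates = {
    "Region": (2.1755, 0.6061),
    "Sector": (-0.8703, 0.4202),
}


def t_stat(coefficient, standard_error):
    """t-statistic for H0: coefficient = 0."""
    return coefficient / standard_error


def is_significant(coefficient, standard_error, critical=T_CRITICAL):
    """Reject H0 when the absolute t-statistic exceeds the critical value."""
    return abs(t_stat(coefficient, standard_error)) > critical


for name, (coef, se) in estimates.items():
    print(name, round(t_stat(coef, se), 2), is_significant(coef, se))
```

Region clears the critical value comfortably, while sector falls just short of it, which matches its p-value sitting only slightly above 0.05.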


Question

Ninel Khan, a technical trader, is analyzing the seasonality of the GBP/USD exchange rate. Khan believes the price is significantly different in the first quarter compared to the other three quarters. She uses the first quarter as the reference point in the regression. The number of dummy variables that Khan will need in her regression equation is:

  A. 2.
  B. 3.
  C. 4.


Solution

The correct answer is B.

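If we need to differentiate among \(n\) categories, the regression should include \(n - 1\) dummy variables. In this case, we have four quarters, and the first quarter is the omitted reference class. Thus, three dummy variables are needed, one each for the second, third, and fourth quarters.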
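One way to write Khan's regression, using notation assumed here (\(Q2_t\), \(Q3_t\), and \(Q4_t\) equal 1 when observation \(t\) falls in that quarter and 0 otherwise), is:

$$ Price_t = b_0 + b_1 Q2_t + b_2 Q3_t + b_3 Q4_t + \varepsilon_t $$

Here \(b_0\) is the average price in the first quarter, and each of \(b_1\), \(b_2\), and \(b_3\) measures the average difference between that quarter's price and the first-quarter price.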

Reading 4: Extensions of Multiple Regression

LOS 4 (b): Formulate and interpret a multiple regression model that includes qualitative independent variables.
