Random Variables

After completing this reading, you should be able to:

  • Describe and distinguish a probability mass function from a cumulative distribution function and explain the relationship between these two.
  • Understand and apply the concept of a mathematical expectation of a random variable.
  • Describe the four common population moments.
  • Explain the differences between a probability mass function and a probability density function.
  • Characterize the quantile function and quantile-based estimators.
  • Explain the effect of a linear transformation of a random variable on the mean, variance, standard deviation, skewness, kurtosis, median, and interquartile range.

Random Variables

A random variable is a variable whose possible values are outcomes of a random phenomenon. Formally, it is a function that maps the outcomes of a random process to real numbers. A particular value that it takes is called a realization.

Precisely, if \(\omega \) is an element of a sample space Ω and x is the realization, then \(X(\omega) = x\). Conventionally, random variables are written in upper case (such as X, Y, and Z), while their realizations are written in lower case (such as x, y, and z).

For example, let X be the random variable representing the result of rolling a six-sided die. A realization x is the outcome of one roll and can take any of the values 1, 2, 3, 4, 5, or 6. The probability that X equals 3 is written as:

$$ P(X = x) \text{ with } x = 3, \text{ that is, } P(X = 3) $$

Types of Random Variables

Discrete Random Variables

A discrete random variable takes values in a set of distinct, separated points. Its range can be:

  • a finite set, e.g., {1, 2, 3, 4, 5, 6} in the case of a six-sided die, or
  • a countably infinite set, e.g., {1, 2, 3, …}.

Examples of discrete random variables include:

  • Picking a random stock from the S&P 500
  • The number of candidates registered for the FRM level 1 exam at any given time
  • The number of study topics in a program

Probability Functions under Discrete Random Variables

Because the values of a random variable are numbers, its distribution can be described using mathematical functions. The function \(f_X (x)=P(X=x)\), defined for each x in the range of X, is the probability function (PF) of X; it describes how the total probability (which is 1) is distributed among the possible values of X.

There are two functions used when explaining the features of the distribution of discrete random variables: probability mass function (PMF) and cumulative distribution function (CDF).

Probability Mass Function (PMF)

This function gives the probability that a random variable takes a particular value. Since the PMF outputs probabilities, it must have the following properties:

  1. \(f_X (x) \ge 0\) for all x in the range of X (the values returned must be nonnegative)
  2. \(\sum_x f_X (x)=1\) (the sum over all values in the support of the random variable must equal 1)

Example: Bernoulli Distribution

Suppose X is a Bernoulli random variable with parameter p. Its PMF is given by:

$$ f_X (x)=p^x (1-p)^{1-x},\quad x=0,1 $$

A Bernoulli random variable takes only the values 0 and 1. Therefore,

$$ f_X (0)=p^0 (1-p)^{1-0}=1-p $$ and $$ f_X (1)=p^1 (1-p)^{1-1}=p $$
Both values are nonnegative for \(0 \le p \le 1\), so the first property, \(f_X (x) \ge 0\), is met. For the second property:
$$ \sum_x f_X (x)= \sum_{x=0,1} f_X (x)=1-p+ p=1 $$
Thus, the probability of observing 0 is 1-p, and the probability of observing 1 is p. More compactly,
$$ f_X (x)=\begin{cases} 1-p, & x=0 \\ p, & x=1 \end{cases} $$
The graph of the Bernoulli PMF is shown below, assuming p=0.7. Note that the PMF is nonzero only at x=0 and x=1.

[Figure: Bernoulli PMF, p = 0.7]

Cumulative Distribution Function (CDF)

The CDF gives the probability of realizing a value less than or equal to the input x, \(\Pr(X \le x)\). It is denoted by \(F_X (x)\), so that

$$ F_X (x)=\Pr(X \le x) $$
The CDF is non-decreasing in x because it accumulates probability. Unlike the PMF, which is nonzero only at the points of the support, the CDF is defined for every real x and takes values between 0 and 1 inclusive. For a discrete random variable such as a Bernoulli, the CDF is a right-continuous step function rather than a continuous one.

For instance, the CDF of the Bernoulli random variable is:

$$ F_X (x)=\begin{cases} 0, & x < 0 \\ 1-p, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases} $$
\(F_X (x)\) is defined for all real values of x. Its graph begins at 0 and rises by jumps at the values of x for which \(P(X = x)\) is positive, reaching its maximum value of 1. For the Bernoulli distribution with p=0.7, the graph is shown below:

[Figure: Bernoulli CDF, p = 0.7]

Since the CDF is defined for all values of x, the CDF of a Bernoulli distribution with parameter p=0.7 is:

$$ F_X (x)=\begin{cases} 0, & x < 0 \\ 0.3, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases} $$
The corresponding graph is shown above.
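As a quick check of the Bernoulli formulas above, the following Python sketch evaluates the PMF and the step-shaped CDF for p = 0.7. The function names `bernoulli_pmf` and `bernoulli_cdf` are illustrative, not from any library; exact fractions are used so that the probabilities are not rounded.

```python
from fractions import Fraction


def bernoulli_pmf(x, p):
    """f_X(x) = p^x (1 - p)^(1 - x) for x in {0, 1}, and 0 elsewhere."""
    if x in (0, 1):
        return p**x * (1 - p) ** (1 - x)
    return 0


def bernoulli_cdf(x, p):
    """F_X(x) = Pr(X <= x): a right-continuous step function defined for every real x."""
    if x < 0:
        return 0
    if x < 1:
        return 1 - p  # only the mass at 0 has been accumulated
    return 1  # both support points are <= x


p = Fraction(7, 10)  # p = 0.7, as in the graphs
print(bernoulli_pmf(0, p), bernoulli_pmf(1, p))  # 3/10 7/10
print([bernoulli_cdf(x, p) for x in (-0.5, 0, 0.5, 1, 2)])
```

Note that the PMF is zero away from {0, 1}, while the CDF returns a value for any real input.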

Relationship Between the CDF and PMF with Discrete Random Variables

The CDF can be represented as the sum of the PMF for all the values that are less than or equal to x. Simply put:

$$ F_X (x)=\sum_{t \in R(X),\, t \le x} f_X (t) $$
where R(X) is the range (the set of possible values) of X.

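Conversely, the PMF at x equals the size of the jump in the CDF at x, that is, the difference between the CDF at x and at the preceding possible value. When the possible values are consecutive integers, this is: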

$$ f_X (x)=F_X (x)-F_X (x-1) $$

Example: PMF and CDF under Discrete Random Variables

There are 8 hens in a cage. Hens 1 to 3 weigh 1 kg, hens 4 and 5 weigh 2 kg, and the rest weigh 3 kg. We need to develop the PMF and the CDF of the weight of a hen selected at random.


Let X be the weight of a hen selected at random; X takes the values 1, 2, or 3 (kg):

$$ \begin{align*} f_X (1) & =\Pr(X=1)=\cfrac {3}{8} \\ f_X (2) & =\Pr(X=2)=\cfrac {2}{8}=\cfrac {1}{4} \\ f_X (3) & =\Pr(X=3)=\cfrac {3}{8} \\ \end{align*} $$
So, the PMF is:
$$ f_X (x)=\begin{cases} \frac { 3 }{ 8 } , & x=1 \\ \frac { 1 }{ 4 } , & x=2 \\ \frac { 3 }{ 8 } , & x=3 \end{cases} $$
The CDF accumulates these probabilities over all values less than or equal to x. So,
$$ \begin{align*} F_X (0) & =\Pr(X \le 0)=0 \\ F_X (1) & =\Pr(X \le 1)=\cfrac {3}{8} \\ F_X (2) & =\Pr(X \le 2)=\cfrac {3}{8}+\cfrac {2}{8}=\cfrac {5}{8} \quad \left[ \text{using } F_X (x)=\sum_{t \in R(X),\, t \le x} f_X (t) \right] \\ F_X (3) & =\Pr(X \le 3)=\cfrac {5}{8}+\cfrac {3}{8}=1 \\ \end{align*} $$
so that the CDF is
$$ F_X (x)=\begin{cases} 0, & x < 1 \\ \frac { 3 }{ 8 } , & 1\le x < 2 \\ \frac { 5 }{ 8 } , & 2\le x < 3 \\ 1, & 3 \le x \end{cases} $$
Since the possible values are consecutive integers,
$$ f_X (x)=F_X (x)-F_X (x-1) $$
which implies that
$$ f_X (3)=F_X (3)-F_X (2)=1-\cfrac {5}{8}=\cfrac {3}{8} $$
This matches the PMF computed above.
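The hen example can be reproduced in a few lines of Python. This is a minimal sketch, assuming a hen is picked uniformly at random: it builds the PMF from relative frequencies, accumulates it into the CDF, and recovers f_X(3) as the jump F_X(3) - F_X(2). The names `weights`, `pmf`, and `cdf` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Weights (kg) of the 8 hens: hens 1-3, hens 4-5, hens 6-8
weights = [1, 1, 1, 2, 2, 3, 3, 3]

# PMF: relative frequency of each weight when one hen is picked at random
counts = Counter(weights)
n = len(weights)
pmf = {x: Fraction(c, n) for x, c in sorted(counts.items())}


def cdf(x):
    """F_X(x): sum of f_X(t) over the support points t <= x."""
    return sum((f for t, f in pmf.items() if t <= x), Fraction(0))


print(pmf)  # {1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(3, 8)}
print([cdf(x) for x in (0, 1, 2, 3)])
# Recover f_X(3) as the jump in the CDF at x = 3
print(cdf(3) - cdf(2))  # 3/8
```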

Continuous Random Variables

A continuous random variable can take any value in an interval of the real line. For instance, its range may be \(x > 0\), \(-\infty < x < \infty\), or \(0 < x < 1\). Examples of continuous random variables include the price of a stock or bond and the value at risk of a portfolio at a particular point in time.

For a continuous random variable X, probabilities are assigned to intervals rather than to individual values (the probability of any single value is 0). For example,

$$ P[r_1 < X < r_2 ]=p $$
means that p is the probability that X falls between \(r_1\) and \(r_2\).

The Probability Density Function (PDF) under Continuous Random Variables

A probability density function (PDF) describes how probability is distributed over the range of a continuous random variable; probabilities of events are obtained by integrating it.

Given a PDF f(x), we can determine the probability that X falls between a and b:

$$ \Pr(a < X \le b)=\int _{ a }^{ b }{ f\left( x \right) dx } $$
That is, the probability that X lies between two values is the area under the graph of the density function between those values:

[Figure: probability as the area under the PDF between two values]

The term probability distribution function is sometimes also used for the probability density function. The properties of the PDF mirror those of the PMF:

  1. \(f_X (x) \ge 0,\ -\infty < x < \infty\) (nonnegativity)
  2. \(\int_{r_{min}}^{r_{max}} f(x)dx=1\) (the total area under the PDF must equal 1, just as the PMF of a discrete random variable sums to 1)

Here \(r_{min}\) and \(r_{max}\) are the lower and upper bounds of the range of X.

Cumulative Distribution Functions (CDF) under Continuous Random Variables

This function, sometimes also called the cumulative density function, is closely related to the PDF. The CDF gives the probability that a random variable is less than or equal to a specific value. To determine the CDF, the PDF is integrated from its lower bound.

By convention, the CDF is denoted by the capital letter of the corresponding density function. The CDF, F(x), of a random variable X whose PDF is f(x) is:

$$ F(a)=\int_{-\infty}^{a}f(x)dx =P[X \le a] $$
That is, F(a) is the area under the PDF to the left of a. The CDF is non-decreasing and ranges from zero to one. It equals 0 at the lower bound of the range, since X cannot be less than its minimum, and it equals 1 at the upper bound, since X is certain to be less than or equal to its maximum.

To obtain the PDF from the CDF, we have to compute the first derivative of the CDF. Therefore:

$$ f(x)=\cfrac {dF(x)}{dx} $$
Next, we look at how to determine the probability that a random variable X falls between two values a and b, where a is less than b:
$$ P[a < X \le b]=\int_a^b f(x)dx=F(b)-F(a) $$

The following relationship is also true:

$$ P[X > a]=1-F(a) $$

Example: Formulating the CDF of a Continuous Random Variable

The continuous random variable X has a pdf of \(f(x)=12x^2 (1-x) \text{ for } 0 < x < 1\). We need to find the expression for F(x).


We know that:
$$ \begin{align*} F(x) & =\int_{-\infty }^x f(t)dt \\ & =\int_0^x 12t^{2} (1-t)dt={ [4t^3-3t^4 ] }_{ 0 }^{ x }=x^3 (4-3x) \end{align*} $$
So, for \(0 < x < 1\),
$$ F(x)=x^3 (4-3x) $$
with F(x)=0 for \(x \le 0\) and F(x)=1 for \(x \ge 1\).
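The relationships above (F is the integral of f, f is the derivative of F, and P(a < X ≤ b) = F(b) - F(a)) can be checked numerically for this example. The sketch below uses composite Simpson's rule and a central finite difference; both are generic numerical methods chosen for illustration, and the helper names `pdf`, `cdf`, and `simpson` are hypothetical.

```python
def pdf(x):
    """f(x) = 12 x^2 (1 - x) on 0 < x < 1, and 0 elsewhere."""
    return 12 * x**2 * (1 - x) if 0 < x < 1 else 0.0


def cdf(x):
    """Closed form from the example: F(x) = x^3 (4 - 3x) on [0, 1]; 0 below, 1 above."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x**3 * (4 - 3 * x)


def simpson(f, a, b, n=10_000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# F(0.5) should equal the area under f from 0 to 0.5
print(cdf(0.5), simpson(pdf, 0.0, 0.5))  # both approximately 0.3125
# P(0.2 < X <= 0.8) = F(0.8) - F(0.2)
print(cdf(0.8) - cdf(0.2), simpson(pdf, 0.2, 0.8))
# The derivative of F recovers f: central difference at x = 0.5
step = 1e-6
print((cdf(0.5 + step) - cdf(0.5 - step)) / (2 * step), pdf(0.5))  # both approximately 1.5
```

Simpson's rule is exact (up to rounding) for polynomials of degree 3 or less, which is why it matches the closed form so closely here.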

Expected Values

Expected values are numerical summaries of features of the distribution of a random variable. The expected value of X, denoted by E[X] or \(\mu\), measures the average or center of the distribution of X; it is the mean of the distribution.

For discrete random variables, the expected value is given by:
$$ E[X]=\sum_x xf(x) $$
That is, it is the sum of each value of the random variable multiplied by its probability.

Example: Calculating the Expected Value in Discrete Random Variable

There are 8 hens in a cage. Hens 1 to 3 weigh 1 kg, hens 4 and 5 weigh 2 kg, and the rest weigh 3 kg. We need to calculate the mean weight of the hens.


We had calculated the PMF as:
$$ f(x)=\begin{cases} \frac { 3 }{ 8 } , & x=1 \\ \frac { 1 }{ 4 } , & x=2 \\ \frac { 3 }{ 8 } , & x=3 \end{cases} $$
Now,
$$ E[X]=\sum_x xf(x)=1×\frac {3}{8}+2×\frac {1}{4}+3×\frac {3}{8}=2 $$
So, the mean weight of the hens in the cage is 2 kg.
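The discrete expectation is just a probability-weighted sum. Below is a minimal Python sketch for the hen PMF, with a hypothetical helper `expect` that also accepts a function g so it can compute E[g(X)].

```python
from fractions import Fraction

# PMF of a randomly chosen hen's weight (kg), from the earlier example
pmf = {1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(3, 8)}


def expect(pmf, g=lambda x: x):
    """E[g(X)] = sum over the support of g(x) * f(x); g defaults to the identity."""
    return sum(g(x) * p for x, p in pmf.items())


print(expect(pmf))  # 2
```

Passing `g=lambda x: x**2` gives E[X²] = 19/4 in the same way.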

For the continuous random variable, the mean is given by:

$$ E[X]=\int_{-\infty}^\infty xf(x)dx $$
That is, we integrate the product of each value of the random variable and its density.

Example: Calculating the Expected Value of Continuous Random Variable

The continuous random variable X has a pdf of \(f(x)=12x^2 (1-x)\) for \(0 < x < 1\).

We need to calculate E[X].


We know that:
$$ E[X]=\int_{-\infty}^\infty xf(x)dx $$
So,
$$ E(X)=\int_0^1 x \cdot 12x^2 (1-x)dx=\int_0^1 (12x^3-12x^4)dx={\left[3x^4-\frac {12}{5}x^5 \right]}^{1}_{0}=3-2.4=0.6 $$
For a function of a random variable, we apply the same method as for a single random variable: we sum or integrate the product of the value of the function and the probability (or density) of the corresponding value of the random variable.

Let g(X) be a function of the random variable X. Then:

$$ E[g(X)]=\sum_x g(x)f(x) $$ for the discrete case and $$ E[g(X)]=\int_{-\infty}^\infty g(x)f(x)dx $$ for the continuous case.
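For the continuous case, these integrals can be approximated numerically. The sketch below, assuming composite Simpson's rule is accurate enough for these smooth polynomial integrands, approximates E[X] and E[g(X)] with g(x) = x² for the density f(x) = 12x²(1 - x) used above. `simpson` and `expect` are illustrative helpers, not library functions.

```python
def simpson(f, a, b, n=1_000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def expect(g, pdf, lo, hi):
    """E[g(X)]: integral of g(x) f(x) over the support (lo, hi) of X."""
    return simpson(lambda x: g(x) * pdf(x), lo, hi)


def pdf(x):
    """Density from the example: f(x) = 12 x^2 (1 - x) on (0, 1)."""
    return 12 * x**2 * (1 - x)


print(expect(lambda x: x, pdf, 0.0, 1.0))  # E[X], approximately 0.6
print(expect(lambda x: x**2, pdf, 0.0, 1.0))  # E[X^2], approximately 0.4
```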

Example: Calculating the Expected Value of a Function of a Random Variable

A random variable X has PDF: $$ f_X (x)=\frac {1}{9} x^2,\text{ for } 0 < x < 3 $$ (the constant 1/9 ensures that the density integrates to 1, since \(\int_0^3 x^2 dx = 9\)). Calculate \(E(2X+1)\).


$$ \begin{align*} E[g(X)] & =\int_{-\infty}^\infty g(x)f(x)dx \\ & =\int_0^3 \frac {1}{9} (2x+1) x^2 dx=\frac {1}{9} {\left[\frac {x^4}{2}+\frac {x^3}{3} \right]}^{3}_{0}=\frac {1}{9}(40.5+9)=5.5 \\ \end{align*} $$
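A numerical version of this kind of calculation is sketched below. It first computes the normalizing constant for a density proportional to x² on (0, 3), so that the total probability is 1, then evaluates E(2X + 1) directly and via linearity as 2E(X) + 1. The helper `simpson` is a hypothetical composite Simpson's rule implementation.

```python
def simpson(f, a, b, n=1_000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


# Normalizing constant for a density proportional to x^2 on (0, 3)
c = 1 / simpson(lambda x: x**2, 0.0, 3.0)  # equals 1/9


def pdf(x):
    return c * x**2


total = simpson(pdf, 0.0, 3.0)  # total probability, should be 1
e_direct = simpson(lambda x: (2 * x + 1) * pdf(x), 0.0, 3.0)  # E(2X + 1)
e_linear = 2 * simpson(lambda x: x * pdf(x), 0.0, 3.0) + 1  # 2E(X) + 1
print(c, total, e_direct, e_linear)
```

Checking that the density integrates to 1 before computing expectations is a cheap way to catch a wrong constant.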

Properties of Expectation

The expectation operator is linear: for constants a and b, \(E(a+bX)=a+bE(X)\). In particular, the expectation of a constant is the constant itself, E(c)=c. Moreover, the expected value of a random variable is a constant, not a random variable.

For a nonlinear function g, in general \(E[g(X)] \neq g(E[X])\). For instance, \(E \left(\frac {1}{X}\right) \neq \frac {1}{E(X)} \).
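To see both properties with concrete numbers, the sketch below reuses the hen-weight PMF from the earlier example (an assumption made only for illustration): linearity holds for g(x) = 3 - 4x, but E(1/X) differs from 1/E(X).

```python
from fractions import Fraction

pmf = {1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(3, 8)}

mean = sum(x * p for x, p in pmf.items())  # E(X) = 2
mean_recip = sum(Fraction(1, x) * p for x, p in pmf.items())  # E(1/X)

print(mean_recip, 1 / mean)  # 5/8 1/2, so E(1/X) != 1/E(X)

# Linearity holds for g(x) = a + b x
a, b = 3, -4
print(sum((a + b * x) * p for x, p in pmf.items()), a + b * mean)  # -5 -5
```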

Variance of a Random Variable

The variance of a random variable measures the spread (dispersion or variability) of its distribution about the mean. Mathematically,

$$ Var(X)=E[(X-E[X])^2]=E(X^2 )-[E(X)]^2 $$
The standard deviation is the square root of the variance. Denoting \(E(X)=\mu\),
$$ Var(X)=E(X^2 )-\mu^2 $$

Example: Calculating the Variance of Random Variable

The continuous random variable X has a pdf of \(f(x)=12x^2 (1-x)\) for \(0 < x < 1\).

We need to calculate Var(X).


We know that: $$ Var(X)=E(X^2 )-[E(X)]^2 $$ We had calculated E(X)=0.6.

We have to calculate \(E(X^2)\):

$$ \begin{align*} E(X^2 ) & =\int_0^1 x^2 [12x^2 (1-x)]dx \\ & =\int_0^1 (12x^4-12x^5) dx={ \left[ \frac {12}{5} x^5-2x^6 \right] }^{1}_{0}=2.4-2=0.4 \\ \end{align*} $$ So, $$ Var(X)=0.4-0.6^2=0.04 $$
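Both forms of the variance formula can be checked numerically for this density. This sketch assumes composite Simpson's rule with 1,000 subintervals is accurate enough here (the integrands are low-degree polynomials); the helper names are illustrative.

```python
import math


def simpson(f, a, b, n=1_000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def pdf(x):
    """Density from the example: f(x) = 12 x^2 (1 - x) on (0, 1)."""
    return 12 * x**2 * (1 - x)


mu = simpson(lambda x: x * pdf(x), 0.0, 1.0)  # E(X)
ex2 = simpson(lambda x: x**2 * pdf(x), 0.0, 1.0)  # E(X^2)
var_shortcut = ex2 - mu**2  # E(X^2) - [E(X)]^2
var_definition = simpson(lambda x: (x - mu) ** 2 * pdf(x), 0.0, 1.0)  # E[(X - mu)^2]
sd = math.sqrt(var_shortcut)
print(mu, ex2, var_shortcut, var_definition, sd)  # approximately 0.6, 0.4, 0.04, 0.04, 0.2
```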


Moments

Moments are expected values that summarize the features of a distribution. The first moment is the expected value of X:

$$ \mu_1=E(X) $$
The first moment therefore describes the average value. The second and higher moments are broadly divided into central and non-central moments.

Central Moments

The general formula for the central moments is:
$$ \mu_k=E([X-E(X)]^k ),\quad k=2,3,\ldots $$
where k denotes the order of the moment. Central moments are moments about the mean.

Non-Central Moments

Non-central moments are moments about 0. The general formula is given by:
$$ \mu'_k=E(X^k) $$
The central moments can be constructed from the non-central moments by expanding \((X-\mu)^k\). The first non-central moment is the mean, \(\mu'_1=E(X)=\mu_1\); the first central moment, \(E[X-E(X)]\), is always 0, which is why the central moments above start at k=2.
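The statement that central moments can be built from non-central moments follows from the binomial expansion \((X-\mu)^k=\sum_{j=0}^{k}\binom{k}{j}(-\mu)^{k-j}X^j\), taking expectations term by term. The sketch below applies it to the hen-weight PMF from the earlier examples (chosen only for illustration) and compares the result with the direct computation. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

pmf = {1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(3, 8)}


def raw_moment(k):
    """Non-central moment about 0: E(X^k); raw_moment(0) = 1."""
    return sum(x**k * p for x, p in pmf.items())


def central_from_raw(k):
    """E[(X - mu)^k] = sum_j C(k, j) * (-mu)^(k - j) * E(X^j)."""
    mu = raw_moment(1)
    return sum(comb(k, j) * (-mu) ** (k - j) * raw_moment(j) for j in range(k + 1))


def central_direct(k):
    """E[(X - mu)^k] computed straight from the PMF, for comparison."""
    mu = raw_moment(1)
    return sum((x - mu) ** k * p for x, p in pmf.items())


for k in (1, 2, 3, 4):
    print(k, raw_moment(k), central_from_raw(k), central_direct(k))
```

The first central moment comes out as 0, as noted above.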

Population Moments

The four common population moments are: mean, variance, skewness, and kurtosis.

The mean

The mean is the first moment, given by: $$ \mu=E(X) $$ It is the average value of X and describes the location of the distribution.

The Variance

This is the second central moment. It is given by: $$ \sigma^2=E([X-E(X)]^2 )=E[(X-\mu)^2 ] $$ The variance measures the spread of the random variable around its mean. The standard deviation (\(\sigma\)) is the square root of the variance. The standard deviation is often reported instead of the variance because it is measured in the same units as the mean, which makes the two easy to compare.

The Skewness

The skewness is the third standardized central moment, given by: $$ \text{skew}(X)=\cfrac { E([X-E(X)]^3) }{\sigma^3} =E \left[ \left( \cfrac {X-\mu}{\sigma} \right)^3 \right] $$ Note that \(\cfrac {X-\mu}{\sigma}\) is the standardized X, which has a mean of 0 and a variance of 1.

Skewness measures the asymmetry of the distribution, since the third power preserves the sign of each deviation from the mean. When skewness is negative, large negative deviations are more likely than large positive ones (the long tail is on the left side of the distribution). Conversely, when skewness is positive, large positive deviations are more likely than large negative ones (the long tail is on the right side of the distribution).


The Kurtosis

The kurtosis is the fourth standardized moment, given by: $$ \text{Kurt}(X)=\cfrac {E([X-E(X)]^4) }{\sigma^4} =E \left[ \left( \cfrac {X-\mu}{\sigma} \right)^4 \right] $$ Kurtosis is defined analogously to skewness, but because the fourth power is never negative, it reflects the magnitude of deviations regardless of their sign and so measures how heavy the tails of the distribution are. The kurtosis of a normally distributed random variable is 3, which serves as the reference value. A random variable with kurtosis exceeding 3 is said to be heavy-tailed or fat-tailed.
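The skewness and kurtosis formulas translate directly into code. Below is a minimal sketch for a discrete PMF, applied to the Bernoulli distribution with p = 0.7 used earlier; `standardized_moment` is a hypothetical helper name. Since p > 0.5, most of the mass sits at 1 with a tail toward 0, so the skewness should be negative.

```python
import math


def standardized_moment(pmf, k):
    """E[((X - mu) / sigma)^k] for a discrete PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(((x - mu) / sigma) ** k * p for x, p in pmf.items())


p = 0.7
bern = {0: 1 - p, 1: p}
print(standardized_moment(bern, 3))  # skewness: negative (long left tail)
print(standardized_moment(bern, 4))  # kurtosis: below 3, thinner tails than the normal
```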

Effect of Linear Transformation on Moments

Many random variables do not have a natural scale; investment returns, for example, can be expressed either as proportions or as percentages. Converting between such scales is an example of a linear transformation.

Effect on Mean and Variance

Now, let \(Y= \alpha + \beta X\), where \(\alpha\) and \(\beta\) are constants. \(\alpha\) is referred to as the location shift and \(\beta\) as the scale. The mean of this linear transformation is:
$$ E(Y)=E(\alpha + \beta X )=\alpha + \beta E(X) $$
The variance is given by:
$$ \text{Var}(Y)=\text{Var}(\alpha + \beta X)=\beta^2 \text{Var}(X)=\beta^2 \sigma^2 $$
The location shift \(\alpha\) does not affect the variance because variance measures spread around the mean, and shifting every value by \(\alpha\) shifts the mean by the same amount. The standard deviation of Y is given by:
$$ \sqrt { \beta^2 \sigma^2 }=| \beta | \sigma $$
Therefore, \(\alpha\) does not affect the standard deviation, and the standard deviation scales by \(|\beta|\).
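The effect on the mean, variance, and standard deviation can be verified on the hen-weight PMF from the earlier examples. The constants α = 3 and β = -4 are arbitrary illustrative choices; each support point is mapped to α + βx while its probability is unchanged.

```python
import math
from fractions import Fraction

# Hen-weight PMF from the earlier examples
pmf_x = {1: Fraction(3, 8), 2: Fraction(1, 4), 3: Fraction(3, 8)}


def mean(pmf):
    return sum(x * p for x, p in pmf.items())


def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())


alpha, beta = 3, -4  # illustrative constants for Y = alpha + beta * X
# Shift and scale every support point; beta != 0 keeps the points distinct
pmf_y = {alpha + beta * x: p for x, p in pmf_x.items()}

print(mean(pmf_y), alpha + beta * mean(pmf_x))  # -5 -5
print(var(pmf_y), beta**2 * var(pmf_x))  # 12 12
print(math.sqrt(var(pmf_y)), abs(beta) * math.sqrt(var(pmf_x)))  # |beta| * sigma
```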

Effect on Skewness and Kurtosis

Maintaining the linear transformation \(Y=\alpha + \beta X\), if \(\beta\) is positive (\(\beta > 0\)), then the skewness and kurtosis of Y and X are equal, since both moments are defined on standardized quantities, which remove the effect of the location shift \(\alpha\) and the scaling factor \( \beta \). This can be seen as follows:

We know that:

$$ \text{skew}(X)=E \left[ \left( \cfrac {X-\mu}{\sigma} \right)^3 \right ] $$
Now, since \(E(Y)=\alpha+\beta\mu\) and, for \(\beta>0\), the standard deviation of Y is \(\beta\sigma\),
$$ \begin{align*} \text{skew}(Y) & =\cfrac {E([Y-E(Y)]^3) }{(\beta\sigma)^3} =E \left[ \left( \cfrac {Y-E(Y)}{\beta\sigma} \right)^3 \right] \\ & =E \left[ \left( \cfrac {\alpha + \beta X-(\alpha + \beta\mu)}{\beta \sigma } \right)^3 \right] \\ & =E \left[ \left( \cfrac {\beta(X-\mu)}{\beta\sigma} \right)^3 \right]=E \left[ \left( \cfrac {X-\mu}{\sigma} \right)^3 \right]=\text{skew}(X) \\ \end{align*} $$
However, if \( \beta < 0 \), the standard deviation of Y is \(|\beta|\sigma=-\beta\sigma\), so the standardized Y equals \(-\cfrac{X-\mu}{\sigma}\). The skewness of Y then has the same magnitude as that of X but the opposite sign, because the third power is odd. The kurtosis, on the other hand, is unaffected, because the fourth power is even.
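The sign behavior of skewness under Y = α + βX can be checked numerically. The PMF below is a made-up right-skewed example (not from the reading), and `skew_kurt` and `linear` are hypothetical helper names.

```python
import math


def skew_kurt(pmf):
    """Return (skewness, kurtosis) of a discrete PMF given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    skew = sum(((x - mu) / sd) ** 3 * p for x, p in pmf.items())
    kurt = sum(((x - mu) / sd) ** 4 * p for x, p in pmf.items())
    return skew, kurt


def linear(pmf, alpha, beta):
    """PMF of Y = alpha + beta * X (beta != 0 keeps the support points distinct)."""
    return {alpha + beta * x: p for x, p in pmf.items()}


# Illustrative right-skewed PMF (hypothetical numbers)
pmf_x = {0: 0.5, 1: 0.3, 5: 0.2}

print(skew_kurt(pmf_x))
print(skew_kurt(linear(pmf_x, 10, 2)))  # beta > 0: same skewness and kurtosis
print(skew_kurt(linear(pmf_x, 10, -2)))  # beta < 0: skewness flips sign, kurtosis unchanged
```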

Quantiles and Modes

As with data, quantities such as quantiles and modes can be used to describe a distribution.

The Quantiles

For a continuous random variable X, the \(\alpha\)-quantile of X is the smallest number m such that: $$ \Pr(X < m)=\alpha $$ where \( \alpha \in [0,1] \).

For instance, if X is a continuous random variable, the median is defined to be the solution of:

$$ P(X < m)=\int_{-\infty}^{m} f_X (x)dx=0.5 $$
Similarly, the lower and upper quartiles, \(Q_1\) and \(Q_3\), satisfy \(P(X < Q_1 )=0.25\) and \(P(X < Q_3 )=0.75\).

The interquartile range (IQR) is an alternative measure of spread. It is given by:

$$ \text{IQR}=Q_3-Q_1 $$

Example: Calculating the Quartiles of a PDF

The random variable X has a pdf given by:

$$ f_X (x)=2e^{-2x},\quad x > 0 $$ (the coefficient 2 ensures that \(\int_0^\infty f_X (x)dx=1\)). Calculate the median of the distribution.


Denote the median by m. Then m is such that: $$ P(X < m)=\int_0^m 2e^{-2x} dx=0.5 $$ So, $$ \begin{align*} {\left[-e^{-2x} \right]}^{m}_{0} & =0.5 \\ 1-e^{-2m} & =0.5 \\ \Rightarrow m & =\frac {1}{2}\ln 2=0.3466 \\ \end{align*} $$
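The median above has a closed form, but in general a quantile can be found numerically by solving F(m) = α. The following sketch uses bisection, assuming F is non-decreasing on a known bracket [lo, hi], and applies it to the CDF F(x) = x³(4 - 3x) derived earlier to obtain Q1, the median, Q3, and the IQR. The function name `quantile` is illustrative.

```python
def quantile(cdf, alpha, lo, hi, tol=1e-12):
    """Approximate the smallest m in [lo, hi] with cdf(m) >= alpha, by bisection.

    Assumes cdf is non-decreasing on [lo, hi] and cdf(lo) <= alpha <= cdf(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= alpha:
            hi = mid  # mid is at or above the quantile
        else:
            lo = mid  # the quantile lies to the right of mid
    return hi


def cdf(x):
    """F(x) = x^3 (4 - 3x) on [0, 1], from the earlier CDF example."""
    return x**3 * (4 - 3 * x)


q1 = quantile(cdf, 0.25, 0.0, 1.0)
median = quantile(cdf, 0.50, 0.0, 1.0)
q3 = quantile(cdf, 0.75, 0.0, 1.0)
print(q1, median, q3, q3 - q1)  # Q1, median, Q3, and IQR
```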


The Mode

The mode is a measure of central tendency: it is the location of the most frequently observed value of a random variable. For a continuous random variable, the mode is the point at which the PDF reaches its highest value.
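For a continuous density, the mode can be located by searching for the maximum of the PDF. This rough sketch uses a simple grid search on the density f(x) = 12x²(1 - x) from the earlier examples; setting f'(x) = 24x - 36x² = 0 gives the exact mode x = 2/3 for comparison.

```python
def pdf(x):
    """Density from the earlier examples: f(x) = 12 x^2 (1 - x) on (0, 1)."""
    return 12 * x**2 * (1 - x)


n = 100_000
grid = [i / n for i in range(1, n)]  # interior points of (0, 1)
mode = max(grid, key=pdf)  # grid point with the highest density
print(mode, pdf(mode))  # close to 2/3 and 16/9
```

A finer grid, or a root-finder applied to f'(x), would sharpen the estimate.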

A random variable can have more than one mode: its distribution is described as bimodal if it has two modes and multimodal if it has more than two. The graph below shows the shapes of bimodal and multimodal distributions.

[Figure: bimodal and multimodal distributions]

Question 1

If a random variable has a mean of 4 and a variance of 2, calculate Var(3-4X).

  1. 29
  2. 30
  3. 32
  4. 35


The correct answer is C.

Recall that: $$ \text{Var}(\alpha+ \beta X)=\beta^2 \text{Var}(X) $$ So, $$ \text{Var}(3-4X)=(-4)^2 \text{Var}(X)=16 \text{ Var}(X) $$ We are given that the variance is 2, so


$$ \text{Var}(3-4X)=16×2=32 $$

Question 2

A continuous random variable has a pdf given by \(f_X (x)=ce^{-3x}\) for all \(x > 0\). Calculate Pr(X > 6.5).

  1. 0.4532
  2. 0.4521
  3. 0.3321
  4. 0.3333


None of the listed choices is correct; the probability is approximately \(3.40 \times 10^{-9}\).

We need to find the constant c first. We know that: $$ \int_{-\infty}^\infty f(x)dx=1 $$ So, $$ \begin{align*} \int_0^\infty ce^{-3x} dx & =c{ \left[ -\frac {1}{3} e^{-3x} \right] }^{\infty}_{0}=c \left[ 0+\frac {1}{3} \right ] =1 \\ & \Rightarrow c=3 \\ \end{align*} $$ Therefore, the PDF is \(f_X (x)=3e^{-3x}\), so that \(\Pr(X > 6.5)\) is given by: $$ \begin{align*} \int_{6.5}^{\infty} 3e^{-3x} dx & ={ \left[- e^{-3x} \right]}^{\infty}_{6.5}=e^{-3×6.5} \\ & =e^{-19.5} \approx 3.40 \times 10^{-9} \\ \end{align*} $$