This reading will teach you the tools and techniques used to organize, visualize, and describe data. In addition, you will learn how to convert data into useful information that analysts can use to make important investment decisions.

Data refer to a collection of facts such as numbers, characters, measurements, observations, audio recordings, and videos. Data can either be raw or formatted. Further, data can be classified as follows:

- Numerical versus categorical data.
- Cross-sectional versus time-series versus panel data.
- Structured versus unstructured data.

**Numerical data** represent values that can be measured or counted. Examples of numerical data include age, height, weight, or the number of shares in a portfolio.

Numerical (quantitative) data can be classified into two types: continuous and discrete data.

- **Continuous data**: Data that can be measured on an infinite scale. Such data can take any numerical value within a specified range, no matter how small the increment. Examples include temperature (25.5 degrees Celsius), height (1.81 meters), and length (25.256 meters).
- **Discrete data**: Quantitative data that can be counted and take only distinct, separate values, typically a finite number of them. Examples include the days in a week (7 days) and the number of employees in a company (3 employees; we cannot have 3.5 workers).

**Categorical data** (also called qualitative data) represent qualitative outcomes (i.e., quality or characteristic) of a group of observations. The groups are mutually exclusive. This means that each individual fits only into one category. Examples of categorical data include investment style (i.e., growth vs. value stock) or investment-grade and junk bonds.

Categorical data are split into the following two types:

- **Nominal data**: Categorical values that have no inherent numerical significance since they do not rank data. A good example would be gender representation, e.g., 1 to represent 'male' and 2 to represent 'female.'
- **Ordinal data**: Categorical values that rank data according to some characteristic, where each category has an ordered relationship to all the other categories. Although ordinal scales can rank data in some order, the magnitude of the difference between categories cannot be quantified or measured. A good example is a rating scale from 1 to 3, where 1 represents OK, 2 represents good, and 3 represents excellent. Although ordinal data can be ranked or ordered, they do not necessarily indicate a numerical difference between them.
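The distinction affects which summaries are meaningful. The following is a minimal sketch, assuming the 1-to-3 rating scale described above is modeled as an ordered type and a nominal variable is kept as plain labels. The names `Rating`, `ratings`, and `colors`, and all sample values, are hypothetical and used only for illustration.

```python
from enum import IntEnum
from statistics import median_low, mode


class Rating(IntEnum):
    """Ordinal scale: the order is meaningful, the size of the gaps is not."""
    OK = 1
    GOOD = 2
    EXCELLENT = 3


# Hypothetical sample of ordinal observations.
ratings = [Rating.GOOD, Rating.OK, Rating.EXCELLENT, Rating.GOOD, Rating.EXCELLENT]

# Ranking and the median are meaningful for ordinal data.
ranked = sorted(ratings)
middle = median_low(ratings)

# Hypothetical nominal observations: labels with no inherent order.
colors = ["black", "blue", "black", "red"]

# For nominal data, only counting-based summaries such as the mode make sense.
most_common_color = mode(colors)

print([r.name for r in ranked])
print(middle.name)
print(most_common_color)
```

Averaging the rating codes would run without error, but the result would assume that the step from OK to good equals the step from good to excellent, which an ordinal scale does not guarantee.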

Data can be classified into cross-sectional, time-series, and panel data depending on the data collection method employed.

- **Cross-sectional data**: A set of observations made at a single point in time. Samples are constructed by simultaneously collecting the data of interest across a range of observational units (people, objects, firms, etc.). A good example of cross-sectional data is the stock returns that Microsoft, IBM, and Samsung shareholders earned in the year ended 31st December 2021.
- **Time-series data**: A set of observations made over a given period at specific, equally spaced time intervals. Because the observations are made at specific points in time, the time intervals are discrete. A good example of time-series data is the daily or weekly closing price of a stock recorded over a period spanning 13 weeks. Another example is the set of monthly profits (both positive and negative) that Samsung earned between the 1st of October 2021 and the 1st of December 2021.
- **Panel data**: A combination of time-series data and cross-sectional data. An example of panel data is the GDP of three developing countries observed over a period spanning three years, from 2019 to 2021.
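One way to see how the three types relate is to store a small panel and slice it. The sketch below is only illustrative: the country names and GDP figures are made up, and the helper names `cross_section` and `time_series` are hypothetical rather than part of any library.

```python
# Hypothetical panel: GDP (in made-up units) for three fictional countries, 2019-2021.
# Each key is (observational unit, period); none of the figures are real data.
panel = {
    ("Country A", 2019): 100.0,
    ("Country A", 2020): 98.0,
    ("Country A", 2021): 104.0,
    ("Country B", 2019): 250.0,
    ("Country B", 2020): 245.0,
    ("Country B", 2021): 260.0,
    ("Country C", 2019): 75.0,
    ("Country C", 2020): 77.0,
    ("Country C", 2021): 80.0,
}


def cross_section(data, year):
    """Return all units observed at a single point in time."""
    return {unit: value for (unit, y), value in data.items() if y == year}


def time_series(data, unit):
    """Return one unit's observations in chronological order."""
    return [(y, value) for (u, y), value in sorted(data.items()) if u == unit]


print(cross_section(panel, 2020))
print(time_series(panel, "Country B"))
```

Fixing the year yields a cross-section across countries, fixing the country yields a time series, and the full table with both dimensions is the panel.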

A **variable** is a characteristic or quantity that can be measured, counted, or categorized and is subject to change. A variable can also be called a field, an attribute, or a feature. For example, stock price, market capitalization, dividend and dividend yield, earnings per share (EPS), and price-to-earnings ratio (P/E) are basic data variables for the financial analysis of a public company.

An **observation** is the value of a specific variable collected at a point in time or over a specified time period. For example, last year DEF Inc. recorded an EPS (earnings per share) of $7.50 — this could be our first observation related to the EPS variable. That value represented a 15% annual increase — this could be our second observation.

**Structured data** are organized in a pre-defined manner (e.g., rows and columns), and there is a relationship between the different rows and columns. Since they are highly organized and formatted, they are easy to access, process, and store, and they work readily with most standard analytical models. Typical examples of structured company financial data include:

- **Market data**: For example, daily trading yields of a bond, daily closing stock prices, bond prices, and trading volumes.
- **Fundamental data**: Data available in the financial statements, e.g., earnings per share, dividend per share, and gross margins.
- **Analytical data**: Data derived from analytics, e.g., forecast operating profit growth and forecast cash flow from operations.

**Unstructured data** are not organized in any pre-defined manner. They may contain text, numbers, dates, and other content. Examples of unstructured data include financial news, posts on social media, company filings with a regulator, and audio or video recordings. Because unstructured data are irregular and disorganized, they are difficult to handle and interpret. Unstructured data are usually collected from unconventional sources. Depending on their source, they can be classified into the following three groups:

- Produced by individuals (e.g., social media posts and web searches).
- Generated by business processes (e.g., credit card transactions and corporate regulatory filings).
- Generated by sensors (e.g., satellite imagery and foot traffic recorded by mobile devices).

Identify the data type for each of the following items:

- Microsoft, IBM, and Samsung stock returns shareholders earned for the year ended 31st December 2020.
- Price change of a stock.
- The number of students in a class.
- Color of a smartphone.
- Grades of a student in a quiz.

**Solution**

- **Cross-sectional data**: Microsoft, IBM, and Samsung stock returns shareholders earned for the year ended 31st December 2020.
- **Continuous data**: Price change of a stock.
- **Discrete data**: The number of students in a class.
- **Nominal data**: Color of a smartphone.
- **Ordinal data**: Grades of a student in a quiz.

## Question

Which of the following is *most likely* panel data?

- A. Yearly remittances of five countries from Asia for the past 10 years.
- B. Customers' online comments regarding the quality of a product of a company.
- C. Monthly profits a company earned from 1st of July 2019 to 30th of June 2020.

**Solution**

The correct answer is **A**.

Remember that panel data are a mix of time-series and cross-sectional data. Panel data consist of observations through time on one or more variable(s) for multiple observational units. The observations in panel data are usually organized in a matrix format called a data table.

Therefore, yearly remittances of five countries from Asia for the past 10 years qualify to be referred to as panel data.

B is incorrect. Customers' online comments regarding the quality of a company's product are unstructured data, since they are free-form text produced by individuals.

C is incorrect. Monthly profits a company earned from 1st of July 2019 to 30th of June 2020 are an example of time-series data.