Describe Big Data, Artificial Intelligence, and Machine Learning

Big data refers to extremely large, complex datasets that can be analyzed computationally to reveal patterns, trends, and associations, especially those relating to human behavior. It encompasses both traditional data sources, such as company reports, stock exchange data, and data gathered by governments, and nontraditional (alternative) data from social media, sensor networks, and electronic devices.

Defining Properties of Big Data

  • Volume: the amount of data collected in various forms, including files, records, and tables. Quantities of data can reach almost incomprehensible proportions.
  • Velocity: the speed at which data are created, collected, and processed, which can be extremely high. Often, we deal with real-time data.
  • Variety: the number of types or formats of data. The data could be structured (e.g., SQL tables or CSV files), semi-structured (e.g., HTML code), or unstructured (e.g., video messages).
$$ \begin{array}{c|c|c} \text{Abbreviation} & \text{Unit} & \text{Size} \\ \hline \text{MB} & \text{Megabyte} & \text{1 million bytes} \\ \text{GB} & \text{Gigabyte} & \text{1 billion bytes} \\ \text{TB} & \text{Terabyte} & \text{1 trillion bytes} \\ \text{PB} & \text{Petabyte} & \text{1 quadrillion bytes} \end{array} $$

As the table shows, as more data are generated, captured, and stored, data volumes are growing from megabytes (MB) and gigabytes (GB) to far larger sizes, such as terabytes (TB) and petabytes (PB). As this happens, more data, both traditional and nontraditional, become available on a real-time or near-real-time basis. At the same time, the variety of data also grows.

Structured data refers to information with a high degree of organization. Items can be organized in tables and are commonly stored in a database where each field represents the same type of information. Unstructured data refers to information with a low degree of organization. Items such as text messages, tweets, and emails are unorganized and cannot readily be presented in tabular form. Semi-structured data may have qualities of both structured and unstructured data.
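The decimal multiples in the table can be illustrated with a short Python sketch. It expresses a raw byte count in the largest unit that keeps the number at or above 1.

The sketch makes a few assumptions:

  • It uses decimal (SI) units, where each step is a factor of 1,000, as in the table.
  • It adds KB (kilobyte, 1,000 bytes) to fill the gap below MB. Some operating systems instead use binary multiples of 1,024.
  • The function name `human_readable` is hypothetical.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
# "KB" is added below "MB" as an assumption; the table starts at MB.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_readable(n_bytes: float) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    value = float(n_bytes)
    unit_index = 0
    # Divide by 1,000 until the value drops below 1,000 or we reach the largest unit.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.1f} {UNITS[unit_index]}"


if __name__ == "__main__":
    print(human_readable(1_000_000))          # 1.0 MB
    print(human_readable(2_500_000_000_000))  # 2.5 TB
```

Values beyond the largest listed unit simply stay in PB (for example, 10^18 bytes prints as "1000.0 PB").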

Sources of Data

  • Financial markets: equity, swaps, futures, options, and other derivatives
  • Businesses: financial statements, credit card purchases, and commercial transactions
  • Governments: payroll, economic, trade, and employment data, etc.
  • Individuals: product reviews, credit card purchases, social media posts, etc.
  • Sensors: shipping cargo information, traffic data, satellite imagery
  • The Internet of Things: data generated by “smart” devices, such as buildings fitted with CCTV cameras, vehicles, and home appliances

Challenges of Big Data

  • Data quality: data quality issues include selection bias, missing data, and outliers.
  • Volume of the data: questions arise about whether the volume of data is sufficient.
  • Appropriateness of the data: a dataset may not be suited to the investment analysis at hand. As such, data must be correctly sourced, cleansed, and organized prior to analysis, which is particularly difficult with alternative data.
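The data-quality challenge can be made concrete with a minimal Python sketch that flags missing entries and outliers in a numeric series.

The sketch rests on these assumptions:

  • Missing values are recorded as `None`.
  • A point counts as an outlier when its z-score (its distance from the mean in sample standard deviations) exceeds a chosen threshold.
  • The function name `quality_report`, the default threshold of 3.0, and the sample values are all illustrative.

Selection bias cannot be detected this way, because it concerns how the data were gathered rather than the values themselves.

```python
import statistics


def quality_report(values, z_threshold=3.0):
    """Flag missing entries (None) and z-score outliers in a numeric series.

    The z-score rule and the default threshold are illustrative assumptions,
    not a standard definition of an outlier.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    stdev = statistics.stdev(present)  # sample standard deviation; needs >= 2 values
    outliers = [
        i
        for i, v in enumerate(values)
        if v is not None and stdev > 0 and abs(v - mean) / stdev > z_threshold
    ]
    return {"missing": missing, "outliers": outliers}


report = quality_report([10, 11, 9, None, 10, 12, 10, 95, 11, None], z_threshold=2.0)
print(report)  # {'missing': [3, 9], 'outliers': [7]}
```

With only eight non-missing values, no sample z-score can exceed about 2.47, which is why the usage example lowers the threshold to 2.0. Robust alternatives based on the median are often preferred for small or heavy-tailed samples.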

Artificial Intelligence (AI) vs. Machine Learning

In broad terms, artificial intelligence (AI) refers to machines that can perform tasks in ways that we would consider “intelligent” or “smart.” It involves developing computer systems that exhibit cognitive and decision-making abilities comparable or superior to those of humans. AI can take the form of simple “if-then” rules or of complex statistical models that map raw sensory data to symbolic categories.

Machine learning (ML) is a current application of AI that extracts knowledge from large amounts of data without making assumptions about the data's underlying probability distribution. The idea is that, when exposed to huge amounts of data, machines can adjust on their own and find solutions to problems without relying on human expertise. In ML, an algorithm is given inputs (a set of variables or datasets) and, in some cases, outputs (the target data). It then learns from the data either how best to map the inputs to an output or, if no output is provided, how best to identify and describe the structure of the data. The algorithm learns by identifying relationships in the data and uses this information to refine its model.

In ML, the data are typically divided into three distinct subsets: a training dataset, a validation dataset, and a test dataset.

  • Training dataset: allows the algorithm to identify the relationship between inputs and outputs based on historical patterns in the data.
  • Validation dataset: used to validate these relationships and tune the model.
  • Test dataset: as the name suggests, used to evaluate how well the model predicts on new, unseen data.

Note that ML still requires human judgment to understand the underlying data and to choose suitable techniques for analyzing it. In particular, before data are used, they must be cleaned and free from bias and spurious entries.
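The three-way split can be sketched in Python as follows.

The proportions (60/20/20), the fixed random seed, and the function name `train_val_test_split` are assumptions for illustration. Real projects choose proportions to suit the dataset. Financial time series are often split chronologically rather than shuffled, to avoid look-ahead bias.

```python
import random


def train_val_test_split(data, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle records and split them into training, validation, and test subsets.

    The default 60/20/20 proportions and the seed are illustrative assumptions.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test set")
    rows = list(data)
    random.Random(seed).shuffle(rows)  # seeded shuffle so the split is reproducible
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * val_frac)
    train = rows[:n_train]                # used to learn input-output relationships
    val = rows[n_train:n_train + n_val]   # used to validate and tune the model
    test = rows[n_train + n_val:]         # held out to measure performance on new data
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```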

Types of Machine Learning

Supervised Learning

Under supervised learning, computers learn from labeled training data, in which each example consists of one or more inputs and the desired output. For example, predicting whether a stock will go up, go down, or stay level during the next business day can be modeled as a supervised learning problem.
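A minimal supervised-learning sketch for the stock-direction example uses a 1-nearest-neighbour classifier. This technique is chosen here only because it is short: a new observation receives the label of the most similar labeled training example.

The two features (today's return and the trailing five-day return, both in percent) and all training values are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical labeled training data: (features, label).
# Features (illustrative): (today's return %, trailing five-day return %).
TRAINING = [
    ((1.2, 3.0), "up"),
    ((0.8, 2.1), "up"),
    ((-1.5, -2.8), "down"),
    ((-0.9, -1.7), "down"),
    ((0.1, 0.2), "level"),
    ((-0.1, -0.3), "level"),
]


def predict_direction(features, training=TRAINING):
    """Return the label of the training example closest in Euclidean distance (1-NN)."""
    _, nearest_label = min(training, key=lambda example: math.dist(features, example[0]))
    return nearest_label


print(predict_direction((1.0, 2.5)))    # up
print(predict_direction((-1.2, -2.0)))  # down
print(predict_direction((0.0, 0.1)))    # level
```

The labels are what make this supervised: the model can only predict "up", "down", or "level" because the training examples were tagged with those outcomes.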

Unsupervised Learning

Under unsupervised learning, computers are given only input data and are tasked with describing it, for instance by grouping or clustering data points. In this case, computers learn from data that have not been labeled or categorized, and they “react” based on the presence or absence of commonalities in the data. Grouping companies based on their financial characteristics, rather than their geographic or industry classification, is a good example of unsupervised learning.
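Clustering companies on financial characteristics can be sketched with k-means, one common unsupervised technique. It is chosen here for illustration; the text does not name a specific algorithm.

The sketch makes these assumptions:

  • Each company has two features, debt-to-equity and net profit margin.
  • There are six hypothetical companies, and k = 2.
  • The first k points seed the centroids.

No labels are supplied: the algorithm discovers the groups from commonalities in the inputs alone.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters with plain k-means (Lloyd's algorithm).

    Centroids are seeded with the first k points for reproducibility; this is an
    assumption, as practical implementations usually use random or k-means++ seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical companies: (debt-to-equity, net profit margin).
companies = [(0.2, 0.25), (2.5, 0.03), (0.3, 0.22), (2.8, 0.05), (0.25, 0.20), (2.2, 0.04)]
labels, centroids = kmeans(companies, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Because the two features are on different scales, distances here are driven mostly by debt-to-equity. In practice, features are usually standardized before clustering.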


Machine learning most likely refers to:

A. The autonomous acquisition of knowledge through the use of computer programs.
B. The ability of machines to execute coded instructions.
C. The selective acquisition of knowledge through the use of computer programs.

Solution

The correct answer is A.

Machine learning refers to the autonomous acquisition of knowledge through the use of computer programs, such that computers learn to work out solutions to problems without relying on human expertise. Machine learning is the idea that computers can “learn” and make changes independently.