Big data refers to extremely large, complex data sets that can be analyzed computationally to reveal patterns, trends, and associations, especially those relating to human behavior. It encompasses both traditional data sources, such as company reports, stock exchange data, and government statistics, and nontraditional (alternative) data from social media, sensor networks, and electronic devices.
Defining properties of Big Data
- Volume: the amount of data collected in various forms, including files, records, tables, etc. Quantities of data reach almost incomprehensible proportions.
- Velocity: the speed at which data are generated and communicated, which can be extremely high. In many cases, data arrive in real time or near real time.
- Variety: the range of types and formats of data. Data can be structured (e.g., SQL tables or CSV files), semi-structured (e.g., HTML code), or unstructured (e.g., video messages).
| Abbreviation | Unit | Size |
|---|---|---|
| MB | megabyte | 1 million bytes |
| GB | gigabyte | 1 billion bytes |
| TB | terabyte | 1 trillion bytes |
| PB | petabyte | 1 quadrillion bytes |
As more data are generated, captured, and stored, data volumes grow from megabytes (MB) and gigabytes (GB) to far larger sizes, such as terabytes (TB) and petabytes (PB). At the same time, more data, both traditional and nontraditional, become available on a real-time or near-real-time basis, and the variety of data also grows.
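The relationship between these units can be sketched in a few lines of Python. This is a minimal illustration, assuming decimal units in which each step up is a factor of 1,000; the function name `human_readable` and the inclusion of kilobytes (KB) are assumptions made for the example, not part of the reading.

```python
# Decimal (SI) units: each step up is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value at or above 1."""
    value = float(num_bytes)
    unit = UNITS[0]
    for unit in UNITS:
        # Stop once the value fits this unit, or when no larger unit is left.
        if value < 1000 or unit == UNITS[-1]:
            break
        value /= 1000
    return f"{value:g} {unit}"

print(human_readable(1_000_000))      # 1 MB
print(human_readable(2_500_000_000))  # 2.5 GB
print(human_readable(10**15))         # 1 PB
```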
Structured data refers to information with a high degree of organization. Items can be organized in tables and are commonly stored in a database where each field represents the same type of information.
Unstructured data refers to information with a low degree of organization. Items such as text messages, tweets, and emails are unorganized and cannot readily be presented in tabular form.
Semi-structured data has qualities of both structured and unstructured data: it carries some organizing markers, such as the tags in HTML code, but does not fit a fixed tabular layout.
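To make the three categories concrete, the sketch below handles one small sample of each using only the Python standard library. The sample strings, the ticker names, and the `TextCollector` helper are hypothetical and chosen purely for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Structured: every row has the same fields, so it maps directly to a table.
csv_text = "ticker,price\nAAA,10.5\nBBB,20.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: tags give some organization, but there is no fixed schema.
class TextCollector(HTMLParser):
    """Hypothetical helper that gathers the text found between HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

collector = TextCollector()
collector.feed("<p>AAA beat <b>earnings</b> estimates</p>")
collector.close()
html_text = collector.chunks

# Unstructured: free text with no fields at all; any structure must be inferred.
tweet = "Loving the new AAA phone, battery life is great"
tokens = tweet.lower().replace(",", "").split()

print(rows[0]["ticker"], rows[0]["price"])
print(" ".join(html_text))
print(len(tokens), "words")
```

The structured sample can be queried by field name straight away, whereas the free-text sample first has to be broken into tokens before any analysis is possible.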
Sources of data
- Financial markets: equity, swaps, futures, options, and other derivatives
- Businesses: financial statements, credit card purchases, and commercial transactions
- Governments: payroll, economic, trade, employment data, etc.
- Individuals: product reviews, credit card purchases, social media posts, etc.
- Sensors: shipping cargo information, traffic data, satellite imagery
- The Internet of Things: data generated by connected devices, such as CCTV cameras and other fittings in “smart” buildings, vehicles, and home appliances
Artificial Intelligence (AI) vs. Machine Learning
In broad terms, artificial intelligence refers to machines that can carry out tasks in ways that we would consider “intelligent” or “smart.” It concerns the development of computer systems that exhibit cognitive and decision-making abilities comparable or superior to those of humans. AI can take the form of simple “if-then” statements or of complex statistical models that map raw sensory data to symbolic categories.
Machine learning is a current application of AI built around the idea that machines can be given access to data and left to learn for themselves without further human intervention. When exposed to more data, machines can adjust on their own and arrive at solutions to problems without relying on human expertise, improving their performance over time.
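The “if-then” form of AI can be illustrated with a very small rule-based function. This is only a sketch: the function name `rule_based_signal` and the 1% threshold are assumptions made for the example, and the rules are written by a person rather than learned from data.

```python
def rule_based_signal(daily_return: float, threshold: float = 0.01) -> str:
    """Label a daily return using fixed, human-written if-then rules."""
    if daily_return > threshold:
        return "up"
    if daily_return < -threshold:
        return "down"
    return "level"

print(rule_based_signal(0.02))   # up
print(rule_based_signal(-0.03))  # down
print(rule_based_signal(0.005))  # level
```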
Types of Machine Learning
Under supervised learning, computers learn to model data based on labeled training data that contains both the inputs and the desired outputs. Each training example has one or more inputs and a desired output.
For example, predicting the performance of a stock (up, down, or level) during the next business day can be modeled through supervised learning.
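A toy version of such a model is sketched below. It uses a 1-nearest-neighbor classifier, picked here only because it is short enough to write without external libraries; the two features (previous-day return and change in trading volume) and all of the data values are hypothetical.

```python
import math

# Hypothetical labeled training data:
# (previous-day return, volume change) -> next-day label.
training_data = [
    ((0.020, 0.30), "up"),
    ((0.015, 0.25), "up"),
    ((-0.020, 0.40), "down"),
    ((-0.018, 0.35), "down"),
    ((0.001, 0.02), "level"),
    ((-0.001, 0.01), "level"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest_features, nearest_label = min(
        training_data, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label

print(predict((0.018, 0.28)))   # up
print(predict((-0.019, 0.38)))  # down
print(predict((0.000, 0.00)))   # level
```

Each training example pairs inputs with a desired output, and the model labels a new observation by comparing it with those examples. A realistic model would need far more data and properly scaled features.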
Under unsupervised learning, computers are only given input data and are tasked with describing the data, for instance by grouping or clustering of data points. The computers learn from data that has not been labeled or categorized. The computers then “react” based on the presence or absence of commonalities in the data.
Trying to group companies based on their financial characteristics and not on geographical or industrial characteristics would be a good example of unsupervised learning.
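The grouping idea can be sketched with a small k-means clustering routine written in plain Python. K-means is one common clustering method, chosen here as an assumption for illustration; the company names, the two ratios (debt-to-equity and net profit margin), and their values are all hypothetical.

```python
import math

# Hypothetical companies described by (debt-to-equity, net profit margin).
companies = {
    "A": (0.2, 0.25),
    "B": (0.3, 0.22),
    "C": (1.8, 0.05),
    "D": (2.0, 0.04),
    "E": (0.25, 0.24),
    "F": (1.9, 0.06),
}

def k_means(points, k, iterations=10):
    """Group points into k clusters; no labels are supplied."""
    centroids = points[:k]  # deterministic start: first k points (assumes k <= len(points))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

points = list(companies.values())
clusters = k_means(points, k=2)

for cluster in clusters:
    print([name for name, ratios in companies.items() if ratios in cluster])
# ['A', 'B', 'E']
# ['C', 'D', 'F']
```

The routine is never told which companies belong together. The low-leverage, high-margin companies and the high-leverage, low-margin companies separate purely because of similarities in the input data.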
Machine learning most likely refers to:
A. The autonomous acquisition of knowledge through the use of computer programs
B. The ability of machines to execute coded instructions
C. The selective acquisition of knowledge through the use of computer programs
The correct answer is A.
Machine learning refers to the autonomous acquisition of knowledge through the use of computer programs such that the computers learn to work out solutions to problems without human intervention. Machine learning is the idea that computers have the ability to “learn” and execute changes independently.
Reading 43 LOS 43b:
Describe Big Data, artificial intelligence, and machine learning