{"id":12306,"date":"2021-03-09T09:17:47","date_gmt":"2021-03-09T09:17:47","guid":{"rendered":"https:\/\/analystprep.com\/study-notes\/?p=12306"},"modified":"2026-07-01T10:44:14","modified_gmt":"2026-07-01T10:44:14","slug":"preparing-wrangling-exploring-textual-data-financial-forecasting","status":"publish","type":"post","link":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/","title":{"rendered":"Preparing, Wrangling, and Exploring Textual Data for Financial Forecasting"},"content":{"rendered":"<p><script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"QAPage\",\n  \"mainEntity\": {\n    \"@type\": \"Question\",\n    \"name\": \"Which of the following visualizations is most appropriate during the exploratory data analysis step when the objective is to display the most informative words in a dataset based on their term frequency (TF) values?\",\n    \"answerCount\": 3,\n    \"acceptedAnswer\": {\n      \"@type\": \"Answer\",\n      \"text\": \"The correct answer is B. Word cloud. A word cloud is a visualization that displays words with sizes proportional to their term frequency, making it easy to identify the most common or informative words in a text dataset. 
Colors may also be used to convey additional information, such as word frequency or length.\"\n    },\n    \"suggestedAnswer\": [\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"Scatter plot.\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"Word cloud.\"\n      },\n      {\n        \"@type\": \"Answer\",\n        \"text\": \"Document term matrix.\"\n      }\n    ]\n  }\n}\n<\/script> <script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"ImageObject\",\n  \"url\": \"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-1536x1493.jpg\",\n  \"contentUrl\": \"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-1536x1493.jpg\",\n  \"caption\": \"Textual Data Preparation and Wrangling in Financial Forecasting\",\n  \"width\": 1536,\n  \"height\": 1493,\n  \"representativeOfPage\": true,\n  \"associatedArticle\": {\n    \"@type\": \"WebPage\",\n    \"@id\": \"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/\"\n  },\n  \"copyrightNotice\": \"\u00a9 2024 AnalystPrep\",\n  \"acquireLicensePage\": \"https:\/\/analystprep.com\/license-info\",\n  \"creditText\": \"AnalystPrep Design Team\",\n  \"creator\": {\n    \"@type\": \"Organization\",\n    \"name\": \"AnalystPrep\"\n  }\n}\n<\/script><\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/ifHmwpgHWYY\" width=\"611\" height=\"344\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p>Sentiment analysis refers to the analysis of opinions or emotions from text data. 
In other words, it refers to how positive, negative, or neutral a particular phrase or statement is regarding a \u201ctarget.\u201d Such sentiment can provide critical predictive power for forecasting stock price movements for companies.<\/p>\n<p>Here, we delve into a financial forecasting project to examine how effectively text data sentiments, from financial and economic news sources, can be classified. This project uses sentences from the text file Sentences_50Agree, each labeled with a positive, neutral, or negative sentiment class.<\/p>\n<p>The following table shows a sample corpus of six sentences from the Sentences_50Agree text file. A corpus is a collection of text data in any form, including list, matrix, or data table forms.<\/p>\n<p>$$\\small{\\begin{array}{l|l} \\textbf{Sentence}&amp;\\textbf{Sentiment}\\\\ \\hline\\text{Cramo slipped to a pretax loss of EUR 6.7 million from a pretax profit of EUR 58.9 million.}&amp;\\text{Negative}\\\\ \\hline\\text{Profit before taxes decreased to EUR 31.6 mn from EUR 50.0 mn the year before.}&amp;\\text{Negative}\\\\ \\hline\\text{Also, construction expenses have gone up in Russia.}&amp;\\text{Negative}\\\\ \\hline{\\text{Finnish Bore that is owned by the Rettig family, has grown recently}\\\\ \\text{through the acquisition of smaller shipping companies.}}&amp;\\text{Positive}\\\\ \\hline{\\text{The plan is estimated to generate some EUR 5 million (USD 6.5 m) in}\\\\ \\text{cost savings on an annual basis.}}&amp;\\text{Positive}\\\\ \\hline\\text{Earnings per share EPS are seen at EUR 0.56}&amp;\\text{Positive}\\\\\u00a0 \\end{array}}$$<\/p>\n<h2>Text Preparation\/Cleansing<\/h2>\n<p>Data obtained from different sources arrives in raw format, which is not suitable for analysis. 
The first step in converting raw data into a usable format is data cleansing.<\/p>\n<p>Text preparation entails removing, or incorporating appropriate substitutions for, possible extraneous information present in the text. In this case, we will eliminate punctuation, numbers, and white spaces that are not necessary for model training as follows:<\/p>\n<p><em><strong>Step 1:<\/strong> Remove HTML tags.<\/em><\/p>\n<p>There are no HTML tags present in this sample.<\/p>\n<p><em><strong>Step 2:<\/strong> Remove punctuation.<\/em><\/p>\n<p>Percentage and dollar symbols are substituted with word annotations to retain their importance in the textual data. Further, periods, semi-colons, and commas are removed. Regular expressions (regex) are commonly used to remove or replace punctuation.<\/p>\n<p><em><strong>Step 3:<\/strong> Remove numbers. <\/em><\/p>\n<p>Numbers in the text should be removed because the digits themselves carry little value for sentiment analysis; what matters is the context in which the numbers are used. Additionally, before removing numbers, abbreviations representing orders of magnitude, such as million (commonly represented by \u201cm,\u201d \u201cmln,\u201d or \u201cmn\u201d), billion, or trillion, should be replaced with the complete word.<\/p>\n<p><em><strong>Step 4:<\/strong> Remove white spaces.<\/em><\/p>\n<p>Extra spaces such as tabs, line breaks, and new lines should be identified and removed to keep the text intact and clean. 
The stripWhitespace function in R can be used to eliminate unnecessary white spaces from the text.<\/p>\n<p>The cleansed data is free of punctuation and numbers, with useful substitutions, as shown in the table below:<\/p>\n<p>$$\\small{\\begin{array}{l|l} \\textbf{Sentence}&amp;\\textbf{Sentiment}\\\\ \\hline\\text{Cramo slipped to a pretax loss of EUR million from a pretax profit of EUR million}&amp;\\text{Negative}\\\\ \\hline\\text{Profit before taxes decreased to EUR million from EUR million the year before}&amp;\\text{Negative}\\\\ \\hline\\text{Also construction expenses have gone up in Russia}&amp;\\text{Negative}\\\\ \\hline{\\text{Finnish Bore that is owned by the Rettig family has grown recently}\\\\ \\text{through the acquisition of smaller shipping companies}}&amp;\\text{Positive}\\\\ \\hline{\\text{The plan is estimated to generate some EUR million USD million in cost}\\\\ \\text{savings on an annual basis}}&amp;\\text{Positive}\\\\ \\hline\\text{Earnings per share EPS are seen at EUR}&amp;\\text{Positive}\\\\\u00a0 \\end{array}}$$<\/p>\n<h2>Text Wrangling<\/h2>\n<p>After textual data is cleansed, it should be normalized. The normalization process in text processing entails the following:<\/p>\n<p><em><strong>Step 1:<\/strong> Lowercasing<\/em> the alphabet helps the computer treat identical words consistently (for example, \u201cAND,\u201d \u201cAnd,\u201d and \u201cand\u201d).<\/p>\n<p><em><strong>Step 2:<\/strong> Stop words<\/em> such as \u201cthe,\u201d \u201cfor,\u201d and \u201care\u201d are usually removed to reduce the number of tokens involved in the training set for ML training purposes. 
However, some stop words such as <em>not<\/em>, <em>more<\/em>, <em>very<\/em>, and <em>few<\/em> are not eliminated, as they carry significant meaning in financial texts that is useful for sentiment prediction.<\/p>\n<p><em><strong>Step 3:<\/strong> Stemming<\/em> is a process of linguistic normalization that reduces words to their root word. For example, connection, connected, and connecting all reduce to the common root &#8220;connect.&#8221;<\/p>\n<p><em><strong>Step 4:<\/strong> Lemmatization<\/em> is similar to stemming except that it removes endings only if the base form is present in a dictionary. Lemmatization is more computationally costly and more advanced than stemming.<\/p>\n<p><em><strong>Step 5:<\/strong> Tokenization <\/em>is the process of breaking down a text paragraph into smaller chunks, such as words. A token is a single entity that is a building block for a sentence or a paragraph.<\/p>\n<h2>Data Exploration<\/h2>\n<h3>Exploratory Data Analysis<\/h3>\n<p>Exploratory data analysis (EDA) performed on text data provides insights into word distribution in the text. 
These word counts can be used to examine words that are most commonly and least commonly present in the texts, i.e., outliers.<\/p>\n<p>The graph below is a frequency distribution of the sample tokens before stop words are eliminated.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14939\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg\" alt=\"frequency distribution\" width=\"1590\" height=\"1546\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg 1590w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-300x292.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-1024x996.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-768x747.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-1536x1493.jpg 1536w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1-400x389.jpg 400w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\" \/>From the figure, we observe that \u201ccurrencysign\u201d and \u201cmillion\u201d are the most frequent words due to the financial nature of the data.<\/p>\n<p>A word cloud is a data visualization approach used for representing text data. Word clouds can be made to visualize the most informative words and their term frequency (TF) values. Varying font sizes can show the most commonly occurring words. 
Further, color is used to add more dimensions, such as frequency and length of words.<\/p>\n<p>The feature selection process will eliminate common words and highlight useful words for better model training.<\/p>\n<blockquote>\n<h2>Question<\/h2>\n<p>Which of the following visualizations is <em>most likely<\/em> to be appropriate in the exploratory data analysis step if our objective is to create a visualization that shows the most informative words in the dataset based on their term frequency (TF) values?<\/p>\n<p>\u00a0 \u00a0 A. Scatter plot.<\/p>\n<p>\u00a0 \u00a0 B. Word cloud.<\/p>\n<p>\u00a0 \u00a0 C. Document term matrix.<\/p>\n<h3>Solution<\/h3>\n<p><strong>The correct answer is B.<\/strong><\/p>\n<p>A word cloud is a common visualization when working with text data as it can be made to visualize the most informative words and their TF values. Varying font sizes can show the most commonly occurring words in the dataset, and color can be used to add more dimensions, such as frequency and length of words.<\/p>\n<p><strong>A is incorrect.<\/strong>\u00a0 A scatter plot is a two-dimensional chart that can be employed to summarize and approximately measure relationships between two or more features.<\/p>\n<p><strong>C is incorrect.<\/strong>\u00a0A document term matrix (DTM) is a matrix where each row belongs to a text file, and each column represents a token. The number of rows is equivalent to the number of text files in a sample text dataset. The number of columns is equal to the number of tokens from the bag-of-words (BOW) built using all the text files in the dataset. Each cell contains the number of times the column's token is present in the row's text file. 
A DTM is not a visualization tool.<\/p>\n<\/blockquote>\n<p>Reading 7: Big Data Projects<\/p>\n<p><em>LOS 7 (e) Describe preparing, wrangling, and exploring text-based data for financial forecasting<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Sentiment analysis refers to the analysis of opinions or emotions from text data. 
In other words, it refers to how positive, negative, or neutral a particular phrase or statement is regarding a \u201ctarget.\u201d Such sentiment can provide critical predictive power&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[102,229],"tags":[216,270,268,230,269],"class_list":["post-12306","post","type-post","status-publish","format-standard","hentry","category-cfa-level-2","category-quantitative-method","tag-cfa-level-2","tag-exploring-textual-data-for-financial-forecasting","tag-preparing-textual-data-for-financial-forecasting","tag-quantitative-method","tag-wrangling-textual-data-for-financial-forecasting","blog-post","no-post-thumbnail","animate"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Preparing &amp; Exploring Textual Data | CFA Level II<\/title>\n<meta name=\"description\" content=\"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Preparing &amp; Exploring Textual Data | CFA Level II\" \/>\n<meta property=\"og:description\" content=\"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/\" \/>\n<meta property=\"og:site_name\" content=\"CFA, FRM, and Actuarial Exams Study Notes\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-09T09:17:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-07-01T10:44:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1590\" \/>\n\t<meta property=\"og:image:height\" content=\"1546\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Irene R\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Irene R\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/\"},\"author\":{\"name\":\"Irene R\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\"},\"headline\":\"Preparing, Wrangling, and Exploring Textual Data for Financial 
Forecasting\",\"datePublished\":\"2021-03-09T09:17:47+00:00\",\"dateModified\":\"2026-07-01T10:44:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/\"},\"wordCount\":1246,\"image\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_110-1.jpg\",\"keywords\":[\"CFA-level-2\",\"Exploring Textual Data for Financial Forecasting\",\"Preparing Textual Data for Financial Forecasting\",\"Quantitative Method\",\"Wrangling Textual Data for Financial Forecasting\"],\"articleSection\":[\"CFA Level II Study Notes\",\"Quantitative Method\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/\",\"name\":\"Preparing & Exploring Textual Data | CFA Level 
II\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_110-1.jpg\",\"datePublished\":\"2021-03-09T09:17:47+00:00\",\"dateModified\":\"2026-07-01T10:44:14+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\"},\"description\":\"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#primaryimage\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_110-1.jpg\",\"contentUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_110-1.jpg\",\"width\":1590,\"height\":1546,\"caption\":\"frequency 
distribution\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/preparing-wrangling-exploring-textual-data-financial-forecasting\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Preparing, Wrangling, and Exploring Textual Data for Financial Forecasting\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#website\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/\",\"name\":\"CFA, FRM, and Actuarial Exams Study Notes\",\"description\":\"Question Bank and Study Notes for the CFA, FRM, and Actuarial exams\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\",\"name\":\"Irene R\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"caption\":\"Irene R\"},\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/author\\\/irene\\\/\"}]}<\/script>\n<meta property=\"og:video\" content=\"https:\/\/www.youtube.com\/embed\/ifHmwpgHWYY\" \/>\n<meta property=\"og:video:type\" 
content=\"text\/html\" \/>\n<meta property=\"og:video:duration\" content=\"3468\" \/>\n<meta property=\"og:video:width\" content=\"480\" \/>\n<meta property=\"og:video:height\" content=\"270\" \/>\n<meta property=\"ya:ovs:adult\" content=\"false\" \/>\n<meta property=\"ya:ovs:upload_date\" content=\"2021-03-09T09:17:47+00:00\" \/>\n<meta property=\"ya:ovs:allow_embed\" content=\"true\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Preparing & Exploring Textual Data | CFA Level II","description":"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/","og_locale":"en_US","og_type":"article","og_title":"Preparing & Exploring Textual Data | CFA Level II","og_description":"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.","og_url":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/","og_site_name":"CFA, FRM, and Actuarial Exams Study Notes","article_published_time":"2021-03-09T09:17:47+00:00","article_modified_time":"2026-07-01T10:44:14+00:00","og_image":[{"width":1590,"height":1546,"url":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg","type":"image\/jpeg"}],"author":"Irene R","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Irene R","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#article","isPartOf":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/"},"author":{"name":"Irene R","@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5"},"headline":"Preparing, Wrangling, and Exploring Textual Data for Financial Forecasting","datePublished":"2021-03-09T09:17:47+00:00","dateModified":"2026-07-01T10:44:14+00:00","mainEntityOfPage":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/"},"wordCount":1246,"image":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#primaryimage"},"thumbnailUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg","keywords":["CFA-level-2","Exploring Textual Data for Financial Forecasting","Preparing Textual Data for Financial Forecasting","Quantitative Method","Wrangling Textual Data for Financial Forecasting"],"articleSection":["CFA Level II Study Notes","Quantitative Method"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/","url":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/","name":"Preparing & Exploring Textual Data | CFA Level 
II","isPartOf":{"@id":"https:\/\/analystprep.com\/study-notes\/#website"},"primaryImageOfPage":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#primaryimage"},"image":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#primaryimage"},"thumbnailUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg","datePublished":"2021-03-09T09:17:47+00:00","dateModified":"2026-07-01T10:44:14+00:00","author":{"@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5"},"description":"Learn text wrangling and EDA techniques, including word frequency distributions and word clouds, to analyze and visualize financial textual data.","breadcrumb":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#primaryimage","url":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg","contentUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1.jpg","width":1590,"height":1546,"caption":"frequency distribution"},{"@type":"BreadcrumbList","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/preparing-wrangling-exploring-textual-data-financial-forecasting\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/analystprep.com\/study-notes\/"},{"@type":"ListItem","position":2,"name":"Preparing, Wrangling, and 
Exploring Textual Data for Financial Forecasting"}]},{"@type":"WebSite","@id":"https:\/\/analystprep.com\/study-notes\/#website","url":"https:\/\/analystprep.com\/study-notes\/","name":"CFA, FRM, and Actuarial Exams Study Notes","description":"Question Bank and Study Notes for the CFA, FRM, and Actuarial exams","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/analystprep.com\/study-notes\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5","name":"Irene R","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","caption":"Irene 
R"},"url":"https:\/\/analystprep.com\/study-notes\/author\/irene\/"}]},"og_video":"https:\/\/www.youtube.com\/embed\/ifHmwpgHWYY","og_video_type":"text\/html","og_video_duration":"3468","og_video_width":"480","og_video_height":"270","ya_ovs_adult":"false","ya_ovs_upload_date":"2021-03-09T09:17:47+00:00","ya_ovs_allow_embed":"true"},"_links":{"self":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12306","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/comments?post=12306"}],"version-history":[{"count":35,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12306\/revisions"}],"predecessor-version":[{"id":44734,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12306\/revisions\/44734"}],"wp:attachment":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/media?parent=12306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/categories?post=12306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/tags?post=12306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}