{"id":12398,"date":"2021-03-09T21:43:04","date_gmt":"2021-03-09T21:43:04","guid":{"rendered":"https:\/\/analystprep.com\/study-notes\/?p=12398"},"modified":"2024-04-02T17:15:17","modified_gmt":"2024-04-02T17:15:17","slug":"evaluating-fit-machine-learning-algorithm","status":"publish","type":"post","link":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/","title":{"rendered":"Evaluating the Fit of a Machine Learning Algorithm"},"content":{"rendered":"<h3 id=\"mce_22\" class=\"editor-rich-text__tinymce mce-content-body\" data-is-placeholder-visible=\"false\">[vsw id=&#8221;ifHmwpgHWYY&#8221; source=&#8221;youtube&#8221; width=&#8221;611&#8243; height=&#8221;344&#8243; autoplay=&#8221;no&#8221;]<\/h3>\n<h2>Model Training<\/h2>\n<p>Suppose that the target variable (y) for the ML training model has the sentiment class labels (positive and negative). To ease the calculation of the performance metrics, we relabel them as 1 (for positive) and 0 (for negative). 
The performance metrics that can be employed here are receiver operating characteristic (ROC) curve and area under the curve (AUC) from the trained model results.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14934\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg\" alt=\"Receiver Operating Characteristics\" width=\"1590\" height=\"1002\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg 1590w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44-300x189.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44-1024x645.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44-768x484.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44-1536x968.jpg 1536w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44-400x252.jpg 400w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\" \/>The cleansed and preprocessed data set is split into three separate sets: training set, cross-validation (CV) set, and test set in the ratio 60:20:20, respectively.<\/p>\n<p>The final document term matrix (DTM) is built using the sentences (rows), which are the instances, and resulting tokens (columns), which are the feature variables, from the BOW of the training dataset. 
The columns of DTMs for the splits are the same, equal to the number of unique tokens (i.e., features) from the final training corpus BOW.<\/p>\n<p>A summary of the three data splits is shown in the following table:<\/p>\n<p>$$\\small{\\begin{array}{|c|c|c|}\\hline\\textbf{Corpus} &amp; \\textbf{Split} (\\%) &amp; \\textbf{Use} \\\\ \\hline\\text{Master} &amp; 100\\% &amp; \\text{Data exploration} \\\\ \\hline\\text{Training} &amp; 60\\% &amp; \\text{ML Model training} \\\\ \\hline\\text{Cross-validation} &amp; 20\\% &amp; {\\text{Tuning and validating}\\\\ \\text{the trained model}} \\\\ \\hline\\text{Test} &amp; 20\\% &amp; {\\text{Testing the trained,}\\\\ \\text{tuned, and validated model}}\\\\ \\hline\\end{array}}$$<\/p>\n<h2>Method Selection<\/h2>\n<p>Sentiment analysis involves the use of textual data, which is more extensive with many possible variables and a known output, i.e., positive, neutral, or negative sentiment classes. A supervised ML algorithm, therefore, applies to sentiment analysis. ML algorithms such as SVM, decision trees, and logistic regression can also be examined in this case. For simplicity, we will assume that logistic regression is used to train the model.<\/p>\n<p>In this case, we assume that texts are the sentences, and the classifications are either positive or negative sentiment classes (labeled 1 and 0, respectively). The tokens are feature variables, and the sentiment class is the target variable.<\/p>\n<p>Logistic regression utilizes the method of maximum likelihood estimation. As a result, the output of the logistic model is a probability value ranging from 0 to 1. A mathematical function uses the logistic regression coefficient (\u03b2) to calculate the probability (p) of sentences having positive sentiment (y = 1). For example, if the p-value for a sentence is 0.90, there is a 90% probability that the sentence has a positive sentiment. 
We aim to find an ideal threshold value of p, which is a cutoff point for p values, and it depends on the dataset and model training.<\/p>\n<p>When the p values are above the ideal threshold p-value, the sentences have a high probability of having a positive sentiment (y = 1). The ideal threshold p-value is estimated heuristically using performance metrics and ROC curves.<\/p>\n<h2>Performance Evaluation and Tuning<\/h2>\n<p>This step involves predicting the sentiments of the sentences in the training and cross-validation (CV) DTMs using the trained ML model. ROC curves are plotted for model evaluation. Recall that for ROC curves, the x-axis is a false positive rate, \\(\\frac{FP}{TN+FP}\\), and the y-axis is a true positive rate, \\(\\frac{TP}{TP+FN}\\).<\/p>\n<p>The following figure shows sample ROC Curves of model results for Training and cross-validation data:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14943\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35.jpg\" alt=\"F1 score and accuracy performance measures and CV Data\" width=\"1590\" height=\"1129\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35.jpg 1590w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35-300x213.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35-1024x727.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35-768x545.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35-1536x1091.jpg 1536w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_35-400x284.jpg 400w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\" \/>The ROC curves are significantly different between the training and the CV datasets, i.e., the AUC is 96.5% on training data and 86.2% on CV data. 
This finding suggests that the model performs relatively poorly on the CV data when compared to training data, implying that the model is overfitted.<\/p>\n<p>Regularization approaches such as least absolute shrinkage and selection operator (LASSO) are applied to the logistic regression model to mitigate the risk of overfitting. LASSO penalizes the absolute size of the model coefficients, which shrinks the coefficients of uninformative tokens to zero. The penalized model therefore retains only the tokens with non-zero coefficients, i.e., those that contribute to the model fit.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14944\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30.jpg\" alt=\"F1 score and accuracy performance measures and CV Data\" width=\"1590\" height=\"1129\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30.jpg 1590w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30-300x213.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30-1024x727.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30-768x545.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30-1536x1091.jpg 1536w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_30-400x284.jpg 400w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\" \/>After regularization, the ROC curves are similar for model performance on both datasets, with an AUC of 95.7% on the training dataset and 94.8% on the CV dataset. This implies that the model performs almost identically on both training and CV data and thus indicates a well-fitting model.<\/p>\n<p>Error analysis can be used to evaluate the model further. It is performed by computing a confusion matrix using the ML model results from the CV dataset. The threshold p-value of 0.5 is the theoretically suggested threshold (cutoff). 
When p &gt; 0.5, the prediction is assumed to be y = 1 (positive sentiment).<\/p>\n<h6 style=\"text-align: center;\">Confusion Matrix for CV Data with threshold p-value of 0.5<\/h6>\n<p>$$\\small{\\begin{array}{l|l|ll} {}&amp; {}&amp; \\textbf{Actual Training Labels}&amp;{}\\\\ \\hline{}&amp;{} &amp; \\text{Class \u201c1.\u201d} &amp; \\text{Class \u201c0.\u201d}\\\\ \\hline\\textbf{Predicted Results} &amp; \\text{Class \u201c1\u201d} &amp; \\text{TP} = 256 &amp; \\text{FP} = 27\\\\ {}&amp; \\text{Class \u201c0\u201d} &amp; \\text{FN} = 6 &amp; \\text{TN} = 103\\\\\u00a0 \\end{array}}$$<\/p>\n<h4>Performance Metrics<\/h4>\n<p>$$\\begin{align*}\\text{Precision (P)}&amp;=\\frac{\\text{TP}}{\\text{TP}+\\text{FP}}\\\\&amp;=\\frac{256}{256+27}=0.9046\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Recall (R)}&amp;=\\frac{\\text{TP}}{\\text{TP}+\\text{FN}}\\\\&amp;=\\frac{256}{256+6}=0.9771\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{F1 score}&amp;=\\frac{(2\\times\\text{Precision}\\times\\text{Recall})}{\\text{Precision}+\\text{Recall}}\\\\&amp;=\\frac{2\\times0.9046\\times0.9771}{0.9046+0.9771}\\\\&amp;=0.9394\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Accuracy}&amp;=\\frac{TP+TN}{TP+FP+TN+FN}\\\\&amp;=\\frac{256+103}{256+27+6+103}\\\\&amp;=0.9158\\end{align*}$$<\/p>\n<p>In this case, the model accuracy is 91.58%, with a theoretically suggested threshold p-value of 0.5. It is a standard practice to simulate many model results with different threshold values and to look for maximized accuracy and F1 statistics that minimize trade-offs between FP (Type I error) and FN (Type II error). What follows is to identify the threshold p-value that generates the highest accuracy and F1 score.<\/p>\n<p>For example, the following figure shows the F1 score and accuracy performance measures for various threshold p values. 
The threshold p-value that results in the highest accuracy and F1 score can be identified as 0.6.<\/p>\n<h6 style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14946\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45.jpg\" alt=\"F1 Score and Accuracy Performance Measures\" width=\"1590\" height=\"1065\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45.jpg 1590w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45-300x201.jpg 300w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45-1024x686.jpg 1024w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45-768x514.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45-1536x1029.jpg 1536w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_45-400x268.jpg 400w\" sizes=\"auto, (max-width: 1590px) 100vw, 1590px\" \/>Confusion Matrix for CV Data with threshold p-value of 0.6<\/h6>\n<p>$$\\small{\\begin{array}{l|l|l|l} {}&amp;{} &amp;\\textbf{Actual Training Labels}&amp;{}\\\\ \\hline{}&amp; {}&amp; \\text{Class \u201c1.\u201d} &amp; \\text{Class \u201c0.\u201d}\\\\ \\hline\\textbf{Predicted Results} &amp; \\text{Class \u201c1\u201d} &amp; \\text{TP} = 256 &amp; \\text{FP} = 19 \\\\{} &amp; \\text{Class \u201c0\u201d} &amp; \\text{FN} = 6 &amp; \\text{TN} = 100\\\\\u00a0 \\end{array}}$$<\/p>\n<h4>Performance Metrics<\/h4>\n<p>$$\\begin{align*}\\text{Precision (P)}&amp;=\\frac{256}{256+19}\\\\&amp;=0.9309\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Recall (R)}&amp;=\\frac{256}{256+6}\\\\&amp;=0.9771\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{F1 score}&amp;=\\frac{2\\times0.9309\\times0.9771}{0.9309+0.9771}\\\\&amp;=0.9534\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Accuracy}&amp;=\\frac{256+100}{256+19+6+100}\\\\&amp;=0.9344\\end{align*}$$<\/p>\n<p>The 
model performance metrics improved in the final model relative to the earlier case when the threshold p-value was 0.5. With a threshold p-value of 0.6, the precision, accuracy, and F1 score have increased. The precision has increased by approximately 3 percentage points to 93%, while the accuracy and F1 score have increased by approximately 2 and 1 percentage points to 93% and 95%, respectively.<\/p>\n<h3>Results and Interpretation<\/h3>\n<p>The final ML model, with the appropriate threshold p-value, has been validated and is now ready for application. It can be used to predict the sentiment of new sentences from the test data corpus as well as new sentences from similar financial text data sources. The model is now applied to the test data. Note that the test data has neither been used to train the model nor validate it.<\/p>\n<p>The confusion matrix for the test data using a threshold p-value of 0.6 is as shown:<\/p>\n<h6 style=\"text-align: center;\">Confusion Matrix for Test Data with Threshold p-value of 0.6<\/h6>\n<p>$$\\small{\\begin{array}{l|l|l|l} {}&amp;{} &amp;\\textbf{Actual Training Labels}&amp;{}\\\\ \\hline{}&amp; {}&amp; \\text{Class \u201c1.\u201d} &amp; \\text{Class \u201c0.\u201d}\\\\ \\hline\\textbf{Predicted Results} &amp; \\text{Class \u201c1\u201d} &amp; \\text{TP} = 256 &amp; \\text{FP} = 24 \\\\{} &amp; \\text{Class \u201c0\u201d} &amp; \\text{FN} = 6 &amp; \\text{TN} = 107\\\\\u00a0 \\end{array}}$$<\/p>\n<h4>Performance Metrics<\/h4>\n<p>$$\\begin{align*}\\text{Precision (P)}&amp;=\\frac{256}{256+24}\\\\&amp;=0.9143\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Recall (R)}&amp;=\\frac{256}{256+6}\\\\&amp;=0.9771\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{F1 score}&amp;=\\frac{2\\times0.9143\\times0.9771}{0.9143+0.9771}\\\\&amp;=0.9447\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Accuracy}&amp;=\\frac{256+107}{256+24+6+107}\\\\&amp;=0.9237\\end{align*}$$<\/p>\n<p>From the confusion matrix for the test data, we get accuracy and F1 score of 92% and 94%, respectively, with 
precision and recall of 91% and 98%, respectively. Therefore, it is clear that the model performs similarly on the training, CV, and test datasets. Additionally, this implies that the model is robust and is not overfitting. It also suggests that the model generalizes well out-of-sample and can thus be used to predict the sentiment classes for new sentences from similar financial text data sources.<\/p>\n<p>The following figure recaps the entire section of classifying and predicting sentiments using textual data from financial data sources.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14930\" src=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-scaled.jpg\" alt=\"Example: Spliting the Data\" width=\"1416\" height=\"2048\" srcset=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-scaled.jpg 1416w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-207x300.jpg 207w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-708x1024.jpg 708w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-768x1111.jpg 768w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-1062x1536.jpg 1062w, https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_110-400x578.jpg 400w\" sizes=\"auto, (max-width: 1416px) 100vw, 1416px\" \/>The derived sentiment classification can be used to visualize insights regarding a large text without reading the entire document.<\/p>\n<blockquote>\n<h2>Question<\/h2>\n<p>In the previous analysis using the cross-validation dataset, the threshold value of 0.60 was determined to be the p-value that maximizes model accuracy and F1 score; the confusion matrix for this model is shown in Table A below:<\/p>\n<h6 style=\"text-align: center;\">Table A: Confusion Matrix for CV Data with Threshold p-value of 
0.6<\/h6>\n<p>$$\\small{\\begin{array}{l|l|l|l} {}&amp;{} &amp;\\textbf{Actual Training Labels}&amp;{}\\\\ \\hline{}&amp; {}&amp; \\text{Class \u201c1.\u201d} &amp; \\text{Class \u201c0.\u201d}\\\\ \\hline\\textbf{Predicted Results} &amp; \\text{Class \u201c1\u201d} &amp; \\text{TP} = 256 &amp; \\text{FP} = 19 \\\\{} &amp; \\text{Class \u201c0\u201d} &amp; \\text{FN} = 6 &amp; \\text{TN} = 100\\\\\u00a0 \\end{array}}$$<\/p>\n<p>You are given the following confusion matrix of the same model with threshold p values of 0.4, as shown in Table B below:<\/p>\n<h6 style=\"text-align: center;\">Table B: Confusion Matrix for CV Data with Threshold p-value of 0.4<\/h6>\n<p>$$\\small{\\begin{array}{l|l|l|l} {}&amp;{} &amp;\\textbf{Actual Training Labels}&amp;{}\\\\ \\hline{}&amp; {}&amp; \\text{Class \u201c1.\u201d} &amp; \\text{Class \u201c0.\u201d}\\\\ \\hline\\textbf{Predicted Results} &amp; \\text{Class \u201c1\u201d} &amp; \\text{TP} = 256 &amp; \\text{FP} = 30 \\\\{} &amp; \\text{Class \u201c0\u201d} &amp; \\text{FN} = 6 &amp; \\text{TN} = 106\\\\\u00a0 \\end{array}}$$<\/p>\n<p>When the performance metrics of Table B (using a threshold p-value of 0.4) are compared with those of Table A (using a threshold p-value of 0.60), Table B is <em>more likely<\/em> to have a:<\/p>\n<p>\u00a0 \u00a0 \u00a0A. Lower accuracy and a lower F1 score relative to Table A.<\/p>\n<p>\u00a0 \u00a0 \u00a0B. Higher accuracy and a higher F1 score relative to Table A.<\/p>\n<p>\u00a0 \u00a0 \u00a0C. 
Lower accuracy and the same F1 score relative to Table A.<\/p>\n<h3>Solution<\/h3>\n<p><strong>The correct answer is A.<\/strong><\/p>\n<p>Performance metrics for confusion matrix A with threshold p-value of 0.6:<\/p>\n<p>$$\\begin{align*}\\text{Precision (P)}&amp;=\\frac{256}{256+19}\\\\&amp;=0.9309\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Recall (R)}&amp;=\\frac{256}{256+6}\\\\&amp;=0.9771\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{F1 Score}&amp;=\\frac{2\\times0.9309\\times0.9771}{0.9309+0.9771}\\\\&amp;=0.9534\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Accuracy}&amp;=\\frac{256+100}{256+19+6+100}\\\\&amp;=0.9344\\end{align*}$$<\/p>\n<p>Performance metrics for confusion matrix B with a threshold p-value of 0.4:<\/p>\n<p>$$\\begin{align*}\\text{Precision (P)}&amp;=\\frac{256}{256+30}\\\\&amp;=0.8951\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Recall (R)}&amp;=\\frac{256}{256+6}\\\\&amp;=0.9771\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{F1 Score}&amp;=\\frac{2\\times0.8951\\times0.9771}{0.8951+0.9771}\\\\&amp;=0.9343\\end{align*}$$<\/p>\n<p>$$\\begin{align*}\\text{Accuracy}&amp;=\\frac{256+106}{256+30+6+106}\\\\&amp;=0.9095\\end{align*}$$<\/p>\n<p>Table B has the same number of TPs (256) and FNs (6) as Table A but more FPs (30 versus 19), so its recall is unchanged while its precision is lower (0.8951 versus 0.9309). It follows that Table B has lower accuracy (0.9095) and a lower F1 score (0.9343) relative to Table A. 
The ML model using the threshold p-value of 0.6 is therefore the better model in this sentiment classification context than the model using a threshold p-value of 0.4.<\/p>\n<\/blockquote>\n<p>Reading 7: Big Data Projects<\/p>\n<p><em>LOS 7 (g) Evaluate the fit of a machine learning algorithm<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>[vsw id=&#8221;ifHmwpgHWYY&#8221; source=&#8221;youtube&#8221; width=&#8221;611&#8243; height=&#8221;344&#8243; autoplay=&#8221;no&#8221;] Model Training Suppose that the target variable (y) for the ML training model has the sentiment class labels (positive and negative). To ease the calculation of the performance metrics, we relabel them as 1&#8230;<\/p>\n","protected":false},"author":5,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[102,229],"tags":[216,274,230],"class_list":["post-12398","post","type-post","status-publish","format-standard","hentry","category-cfa-level-2","category-quantitative-method","tag-cfa-level-2","tag-evaluating-the-fit-of-a-machine-learning-algorithm","tag-quantitative-method","blog-post","no-post-thumbnail","animate"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study Notes<\/title>\n<meta name=\"description\" content=\"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta 
property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study Notes\" \/>\n<meta property=\"og:description\" content=\"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/\" \/>\n<meta property=\"og:site_name\" content=\"CFA, FRM, and Actuarial Exams Study Notes\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-09T21:43:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-02T17:15:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg\" \/>\n<meta name=\"author\" content=\"Irene R\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Irene R\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/\"},\"author\":{\"name\":\"Irene R\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\"},\"headline\":\"Evaluating the Fit of a Machine Learning Algorithm\",\"datePublished\":\"2021-03-09T21:43:04+00:00\",\"dateModified\":\"2024-04-02T17:15:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/\"},\"wordCount\":1852,\"image\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_44.jpg\",\"keywords\":[\"CFA-level-2\",\"Evaluating the Fit of a Machine Learning Algorithm\",\"Quantitative Method\"],\"articleSection\":[\"CFA Level II Study Notes\",\"Quantitative Method\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/\",\"name\":\"Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study 
Notes\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_44.jpg\",\"datePublished\":\"2021-03-09T21:43:04+00:00\",\"dateModified\":\"2024-04-02T17:15:17+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\"},\"description\":\"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#primaryimage\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_44.jpg\",\"contentUrl\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/wp-content\\\/uploads\\\/2021\\\/03\\\/Img_44.jpg\",\"width\":1590,\"height\":1002,\"caption\":\"Receiver Operating 
Characteristics\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/cfa-level-2\\\/quantitative-method\\\/evaluating-fit-machine-learning-algorithm\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Evaluating the Fit of a Machine Learning Algorithm\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#website\",\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/\",\"name\":\"CFA, FRM, and Actuarial Exams Study Notes\",\"description\":\"Question Bank and Study Notes for the CFA, FRM, and Actuarial exams\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/#\\\/schema\\\/person\\\/7002f30d8f174958802c1c30b167eaf5\",\"name\":\"Irene R\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g\",\"caption\":\"Irene R\"},\"url\":\"https:\\\/\\\/analystprep.com\\\/study-notes\\\/author\\\/irene\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study Notes","description":"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/","og_locale":"en_US","og_type":"article","og_title":"Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study Notes","og_description":"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.","og_url":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/","og_site_name":"CFA, FRM, and Actuarial Exams Study Notes","article_published_time":"2021-03-09T21:43:04+00:00","article_modified_time":"2024-04-02T17:15:17+00:00","og_image":[{"url":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg","type":"","width":"","height":""}],"author":"Irene R","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Irene R","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#article","isPartOf":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/"},"author":{"name":"Irene R","@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5"},"headline":"Evaluating the Fit of a Machine Learning Algorithm","datePublished":"2021-03-09T21:43:04+00:00","dateModified":"2024-04-02T17:15:17+00:00","mainEntityOfPage":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/"},"wordCount":1852,"image":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg","keywords":["CFA-level-2","Evaluating the Fit of a Machine Learning Algorithm","Quantitative Method"],"articleSection":["CFA Level II Study Notes","Quantitative Method"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/","url":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/","name":"Evaluating the Fit of a Machine Learning Algorithm - CFA, FRM, and Actuarial Exams Study 
Notes","isPartOf":{"@id":"https:\/\/analystprep.com\/study-notes\/#website"},"primaryImageOfPage":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#primaryimage"},"image":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#primaryimage"},"thumbnailUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg","datePublished":"2021-03-09T21:43:04+00:00","dateModified":"2024-04-02T17:15:17+00:00","author":{"@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5"},"description":"Learn how to evaluate the performance of machine learning algorithms for sentiment analysis using techniques like ROC curves.","breadcrumb":{"@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#primaryimage","url":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg","contentUrl":"https:\/\/analystprep.com\/study-notes\/wp-content\/uploads\/2021\/03\/Img_44.jpg","width":1590,"height":1002,"caption":"Receiver Operating Characteristics"},{"@type":"BreadcrumbList","@id":"https:\/\/analystprep.com\/study-notes\/cfa-level-2\/quantitative-method\/evaluating-fit-machine-learning-algorithm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/analystprep.com\/study-notes\/"},{"@type":"ListItem","position":2,"name":"Evaluating the Fit of a Machine Learning 
Algorithm"}]},{"@type":"WebSite","@id":"https:\/\/analystprep.com\/study-notes\/#website","url":"https:\/\/analystprep.com\/study-notes\/","name":"CFA, FRM, and Actuarial Exams Study Notes","description":"Question Bank and Study Notes for the CFA, FRM, and Actuarial exams","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/analystprep.com\/study-notes\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/analystprep.com\/study-notes\/#\/schema\/person\/7002f30d8f174958802c1c30b167eaf5","name":"Irene R","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/33caf1e1bcb63ee970b36351f165c7bc714b19614993ab9c2c8bf36273b7df48?s=96&d=mm&r=g","caption":"Irene 
R"},"url":"https:\/\/analystprep.com\/study-notes\/author\/irene\/"}]}},"_links":{"self":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12398","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/comments?post=12398"}],"version-history":[{"count":54,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12398\/revisions"}],"predecessor-version":[{"id":29319,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/posts\/12398\/revisions\/29319"}],"wp:attachment":[{"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/media?parent=12398"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/categories?post=12398"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/analystprep.com\/study-notes\/wp-json\/wp\/v2\/tags?post=12398"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}