{"id":570,"date":"2019-12-13T11:21:59","date_gmt":"2019-12-13T11:21:59","guid":{"rendered":"http:\/\/algotrading101.com\/learn\/?p=570"},"modified":"2020-07-07T21:32:13","modified_gmt":"2020-07-07T21:32:13","slug":"sentiment-analysis-python-guide","status":"publish","type":"post","link":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/","title":{"rendered":"Sentiment Analysis with Python &#8211; A Beginner&#8217;s Guide"},"content":{"rendered":"<p>Sentiment analysis in finance has become commonplace. In many cases, it has become ineffective as many market players understand it and have one-upped this technique. 
<\/p>\n\n\n\n<p>That said, just like machine learning or basic statistical analysis, sentiment analysis is just a tool. It is how we use it that determines its effectiveness.<\/p>\n\n\n\n<p>Here are the general steps to learn sentiment analysis for finance:<\/p>\n\n\n\n<ul><li><strong>Understand what Sentiment Analysis is<\/strong><\/li><li><strong>Understand how it can be effectively used in finance<\/strong><\/li><li><strong>Learn data collection and text processing<\/strong><\/li><li><strong>Learn to run sentiment analysis<\/strong><\/li><li><strong>Learn how to use the analysis output for finance<\/strong><\/li><\/ul>\n\n\n\n<p><strong>Table of contents<\/strong><\/p>\n\n\n\n<ol><li><a rel=\"noreferrer noopener\" aria-label=\"What is Sentiment Analysis? (opens in a new tab)\" href=\"#what-is-sentiment-analysis\" target=\"_blank\">What is Sentiment Analysis?<\/a><\/li><li><a href=\"#why-do-we-need-sentiment-analysis\">Why do we need Sentiment Analysis?<\/a><\/li><li><a href=\"#sentiment-analysis-for-finance\">What is sentiment analysis for finance?<\/a><\/li><li><a href=\"#sentiment-analysis-for-trading\">How is sentiment analysis used for trading?<\/a><\/li><li><a href=\"#predict-stock-prices-sentiment-analysis\">How to predict stock prices with news and article headlines?<\/a><\/li><li><a href=\"#predict-tesla-stock-prices-sentiment-analysis\"><strong>Mega Project: Predicting Tesla stock prices with Seeking Alpha&#8217;s article headlines with Python<\/strong><\/a><ul><li><a href=\"#collate-article-headlines\">Collate article headlines<\/a><\/li><li><a href=\"#text-processing\">Import and clean the data (text processing)<\/a><\/li><li><a href=\"#sentiment-analysis-create-score-index\">Run sentiment analysis and create a score index<\/a><\/li><li><a href=\"#correlation-score-index-against-prices\">Correlate lagged score index against prices<\/a><\/li><\/ul><\/li><li><a href=\"#sentiment-analysis-real-world-trading\">Trading in the Real World &#8211; 
Improving our Analysis<\/a><\/li><li><a href=\"#ending-note-understand-the-trade\">Ending Note &#8211; Truly understand the trade<\/a><\/li><\/ol>\n\n\n\n<p>Let&#8217;s first understand why we need sentiment analysis for finance, or more specifically, trading.<\/p>\n\n\n\n<p>Next, we will demonstrate a project that uses Python to extract and analyse article headlines to predict Tesla&#8217;s stock prices.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"931\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3-1024x931.png\" alt=\"\" class=\"wp-image-791\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3-1024x931.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3-300x273.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3-768x698.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3.png 1575w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Let&#8217;s go!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is Sentiment Analysis?<\/h3>\n\n\n\n<a name=\"what-is-sentiment-analysis\"><\/a>\n\n\n\n<p>Official Definition (from Wikipedia): <\/p>\n\n\n\n<p><em>Sentiment analysis (also known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.<\/em><\/p>\n\n\n\n<p>In simple English:<\/p>\n\n\n\n<p><em>We use computers to extract meanings behind texts, images and other data.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why do we need Sentiment Analysis?<\/h3>\n\n\n\n<a name=\"why-do-we-need-sentiment-analysis\"><\/a>\n\n\n\n<p>Why can&#8217;t humans just read the texts? 
Why do we need a machine to do it for us?<\/p>\n\n\n\n<p>Reasons for using sentiment analysis:<\/p>\n\n\n\n<ul><li>Machines can read much faster (maybe a million times faster) than humans<\/li><li>Machines can read in many languages<\/li><li>Machines can derive meaning from text in a standardised manner (humans are subjective)<\/li><li>Machines can store insights from texts in a convenient way for further processing<\/li><\/ul>\n\n\n\n<p>There are of course downsides to sentiment analysis. Machines are not able to accurately derive meaning from texts (but they are getting better). Slang, typos, contextual meaning and sarcasm still pose difficulties.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What is sentiment analysis for finance?<\/h3>\n\n\n\n<a name=\"sentiment-analysis-for-finance\"><\/a>\n\n\n\n<p>Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance-related.<\/p>\n\n\n\n<p>We will focus on trading and investments in this article.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How is sentiment analysis used for trading?<\/h3>\n\n\n\n<a name=\"sentiment-analysis-for-trading\"><\/a>\n\n\n\n<p>Here are a few ways:<\/p>\n\n\n\n<ul><li>Read a news article or tweet fast and fire a trade instantly<\/li><li>Read a large number of financial reports and output insights<\/li><li>Gather insights from the crowds by analysing social media, web forums, news and analysts&#8217; reports<\/li><\/ul>\n\n\n\n<p><strong>Read a news article or tweet fast and fire a trade instantly<\/strong><\/p>\n\n\n\n<p>Let&#8217;s say that Country A&#8217;s leader decided to make a trade deal with another country out of the blue. 
<\/p>\n\n\n\n<p>In the best case scenario, a human might take 2 seconds to read that piece of news (if he or his team is awake) and another 3 seconds to fire an appropriate trade (if he is fast and is already at his trading desk). <\/p>\n\n\n\n<p>A machine would take less than 0.1 seconds to read the news and fire the trade. Plus, the machine doesn&#8217;t sleep and can monitor the news from not only Country A, but all countries around the world.<\/p>\n\n\n\n<p><strong>Read a large number of financial reports and output insights<\/strong><\/p>\n\n\n\n<p>A machine can read 1000 annual 10-K financial reports (in any language) in the time you take to read the first 10 pages of one report.<\/p>\n\n\n\n<p>That said, machines aren&#8217;t that great at deriving insights from such large unstructured text data. <\/p>\n\n\n\n<p>There is a large variance in output. The machine might get it right on average when you combine insights from 1000 stocks, but for an individual stock, it will often get it wrong.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" width=\"1024\" height=\"762\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/Randomness-chart-1-1024x762.png\" alt=\"\" class=\"wp-image-619\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/Randomness-chart-1-1024x762.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/Randomness-chart-1-300x223.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/Randomness-chart-1-768x571.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/Randomness-chart-1.png 1839w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Each insight can have a big variance (i.e. they might miss the mark some of the time)<\/figcaption><\/figure>\n\n\n\n<p><em>Thus, the value here might not be to derive insights for one stock. 
It is to derive insights from thousands of stocks, traded in the same portfolio in a statistical manner.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"696\" height=\"545\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/brownian-motion-normal-distribution-2.png\" alt=\"\" class=\"wp-image-622\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/brownian-motion-normal-distribution-2.png 696w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/brownian-motion-normal-distribution-2-300x235.png 300w\" sizes=\"(max-width: 696px) 100vw, 696px\" \/><figcaption>We will get an average prediction when combining many stock insights.<\/figcaption><\/figure>\n\n\n\n<p>The variance in each stock insight will tend to balance out when we combine it with the insights from thousands of other stocks. Hence, we will get an average prediction for our portfolio of hundreds or thousands of stocks.<\/p>\n\n\n\n<p>This is similar to the idea behind the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Central_limit_theorem\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"central limit theorem (opens in a new tab)\">central limit theorem<\/a>.<\/p>\n\n\n\n<p>Once we get our average prediction and standard deviation figures, we can then input them into a sizing algorithm to determine how much we should trade for each stock and how to allocate capital for the portfolio to maximise the long-term reward-to-risk ratio. But this is a story for another day.<\/p>\n\n\n\n<p>But do note that if your sentiment analysis of the financial reports is so bad that the mean of your insights is inaccurate, then you will not be profitable anyway.<\/p>\n\n\n\n<p><strong>Gather insights from the crowds by analysing social media, web forums, news and analysts&#8217; reports<\/strong><\/p>\n\n\n\n<p>This is touchy. Sentiment analysis of social media posts was hyped up a few years ago. 
The effectiveness of such analysis remains debatable.<\/p>\n\n\n\n<p>That said, we can increase the effectiveness of these insights by complementing them with other analysis, or by sandboxing them, i.e. hedging away the variables we can&#8217;t control.<\/p>\n\n\n\n<p class=\"has-small-font-size\">To read more on sandboxing: <a href=\"https:\/\/algotrading101.com\/learn\/how-to-use-hedging-as-a-trading-strategy\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"How to use Hedging as a Trading Strategy (opens in a new tab)\">How to use Hedging as a Trading Strategy<\/a><\/p>\n\n\n\n<p>For instance, say we are thinking of investing in Slack but are worried that Microsoft Teams will make Slack obsolete.<\/p>\n\n\n\n<p>We can go to tech forums and compare the number and sentiment of the comments about Slack with those about Microsoft Teams.<\/p>\n\n\n\n<p>When we do a pairing using the same information source, the results are generally more accurate as most unwanted variables will be hedged away.<\/p>\n\n\n\n<p>On the other hand, if we just take forum comments on Slack and try to assign a score for how positive or negative they are, the results will be subjective. 
<\/p>\n\n\n\n<p>The score will depend on variables such as the accuracy of the sentiment analysis library, the text processing methodology, noise and low-quality data.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"877\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/google-trends_slack-vs-teams-1024x877.png\" alt=\"\" class=\"wp-image-587\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/google-trends_slack-vs-teams-1024x877.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/google-trends_slack-vs-teams-300x257.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/google-trends_slack-vs-teams-768x658.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/google-trends_slack-vs-teams.png 1813w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Teams is catching up to Slack! &#8211; Credits: Google Trends<\/figcaption><\/figure>\n\n\n\n<p>The lazy way is to check the search traffic for Slack vs Teams on Google Trends.<\/p>\n\n\n\n<p>Now that we&#8217;ve covered the theory, let&#8217;s get our hands dirty!<\/p>\n\n\n\n<p style=\"padding:15px 15px 15px 15px;color: #555555;background-color: #E1FFC1;border: #dddddd 2px solid\">\u00bb Hello! 
This article doesn&#8217;t cover live trading. Check out this guide if you want to learn how to run live algorithmic trading: <a href=\"https:\/\/algotrading101.com\/learn\/alpaca-trading-api-guide\/\" target=\"_blank\" rel=\"noopener noreferrer\">Alpaca Trading API Guide \u2013 A Step-by-step Guide<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How to predict stock prices with news and article headlines?<\/h3>\n\n\n\n<a name=\"predict-stock-prices-sentiment-analysis\"><\/a>\n\n\n\n<p>Here are the steps to run our sentiment analysis project: <\/p>\n\n\n\n<ol><li><strong>Collate article headlines and dates<\/strong><\/li><li><strong>Import and clean the data (text processing)<\/strong><\/li><li><strong>Run sentiment analysis and create a score index<\/strong><\/li><li><strong>Correlate lagged score index against prices<\/strong><\/li><\/ol>\n\n\n\n<p>This is the basic overview. Of course, the effectiveness of our analysis lies in the subtle details of the process.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Mega Project: Predicting Tesla stock prices with Seeking Alpha&#8217;s article headlines with Python<\/h3>\n\n\n\n<a name=\"predict-tesla-stock-prices-sentiment-analysis\"><\/a>\n\n\n\n<p>We will be checking if Seeking Alpha&#8217;s headlines have any predictive power for Tesla&#8217;s stock price movements.<\/p>\n\n\n\n<p>This will be done using the above 4-step process with Python. <\/p>\n\n\n\n<p>We will conduct a very basic level of analysis to keep things simple.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">1. Collate article headlines<\/h4>\n\n\n\n<a name=\"collate-article-headlines\"><\/a>\n\n\n\n<p>Let&#8217;s download a web scraping package called BeautifulSo&#8230; Just kidding!<\/p>\n\n\n\n<p>This is not a web scraping article and I don&#8217;t want to bloat it. 
We will scrape the headlines by hand!<\/p>\n\n\n\n<p>Here are the steps for collating headlines:<\/p>\n\n\n\n<ol><li>Go to SeekingAlpha.com, search for TSLA and scroll for more headlines<\/li><li>Copy and paste the page into Excel<\/li><li>Remove unwanted data<\/li><\/ol>\n\n\n\n<p><strong>Step 1: Go to SeekingAlpha.com, search for TSLA and scroll for more headlines<\/strong><\/p>\n\n\n\n<p>Go to <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/seekingalpha.com\/symbol\/TSLA?s=tsla\" target=\"_blank\">SeekingAlpha.com<\/a> and search for TSLA (Tesla&#8217;s ticker symbol) in the search bar at the top of the page.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"886\" height=\"1024\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-2-886x1024.png\" alt=\"\" class=\"wp-image-593\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-2-886x1024.png 886w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-2-260x300.png 260w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-2-768x887.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-2.png 1556w\" sizes=\"(max-width: 886px) 100vw, 886px\" \/><\/figure>\n\n\n\n<p>You will see a page like this. What we want are the headlines under the Analysis section.<\/p>\n\n\n\n<p>Before we copy them, keep scrolling down to load more headlines.<\/p>\n\n\n\n<p>In my analysis, I scrolled down till the early 2018 articles appeared.<\/p>\n\n\n\n<p><strong>Step 2: Copy and paste the page into Excel<\/strong><\/p>\n\n\n\n<p>Next, ctrl-A the page, then ctrl-C to copy it. Yes, you read that right. We are going old school.<\/p>\n\n\n\n<p>Open Excel, then ctrl-V to paste. 
You should see something like this<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-pasted-906x1024.png\" alt=\"\" class=\"wp-image-594\" width=\"453\" height=\"512\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-pasted-906x1024.png 906w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-pasted-265x300.png 265w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-pasted-768x868.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-pasted.png 1347w\" sizes=\"(max-width: 453px) 100vw, 453px\" \/><\/figure><\/div>\n\n\n\n<p><strong>Step 3: Remove unwanted data<\/strong><\/p>\n\n\n\n<p>Delete all the unwanted rows. We want to keep the &#8220;Analysis&#8221; headlines (not the &#8220;News&#8221; headlines) and the corresponding dates.<\/p>\n\n\n\n<p>Delete all rows above the first headline. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top-1024x1014.png\" alt=\"\" class=\"wp-image-600\" width=\"512\" height=\"507\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top-1024x1014.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top-150x150.png 150w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top-300x297.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top-768x760.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-top.png 1123w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><figcaption>Your first row should be the first &#8220;Analysis&#8221; headline. Ignore the thumbnail pictures; they will be gone later when we save the file as a CSV.<\/figcaption><\/figure><\/div>\n\n\n\n<p>Delete all rows below the date of the last headline. In other words, delete the row containing the text &#8220;News&#8221; in bold and every row below it. <\/p>\n\n\n\n<p>You can search for &#8220;News&#8221; and check &#8220;Match entire cell contents&#8221; to find that row. 
<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-bottom-2-1024x904.png\" alt=\"\" class=\"wp-image-605\" width=\"512\" height=\"452\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-bottom-2-1024x904.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-bottom-2-300x265.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-bottom-2-768x678.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/excel-data-delete-bottom-2.png 1321w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n\n<p>Now, save that file as a CSV. This will remove all the thumbnail graphics.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">2. <strong>Import and clean the data<\/strong> (text processing)<\/h4>\n\n\n\n<a name=\"text-processing\"><\/a>\n\n\n\n<p>We will use Python and Jupyter Notebook for this. Python is a programming language and Jupyter Notebook is the &#8220;software&#8221; that we code in. The closest technical term is IDE (<a rel=\"noreferrer noopener\" aria-label=\"Integrated development environment (opens in a new tab)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Integrated_development_environment\" target=\"_blank\">Integrated development environment<\/a>), although strictly speaking Jupyter Notebook is an interactive notebook environment rather than a full IDE.<\/p>\n\n\n\n<p>You can use other IDEs, but I suggest using Jupyter Notebook if you are new to this. 
<\/p>\n\n\n\n<p class=\"has-small-font-size\">For those who are new, you can check out these guides on how to install Python and Jupyter Notebook on your computer using Anaconda: <a rel=\"noreferrer noopener\" href=\"https:\/\/hackernoon.com\/installing-python-and-anaconda-on-windows-f9059ba8b136\" target=\"_blank\">Hackernoon Guide<\/a>, <a rel=\"noreferrer noopener\" href=\"https:\/\/docs.anaconda.com\/anaconda\/install\/\" target=\"_blank\">Anaconda Docs Guide<\/a>  <\/p>\n\n\n\n<p>Here are the steps for this section:<\/p>\n\n\n\n<ol><li>Import your CSV to your Jupyter Notebook<\/li><li>Clean our data<\/li><\/ol>\n\n\n\n<p><strong>Step 1: Import your CSV to your Jupyter Notebook<\/strong><\/p>\n\n\n\n<p>We&#8217;ll use the pd.read_csv() function in Pandas to pull our CSV in.<\/p>\n\n\n\n<p class=\"has-small-font-size\">Pandas is a Python library for data analysis. You can install it by following:  <a href=\"https:\/\/pypi.org\/project\/pandas\/\">https:\/\/pypi.org\/project\/pandas\/<\/a> or  <a href=\"https:\/\/anaconda.org\/anaconda\/pandas\">https:\/\/anaconda.org\/anaconda\/pandas<\/a> <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd # import the Pandas library\n\n# change the file name below to match your own CSV file\ndf1 = pd.read_csv(\"tesla-headlines-sa.csv\", encoding='windows-1250', header=None)\ndf1.columns = &#91;'Title', 'Date'] # name the two columns\ndf1 # display the dataframe<\/code><\/pre>\n\n\n\n<p>If you are too lazy to copy and paste headlines from the SeekingAlpha website, you can use our dataset. <\/p>\n\n\n\n<p>Download our entire code + data folder from our GitHub repository: <a rel=\"noreferrer noopener\" aria-label=\"Sentiment-Analysis-1-TSLA-Headlines (opens in a new tab)\" href=\"https:\/\/github.com\/Lucas170\/Sentiment-Analysis-1-TSLA-Headlines\" target=\"_blank\">Sentiment-Analysis-1-TSLA-Headlines<\/a>. 
The CSV file is called &#8220;tsla-headlines-sa.csv&#8221;.<\/p>\n\n\n\n<p>If you are running my code, make sure that your CSV file is in the same folder as your code, and that the file name passed to pd.read_csv() matches the name of your CSV file.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"419\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/code-download-csv-1024x419.png\" alt=\"\" class=\"wp-image-635\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/code-download-csv-1024x419.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/code-download-csv-300x123.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/code-download-csv-768x314.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Importing our CSV<\/figcaption><\/figure>\n\n\n\n<p>Running pd.read_csv() will give us a <a rel=\"noreferrer noopener\" aria-label=\"dataframe (opens in a new tab)\" href=\"https:\/\/www.geeksforgeeks.org\/python-pandas-dataframe\/\" target=\"_blank\">dataframe<\/a> with 2 columns. We&#8217;ve titled them &#8220;Title&#8221; and &#8220;Date&#8221;. We&#8217;ve added an <a rel=\"noreferrer noopener\" aria-label=\"encoding (opens in a new tab)\" href=\"http:\/\/pandaproject.net\/docs\/determining-the-encoding-of-a-csv-file.html\" target=\"_blank\">encoding<\/a> argument to fix the character formatting issue.<\/p>\n\n\n\n<p><strong>Step 2: Clean our data<\/strong><\/p>\n\n\n\n<p>The next step is to clean our data. <\/p>\n\n\n\n<p>Our &#8220;Title&#8221; data is already clean enough to be used for our sentiment analysis library, so we shall leave it as it is.<\/p>\n\n\n\n<p>Our &#8220;Date&#8221; data needs work though. 
Here are the steps to clean the date data:<\/p>\n\n\n\n<ol><li>Determine our end goal<\/li><li>Clean the dates<\/li><li>Convert cleaned date to datetime format<\/li><li>Clean and convert the entire dataframe<\/li><\/ol>\n\n\n\n<p><span style=\"text-decoration: underline;\">1. Determine our end goal<\/span><\/p>\n\n\n\n<p>Our Date data is in text (i.e. string) format. We want to change it to a <a href=\"https:\/\/www.w3schools.com\/python\/python_datetime.asp\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"datetime (opens in a new tab)\">datetime<\/a> format so that it is easier to run our analysis along with our stock price data later.<\/p>\n\n\n\n<p><span style=\"text-decoration: underline;\">2. Clean the dates<\/span><\/p>\n\n\n\n<p>Before we can modify the date using code, we need to briefly look through the dataset to have a sense of the format of the data. <\/p>\n\n\n\n<p>I&#8217;ve briefly scanned through the data, and spotted 4 variations.<\/p>\n\n\n\n<p><em>Variation 1<\/em><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1.png\" alt=\"\" class=\"wp-image-640\" width=\"406\" height=\"26\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1.png 812w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-300x19.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-768x49.png 768w\" sizes=\"(max-width: 406px) 100vw, 406px\" \/><figcaption>Variation 1<\/figcaption><\/figure><\/div>\n\n\n\n<p>Variation 1 doesn&#8217;t contain a day or date. It says &#8220;Yesterday&#8221;. Only the first row has this format. 
<\/p>\n\n\n\n<p>Thus, I hard-code this date, since it is inefficient to write general code that will only be used once.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df1.loc&#91;0, 'Date'] = 'Dec. 9' # overwrite the first row; .loc avoids chained assignment\ndf1.head() # display the first 5 rows<\/code><\/pre>\n\n\n\n<p>I change the text to the same format as the other rows. Hence, when I modify the other rows using code, the first row will be modified too.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"264\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-2-1024x264.png\" alt=\"\" class=\"wp-image-643\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-2-1024x264.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-2-300x77.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-2-768x198.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-1-2.png 1589w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption> The first row is changed.<\/figcaption><\/figure>\n\n\n\n<p><em>Variation 2<\/em><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2.png\" alt=\"\" class=\"wp-image-650\" width=\"357\" height=\"101\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2.png 714w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-300x84.png 300w\" sizes=\"(max-width: 357px) 100vw, 357px\" \/><figcaption>Variation 2<\/figcaption><\/figure><\/div>\n\n\n\n<p>Variation 2 consists of the day and the date, but it doesn&#8217;t have a year. 
<\/p>\n\n\n\n<p>SeekingAlpha doesn&#8217;t include the year if the article was published in the current year.<\/p>\n\n\n\n<p>We are not interested in the day. We just want the date and year.<\/p>\n\n\n\n<p>Here, we need to extract the date and add in the current year.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re # import the regular expression library\n\nprint(df1&#91;'Date']&#91;1]) # display original\n\nmatch = re.search(r'\\w{3}\\.\\s\\d{1,2}', df1&#91;'Date']&#91;1])\n\nmodifiedDate = match&#91;0] + \", 2019\"\n\nprint(modifiedDate) # display modified<\/code><\/pre>\n\n\n\n<p>To do this, we first import the Regular Expressions library (AKA re AKA Regex library) to help us with string manipulation.<\/p>\n\n\n\n<p>We look for dates with the format &#8220;\\w{3}\\.\\s\\d{1,2}&#8221;. <\/p>\n\n\n\n<ul><li>\\w{3} looks for 3 word characters (letters, in our case)<\/li><li>\\. looks for a period symbol<\/li><li>\\s looks for a whitespace character (a space, in our case)<\/li><li>\\d{1,2} looks for 1 or 2 digits<\/li><\/ul>\n\n\n\n<p>This format fits our variation 2 data, which looks like &#8220;Dec. 6&#8221;. All other text is ignored.<\/p>\n\n\n\n<p>Here is a <a href=\"https:\/\/www.debuggex.com\/cheatsheet\/regex\/python\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"character cheat sheet (opens in a new tab)\">character cheat sheet<\/a> for reference.<\/p>\n\n\n\n<p>After we find our date, we add the year to it. <\/p>\n\n\n\n<p>If you are wondering, &#8220;I&#8217;m new to Python, how do I know what code to type?&#8221;<\/p>\n\n\n\n<p>The answer is&#8230; you Google it. <\/p>\n\n\n\n<p>The beauty of coding is that you are building on top of other people&#8217;s knowledge and work.<\/p>\n\n\n\n<p>You might want to learn some bare minimum basics. 
Then whatever problem you want to solve, Google it, copy other people&#8217;s code, modify it, make mistakes, learn and repeat.<\/p>\n\n\n\n<p>After a while, you will be faster at this and can solve problems more effectively (still with the help of Google).<\/p>\n\n\n\n<p>A good programmer is not someone who can spin up effective code out of thin air (though those people do exist). A good programmer knows what he doesn&#8217;t know, what his tools can achieve (even though he might not know how to do it) and how to find answers.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-2-1-1024x540.png\" alt=\"\" class=\"wp-image-654\" width=\"512\" height=\"270\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-2-1-1024x540.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-2-1-300x158.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-2-1-768x405.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-2-2-1.png 1194w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n\n<p>We add &#8220;, 2019&#8221; instead of &#8220;2019&#8221; to match variation 3.<\/p>\n\n\n\n<p><em>Variation 3<\/em><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3.png\" alt=\"\" class=\"wp-image-657\" width=\"345\" height=\"100\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3.png 690w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-300x87.png 300w\" sizes=\"(max-width: 345px) 100vw, 345px\" \/><figcaption>Variation 3<\/figcaption><\/figure><\/div>\n\n\n\n<p>Variation 3 is simply 
variation 2 plus the year.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re # import Regular expression library\n\nprint(df1&#91;'Date']&#91;1000]) # display original\n\nmatch = re.search(r'\\w{3}\\.\\s\\d{1,2}\\,\\s\\d{4}', df1&#91;'Date']&#91;1000])\n\nprint(match&#91;0]) # display modified<\/code><\/pre>\n\n\n\n<p>The code is similar to variation 2. We added &#8220;\\d{4}&#8221; in the re.search to grab the year.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-2-1024x336.png\" alt=\"\" class=\"wp-image-661\" width=\"768\" height=\"252\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-2-1024x336.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-2-300x98.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-2-768x252.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-3-2.png 1481w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/figure><\/div>\n\n\n\n<p><em>Variation 4<\/em><\/p>\n\n\n\n<p>Variation 4 is specific to the month of May.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4.png\" alt=\"\" class=\"wp-image-665\" width=\"355\" height=\"107\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4.png 710w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-300x90.png 300w\" sizes=\"(max-width: 355px) 100vw, 355px\" \/><figcaption>Variation 4<\/figcaption><\/figure><\/div>\n\n\n\n<p>All months except May have a period symbol after it. &#8220;Jan.&#8221;, &#8220;Feb.&#8221; etc. 
The period exists to indicate that the spelling of the month is truncated.<\/p>\n\n\n\n<p>The month of May doesn&#8217;t need this. <\/p>\n\n\n\n<p>Thus, in our Regex code for May, we do not need to include a period symbol.<\/p>\n\n\n\n<p>Note that to see all the data in your dataframe, you can use the following code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>pd.set_option('display.max_rows', df1.shape&#91;0]+1)\ndf1<\/code><\/pre>\n\n\n\n<p>We have two code snippets for variation 4: one for dates without the year and one for dates with the year.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import re # import Regular expression library\n\nprint(df1&#91;'Date']&#91;200]) # display original (without year)\nmatch = re.search(r'\\w{3}\\s\\d{1,2}', df1&#91;'Date']&#91;200])\nmodifiedDate = match&#91;0] + \", 2019\"\nprint(modifiedDate) # display modified\n\nprint(df1&#91;'Date']&#91;850]) # display original (with year)\nmatch = re.search(r'\\w{3}\\s\\d{1,2}\\,\\s\\d{4}', df1&#91;'Date']&#91;850])\nprint(match&#91;0]) # display modified<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-2-1-1024x527.png\" alt=\"\" class=\"wp-image-669\" width=\"768\" height=\"395\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-2-1-1024x527.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-2-1-300x154.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-2-1-768x395.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/date-4-2-1.png 1404w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/figure><\/div>\n\n\n\n<p><span style=\"text-decoration: underline;\">3. Convert cleaned date to datetime format<\/span><\/p>\n\n\n\n<p>Now that we have all the dates in either &#8220;MMM. 
DD, YYYY&#8221; or &#8220;May DD, YYYY&#8221; format, it is time to convert these to datetime format.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from datetime import datetime\n\nnewDate1 = datetime.strptime('Dec. 6, 2019', '%b. %d, %Y').date()\nprint(newDate1)\n\nnewDate2 = datetime.strptime('May 17, 2018', '%b %d, %Y').date()\nprint(newDate2)<\/code><\/pre>\n\n\n\n<p>Import the datetime library. This library helps us with datetime formatting.<\/p>\n\n\n\n<p>Use the datetime.strptime() method to convert the date string to a datetime object. The first input is our date string, the second input is the format of our date.<\/p>\n\n\n\n<p>The symbols &#8220;%b. %d, %Y&#8221; represent the date format.<\/p>\n\n\n\n<ul><li>%b looks for the month&#8217;s abbreviated name (e.g. Dec)<\/li><li>. looks for a period symbol<\/li><li>%d looks for the day of the month as a number<\/li><li>%Y looks for the year as a 4 digit number<\/li><\/ul>\n\n\n\n<p>You can learn more about <a rel=\"noreferrer noopener\" aria-label=\"datetime.strptime() (opens in a new tab)\" href=\"https:\/\/www.journaldev.com\/23365\/python-string-to-datetime-strptime\" target=\"_blank\">datetime.strptime()<\/a> here.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"359\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-1-1-1024x359.png\" alt=\"\" class=\"wp-image-677\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-1-1-1024x359.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-1-1-300x105.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-1-1-768x270.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-1-1.png 1436w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><span style=\"text-decoration: underline;\">4. 
Clean and convert the entire dataframe<\/span><\/p>\n\n\n\n<p>Now that we&#8217;ve covered how to clean the 4 variations and convert the date to the datetime format, let&#8217;s run a loop to clean the entire &#8220;Date&#8221; column.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from datetime import datetime\nimport re\n\nnewDateList = &#91;] # create a list to store the cleaned dates\n\nfor dateOfArticles in df1&#91;'Date']: # loop every row in the \"Date\" column\n    match = re.search(r'\\w{3}\\.\\s\\d{1,2}\\,\\s\\d{4}|May\\s\\d{1,2}\\,\\s\\d{4}|\\w{3}\\.\\s\\d{1,2}|May\\s\\d{1,2}', \n                      dateOfArticles)\n\n    if re.search(r'\\w{3}\\.\\s\\d{1,2}\\,\\s\\d{4}|\\w{3}\\s\\d{1,2}\\,\\s\\d{4}',match&#91;0]):\n        fulldate = match&#91;0] # don't append year to string\n    else:\n        fulldate = match&#91;0] + \", 2019\" # append year to string\n    \n    for fmt in ('%b. %d, %Y', '%b %d, %Y'):\n        try:\n            newDate = datetime.strptime(fulldate, fmt).date()\n            break # if format is correct, don't test any other formats\n        except ValueError:\n            pass\n        \n    newDateList.append(newDate) # add new date to the list\n\nif(len(newDateList) != df1.shape&#91;0]):\n    print(\"Error: Rows don't match\")\nelse:\n    df1&#91;'New Date'] = newDateList # add the list to our original dataframe\n\ndf1<\/code><\/pre>\n\n\n\n<p>Wow, that&#8217;s a handful of code. Don&#8217;t worry, we will break it down:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for dateOfArticles in df1&#91;'Date']: # loop every row in the \"Date\" column\n    match = re.search(r'\\w{3}\\.\\s\\d{1,2}\\,\\s\\d{4}|May\\s\\d{1,2}\\,\\s\\d{4}|\\w{3}\\.\\s\\d{1,2}|May\\s\\d{1,2}', \n                      dateOfArticles)\n<\/code><\/pre>\n\n\n\n<p>Here we loop through every row and look for any of the 4 date string variations.<\/p>\n\n\n\n<p>Note that the &#8220;|&#8221; symbol represents &#8220;or&#8221;. 
It allows us to look for one variation or another.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>if re.search(r'\\w{3}\\.\\s\\d{1,2}\\,\\s\\d{4}|\\w{3}\\s\\d{1,2}\\,\\s\\d{4}',match&#91;0]):\n        fulldate = match&#91;0] # don't append year to string\n    else:\n        fulldate = match&#91;0] + \", 2019\" # append year to string<\/code><\/pre>\n\n\n\n<p>Once we found the variation, we check if it contains the year. <\/p>\n\n\n\n<p>If yes, don&#8217;t add a year to the string. If no, add the appropriate year to the end of the string. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>for fmt in ('%b. %d, %Y', '%b %d, %Y'):\n        try:\n            newDate = datetime.strptime(fulldate, fmt).date()\n            break # if format is correct, don't test any other formats\n        except ValueError:\n            pass\n        \nnewDateList.append(newDate) # add new date to the list<\/code><\/pre>\n\n\n\n<p>Next, we convert the &#8220;Date&#8221; data from string to datetime format. <\/p>\n\n\n\n<p>Our dates have 2 possible formats now, one with a period symbol and one without. 
We will check for both.<\/p>\n\n\n\n<p>Once done, add the new date data to a list.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>if(len(newDateList) != df1.shape&#91;0]):\n    print(\"Error: Rows don't match\")\nelse:\n    df1&#91;'New Date'] = newDateList # add the list to our original dataframe<\/code><\/pre>\n\n\n\n<p>Check if the number of rows of the list match with original dataframe.<\/p>\n\n\n\n<p>If yes, add the list as new column to our original dataframe.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"914\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-2-2-1024x914.png\" alt=\"\" class=\"wp-image-685\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-2-2-1024x914.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-2-2-300x268.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-2-2-768x686.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/datetime-2-2.png 1632w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>We have finally gotten our &#8220;Date&#8221; data fixed!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">3. <strong>Run sentiment analysis and create a score index<\/strong><\/h4>\n\n\n\n<a name=\"sentiment-analysis-create-score-index\"><\/a>\n\n\n\n<p>The next part is to send our headlines into a sentiment analyser to churn out a score.<\/p>\n\n\n\n<p>We can build our own sentiment analyser model. A simple one can be something that is trained using supervised machine learning. <\/p>\n\n\n\n<p>The training data can be historical financial headlines. The individual words, phrases, or entire headlines in this data set will be labelled with a sentiment score. E.g. 1 could be extremely positive, 0 is neutral and -1 is extremely negative. 
This is known as lexicon-based sentiment analysis.<\/p>\n\n\n\n<p>You can think of a lexicon as a list of words, punctuation, phrases, emojis etc.<\/p>\n\n\n\n<p>We can then use this trained model to evaluate the sentiment score for future headlines. <\/p>\n\n\n\n<p>All of this model building stuff sounds fun but&#8230; we won&#8217;t be doing that in this article. I will write another article dedicated to sentiment analysis model building. <\/p>\n\n\n\n<p>In this article, we will use pre-trained models that are built by others.<\/p>\n\n\n\n<p>Introducing&#8230; Vader<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"960\" height=\"538\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader.jpeg\" alt=\"\" class=\"wp-image-692\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader.jpeg 960w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-300x168.jpeg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-768x430.jpeg 768w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><figcaption>No, not this Vader!<\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>What is VADER Sentiment Analyzer?<\/strong><\/p>\n\n\n\n<p>VADER is a sentiment analyser that is trained using social media and news data using a lexicon-based approach. This means that it looks at words, punctuation, phrases, emojis etc and rates them as positive or negative.<\/p>\n\n\n\n<p>VADER stands for &#8220;Valence Aware Dictionary and sEntiment Reasoner&#8221;. 
You can learn more about it <a rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\" href=\"https:\/\/medium.com\/analytics-vidhya\/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f\" target=\"_blank\">here<\/a> and <a rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\" href=\"http:\/\/comp.social.gatech.edu\/papers\/icwsm14.vader.hutto.pdf\" target=\"_blank\">here<\/a>.<\/p>\n\n\n\n<p><strong>Using VADER to analyse TSLA Headlines<\/strong><\/p>\n\n\n\n<p>Here are the steps:<\/p>\n\n\n\n<ol><li>Install the NLTK library<\/li><li>Run the sentiment analysis<\/li><\/ol>\n\n\n\n<p><span style=\"text-decoration: underline;\">Step 1: Install the NLTK library<\/span><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\"NLTK (opens in a new tab)\" href=\"https:\/\/www.nltk.org\/\" target=\"_blank\">NLTK<\/a> stands for Natural Language ToolKit. It is a library that helps us manage and analyse languages.<\/p>\n\n\n\n<p>We need this as the VADER analyser is part of the NLTK library.<\/p>\n\n\n\n<p>You can install it via <a rel=\"noreferrer noopener\" aria-label=\"Anaconda (opens in a new tab)\" href=\"https:\/\/anaconda.org\/conda-forge\/nltk\" target=\"_blank\">Anaconda<\/a> or <a href=\"https:\/\/pypi.org\/project\/nltk\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"Pip (opens in a new tab)\">Pip<\/a>.<\/p>\n\n\n\n<p>Next we need to download the VADER Lexicon. 
Think of this as additional data required to run our VADER analyser.<\/p>\n\n\n\n<p>Run the code below in your Jupyter Notebook to download the vader_lexicon:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import nltk\nnltk.download('vader_lexicon')<\/code><\/pre>\n\n\n\n<p><span style=\"text-decoration: underline;\">Step 2: Run the sentiment analysis<\/span><\/p>\n\n\n\n<p>It is finally time to run the actual sentiment analysis!<\/p>\n\n\n\n<p>This is the code (it is shorter than you think eh):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA\n\nsia = SIA() # create the analyser once and reuse it for every headline\nresults = &#91;]\n\nfor headline in df1&#91;'Title']:\n    pol_score = sia.polarity_scores(headline) # run analysis\n    pol_score&#91;'headline'] = headline # add headlines for viewing\n    results.append(pol_score)\n\nresults<\/code><\/pre>\n\n\n\n<p>We use a loop to pass every headline into our analyser. A sentiment score is assigned to each headline.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-1-1-1024x701.png\" alt=\"\" class=\"wp-image-700\" width=\"768\" height=\"526\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-1-1-1024x701.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-1-1-300x205.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-1-1-768x526.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-1-1.png 1492w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/figure><\/div>\n\n\n\n<p>Next, we add these results to our original dataframe. 
However, we are only interested in the values of the &#8216;compound&#8217; variable.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df1&#91;'Score'] = pd.DataFrame(results)&#91;'compound']<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"331\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-3-1024x331.png\" alt=\"\" class=\"wp-image-707\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-3-1024x331.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-3-300x97.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/vader-sentiment-analysis-3-768x248.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>We did it!<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">4. <strong>Correlate lagged score index against prices<\/strong><\/h4>\n\n\n\n<a name=\"correlation-score-index-against-prices\"><\/a>\n\n\n\n<p>In this section, we want to examine the relationship between the TSLA stock returns and our sentiment score. If there is a significant relationship, then our sentiment scores might have some predictive value.<\/p>\n\n\n\n<p>Here are the next steps:<\/p>\n\n\n\n<ol><li>Aggregate daily sentiment scores<\/li><li>Import TSLA prices and calculate returns<\/li><li>Check relationship between lagged score against returns (daily)<\/li><\/ol>\n\n\n\n<p><strong>Step 1: Aggregate daily sentiment scores<\/strong><\/p>\n\n\n\n<p>We need only one score per day to compare against TSLA&#8217;s daily prices.<\/p>\n\n\n\n<p>However, there might be more than one article per day. 
In those cases, we combine the scores for all articles to get a daily score.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df2 = df1.groupby(&#91;'New Date'])&#91;&#91;'Score']].sum() # creates a daily score by summing the scores of the individual articles in each day<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"334\" height=\"296\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1.png\" alt=\"\" class=\"wp-image-714\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1.png 334w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1-300x266.png 300w\" sizes=\"(max-width: 334px) 100vw, 334px\" \/><\/figure><\/div>\n\n\n\n<p>We select only the &#8220;Score&#8221; column before summing, because recent pandas versions no longer drop non-numeric columns (such as the headline text) automatically. The output will be the date (as your index) and the daily scores.<\/p>\n\n\n\n<p><strong>Step 2: Import TSLA prices and calculate returns<\/strong><\/p>\n\n\n\n<p>The goal in this step is to get the daily returns (not stock prices) of TSLA.<\/p>\n\n\n\n<p>To do that, we first need to get the stock prices for TSLA. 
We will get them from Yahoo Finance manually.<\/p>\n\n\n\n<p>Go to Yahoo Finance and <a href=\"https:\/\/finance.yahoo.com\/quote\/TSLA\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"search for the TSLA (opens in a new tab)\">search for the TSLA<\/a> stock ticker.<\/p>\n\n\n\n<p>Click on historical data, choose the dates you want and download the data.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"551\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/download-tsla-2-1024x551.png\" alt=\"\" class=\"wp-image-716\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/download-tsla-2-1024x551.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/download-tsla-2-300x162.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/download-tsla-2-768x413.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The data will be downloaded as a CSV.<\/p>\n\n\n\n<p>Alternatively, if you are lazy, grab it from our <a rel=\"noreferrer noopener\" aria-label=\"repo (opens in a new tab)\" href=\"https:\/\/github.com\/Lucas170\/Sentiment-Analysis-1-TSLA-Headlines\" target=\"_blank\">repo<\/a>.<\/p>\n\n\n\n<p>Pandas has a convenient method to import CSV files:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\n\n# Make sure csv file is in the same folder as this notebook\ndfEodPrice = pd.read_csv(\"tsla-eod-prices.csv\")<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"568\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-1-1024x568.png\" alt=\"\" class=\"wp-image-717\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-1-1024x568.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-1-300x166.png 
300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-1-768x426.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-1.png 1579w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Some of you won&#8217;t know this but the &#8220;Date&#8221; data is in a string format. You can check with the following code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>type(dfEodPrice&#91;'Date']&#91;1])<\/code><\/pre>\n\n\n\n<p>Thus, we need to convert the &#8220;Date&#8221; column to datetime format.<\/p>\n\n\n\n<p>We shall use another pandas method called <a href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.astype.html\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"pd.astype() (opens in a new tab)\">astype()<\/a> to do this.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfEodPrice&#91;'Date'] = dfEodPrice&#91;'Date'].astype('datetime64&#91;ns]') <\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"169\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/astype-1-1024x169.png\" alt=\"\" class=\"wp-image-746\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/astype-1-1024x169.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/astype-1-300x50.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/astype-1-768x127.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/astype-1.png 1417w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>This code will change the entire &#8220;Date&#8221; column to a datetime format.<\/p>\n\n\n\n<p>Since we are only interested in the &#8220;Adj Close&#8221; column in this article, let&#8217;s drop all unwanted columns. 
Then, we set our &#8220;Date&#8221; column as our index so that it is easier to manage.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfEodPrice2 = dfEodPrice.drop(&#91;'Open', 'High','Low','Close','Volume'], axis=1) # drop unwanted columns\ndfEodPrice2.set_index('Date', inplace=True) # set Date column as index<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"448\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-2-1-1024x448.png\" alt=\"\" class=\"wp-image-722\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-2-1-1024x448.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-2-1-300x131.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-2-1-768x336.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-prices-2-1.png 1693w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Now that we have our prices, we need to calculate our returns.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfEodPrice2&#91;'Returns'] = dfEodPrice2&#91;'Adj Close']\/dfEodPrice2&#91;'Adj Close'].shift(1) - 1 # calculate daily returns<\/code><\/pre>\n\n\n\n<p>To calculate daily returns, we divide today&#8217;s price by yesterday&#8217;s and subtract 1.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"377\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-returns-1-1024x377.png\" alt=\"\" class=\"wp-image-726\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-returns-1-1024x377.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-returns-1-300x111.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-returns-1-768x283.png 768w, 
https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-returns-1.png 1892w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><strong>Step 3: Check relationship between lagged score against returns (daily)<\/strong><\/p>\n\n\n\n<p>The goal in this step is to check if the sentiment score predicts future stock returns.<\/p>\n\n\n\n<p>To do that, we check the relationship between the <em>one-day lagged<\/em> sentiment score and TSLA returns using simple regression.<\/p>\n\n\n\n<p>A one-day lagged sentiment score allows us to compare today&#8217;s article headlines to tomorrow&#8217;s stock returns.<\/p>\n\n\n\n<p>This is an important point as we need our score index to predict the future, not to tell us what is happening in the present.<\/p>\n\n\n\n<p>Of course, we can argue that the headline might have an immediate impact on stock prices. To test that, we need accurate price data on a minute or even second timeframe.<\/p>\n\n\n\n<p>We won&#8217;t do that in this article for a few reasons: that test is more difficult to set up; minute and second price data is expensive and sometimes inaccurate; and there are a lot of variables in live trading (liquidity, spread etc) that may not allow us to enter our trades at the prices stated.<\/p>\n\n\n\n<p>Alright, let&#8217;s start the analysis. 
Here are the steps:<\/p>\n\n\n\n<ol><li>Lag the sentiment score<\/li><li>Match the daily returns with the lagged sentiment score<\/li><li>Clean the data (again)<\/li><li>Design the test<\/li><li>Test for predictive value<\/li><\/ol>\n\n\n\n<p><em>Step 1: Lag the sentiment score<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>df2&#91;'Score(1)'] = df2&#91;'Score'].shift(1)<\/code><\/pre>\n\n\n\n<p>This code shifts all the scores down by one row.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1-1.png\" alt=\"\" class=\"wp-image-756\" width=\"349\" height=\"348\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1-1.png 698w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1-1-150x150.png 150w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/daily-score-1-1-300x300.png 300w\" sizes=\"(max-width: 349px) 100vw, 349px\" \/><\/figure><\/div>\n\n\n\n<p>Wait, are we shifting it down? Shouldn&#8217;t it be up? It&#8217;s actually down.<\/p>\n\n\n\n<p>Here is how to think about it. E.g. on 2018-01-16, the lagged score is 0.5719. When we run a regression of 0.5719 against TSLA&#8217;s 2018-01-16 returns, we are in fact checking 2018-01-15&#8217;s score against 2018-01-16&#8217;s returns. (Strictly speaking, the shift is by one row, so the lagged score comes from the previous date that had articles.)<\/p>\n\n\n\n<p>This is what we want. 
Older date&#8217;s score vs future returns.<\/p>\n\n\n\n<p><em>Step 2: Match the daily returns with the lagged sentiment score<\/em><\/p>\n\n\n\n<p>The number of rows of our score index is not the same as the number of rows of our returns.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/number-of-rows-dataframes.png\" alt=\"\" class=\"wp-image-739\" width=\"241\" height=\"190\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/number-of-rows-dataframes.png 481w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/number-of-rows-dataframes-300x237.png 300w\" sizes=\"(max-width: 241px) 100vw, 241px\" \/><figcaption>shape[0] returns the number of rows<\/figcaption><\/figure><\/div>\n\n\n\n<p class=\"has-text-align-left\">This happens as there are some trading days where there isn&#8217;t any news.<\/p>\n\n\n\n<p>Thus, we need to match the daily returns against the corresponding sentiment scores before we can run the regression.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfReturnsScore = pd.merge(dfEodPrice2&#91;&#91;'Returns']], df2&#91;&#91;'Score(1)']], left_index=True, right_index=True, how='left')<\/code><\/pre>\n\n\n\n<p>We use <a rel=\"noreferrer noopener\" aria-label=\"pd.merge() (opens in a new tab)\" href=\"https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.merge.html\" target=\"_blank\">pd.merge()<\/a> for this purpose. The above code will create a new dataframe that uses the TSLA returns as the reference and pulls in the appropriate lagged sentiment score for each date. 
<\/p>\n\n\n\n<p>Think of this as a more complicated version of &#8220;vlookup&#8221; in Excel, but it does the same thing.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"597\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/merged-1-1024x597.png\" alt=\"\" class=\"wp-image-749\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/merged-1-1024x597.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/merged-1-300x175.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/merged-1-768x448.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/merged-1.png 1509w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>You can learn more about the pd.merge() method <a rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\" href=\"https:\/\/towardsdatascience.com\/why-and-how-to-use-merge-with-pandas-in-python-548600f7e738\" target=\"_blank\">here<\/a> and <a href=\"https:\/\/thispointer.com\/pandas-how-to-merge-dataframes-using-dataframe-merge-in-python-part-1\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"here (opens in a new tab)\">here<\/a>.<\/p>\n\n\n\n<p><em>Step 3: Clean the data<\/em> (again)<\/p>\n\n\n\n<p>On days where there is no news, there are no sentiment scores. The score column will show a NaN (not-a-number) when there are no scores.<\/p>\n\n\n\n<p>Having a NaN is the equivalent of having a score of 0. 
Thus, we replace all NaNs with 0.<\/p>\n\n\n\n<p>We do it using this code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfReturnsScore.fillna(0, inplace=True) \n# replace NaN with 0 permanently<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/returns-score-1.png\" alt=\"\" class=\"wp-image-759\" width=\"432\" height=\"358\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/returns-score-1.png 863w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/returns-score-1-300x249.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/returns-score-1-768x637.png 768w\" sizes=\"(max-width: 432px) 100vw, 432px\" \/><\/figure><\/div>\n\n\n\n<p><em>Step 4: Design the test<\/em><\/p>\n\n\n\n<p>The lazy way to run the test is to check the relationship between the daily sentiment scores against TSLA&#8217;s daily returns.<\/p>\n\n\n\n<p>However, in addition to article headlines, there are many factors affecting TSLA&#8217;s stock price.<\/p>\n\n\n\n<p>We will not go in-depth on how to isolate the effect of headlines. For now, let&#8217;s do the bare minimum.<\/p>\n\n\n\n<p>The bare minimum is to exclude the data where the score is 0 or insignificant. <\/p>\n\n\n\n<p>We shall assume that a score of between -0.5 and 0.5 is insignificant for the sake of simplicity. This is an arbitrary figure. 
You can optimise it in a <a rel=\"noreferrer noopener\" aria-label=\"walk-forward optimisation (opens in a new tab)\" href=\"https:\/\/algotrading101.com\/learn\/what-is-walk-forward-optimization\/\" target=\"_blank\">walk-forward optimisation<\/a> if you want.<\/p>\n\n\n\n<p>By doing this, we have defined our hypothesis as follows:<\/p>\n\n\n\n<p><em>A sentiment score of &gt; 0.5 or &lt; -0.5 has predictive value for <strong>only<\/strong> tomorrow&#8217;s TSLA daily returns.<\/em><\/p>\n\n\n\n<p>This test doesn&#8217;t check whether the score has any longer-term effects, as we are only comparing today&#8217;s score against tomorrow&#8217;s stock returns.<\/p>\n\n\n\n<p>The code below removes all data where the sentiment score is between -0.5 and 0.5.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfReturnsScore2 = dfReturnsScore&#91;(dfReturnsScore&#91;'Score(1)'] > 0.5) | (dfReturnsScore&#91;'Score(1)'] &lt; -0.5)]<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"575\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/drop-rows-1-1024x575.png\" alt=\"\" class=\"wp-image-806\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/drop-rows-1-1024x575.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/drop-rows-1-300x169.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/drop-rows-1-768x431.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/drop-rows-1.png 1262w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p><em>Step 5: Test for predictive value <\/em><\/p>\n\n\n\n<p>Finally, our data is cleaned and ready for us. 
Now we need to test if there is a positive relationship between the lagged sentiment score and the daily returns.<\/p>\n\n\n\n<p>There are a few ways to do this:<\/p>\n\n\n\n<ol><li>Check for correlation <\/li><li>Run a regression<\/li><li>Run a cointegration test such as Engle\u2013Granger, which applies the <a rel=\"noreferrer noopener\" aria-label=\"Augmented Dickey\u2013Fuller test (opens in a new tab)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Augmented_Dickey%E2%80%93Fuller_test\" target=\"_blank\">Augmented Dickey\u2013Fuller test<\/a> to the regression residuals<\/li><li>Run a <a rel=\"noreferrer noopener\" aria-label=\"hypothesis of means test (opens in a new tab)\" href=\"https:\/\/www.quantopian.com\/lectures\/hypothesis-testing\" target=\"_blank\">hypothesis of means test<\/a> (you need to log in to Quantopian to see this tutorial)<\/li><\/ol>\n\n\n\n<p>Any of the above 4 tests will suffice, because even if we are satisfied with the test results, we still need to test the strategy in a production environment with proper backtesting &#8211; simulating the firing of trades, using in-sample and out-of-sample data, accounting for costs and commissions, avoiding <a rel=\"noreferrer noopener\" aria-label=\"overfitting  (opens in a new tab)\" href=\"https:\/\/algotrading101.com\/learn\/what-is-overfitting-in-trading\/\" target=\"_blank\">overfitting<\/a>, etc.<\/p>\n\n\n\n<p>Thus, you can think of these statistical tests as an early filter to see if the idea has any potential. <\/p>\n\n\n\n<p>In this article, we shall keep it simple and run a correlation. 
<\/p>\n\n\n\n<p>Before that, let&#8217;s plot our data and visualise it.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-1-1024x753.png\" alt=\"\" class=\"wp-image-809\" width=\"512\" height=\"377\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-1-1024x753.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-1-300x221.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-1-768x565.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-1.png 1056w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n\n<p>On the x-axis, we have our 1-day lagged sentiment score. On the y-axis we have our daily TSLA returns.<\/p>\n\n\n\n<p>That doesn&#8217;t look so good. We want an upward sloping shape. 
An upward sloping shape indicates that when Score(1) goes up, the daily returns go up, and vice versa.<\/p>\n\n\n\n<p> Ideally, we want something like this:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-2-1024x758.png\" alt=\"\" class=\"wp-image-812\" width=\"512\" height=\"379\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-2-1024x758.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-2-300x222.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-2-768x569.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/scatterplot-2.png 1260w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n\n<p>Anyways, let&#8217;s run a correlation analysis before we talk about the results. 
The following code runs a simple correlation calculation.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dfReturnsScore2&#91;'Returns'].corr(dfReturnsScore2&#91;'Score(1)'])<\/code><\/pre>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/corr-1-1024x145.png\" alt=\"\" class=\"wp-image-819\" width=\"512\" height=\"73\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/corr-1-1024x145.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/corr-1-300x43.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/corr-1-768x109.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/corr-1.png 1093w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/figure><\/div>\n\n\n\n<h4 class=\"wp-block-heading\">Evaluating our results<\/h4>\n\n\n\n<p>Our correlation coefficient is 0.044. That&#8217;s pretty close to 0. This suggests that, in this test, article headlines alone have little to no linear predictive value for tomorrow&#8217;s stock returns.<\/p>\n\n\n\n<p>To be honest, no surprise here. Markets are getting more sophisticated and we ran an overly simplistic analysis.<\/p>\n\n\n\n<p>But no worries, before we end the article, let&#8217;s look at some improvements we can make to our analysis for real-world trading.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Trading in the Real World &#8211; Improving our Analysis<\/h3>\n\n\n\n<a name=\"sentiment-analysis-real-world-trading\"><\/a>\n\n\n\n<p><strong>2-second lag and long-term relationship<\/strong><\/p>\n\n\n\n<p>We can split headlines into 2 types: 1) sensational ones and 2) fundamentals-related ones.<\/p>\n\n\n\n<p><em>Sensational news<\/em><\/p>\n\n\n\n<p>This refers to news that causes an instant impact. If we are doing this, we should use news headlines instead of analysis headlines. 
<\/p>\n\n\n\n<p>Since the news has an instant impact, a 1-day lag will be too slow.<\/p>\n\n\n\n<p>Thus, we are better off using a shorter time delay such as a 2-second lag. But note that data at such low timeframes is expensive and might not be accurate.<\/p>\n\n\n\n<p>The bad news is, even if you manage to run this analysis with reasonable accuracy, you will be slaughtered by high-frequency, or even regular, quantitative hedge funds in the real world as you are competing on speed of execution.<\/p>\n\n\n\n<p> Don&#8217;t trade on lower timeframes unless you&#8217;re sure you have an edge. <\/p>\n\n\n\n<p><em>Fundamentals-related news<\/em><\/p>\n\n\n\n<p>This type of news has a longer-term fundamental effect. Our SeekingAlpha Analysis headlines fall into this category.<\/p>\n\n\n\n<p>A 1-day lag might be too short for the effect to kick in. <\/p>\n\n\n\n<p>In this case, we can create a long-term index score and add to or subtract from it based on the individual article headlines. In addition, since newer headlines might have more impact, we can lower the weighting for older headlines.<\/p>\n\n\n\n<p>We can then compare the TSLA prices (not returns) against this index.<\/p>\n\n\n\n<p><strong>Sandbox your strategy<\/strong><\/p>\n\n\n\n<p>There is a lot of noise in the market. Many factors affect TSLA stock prices in addition to headlines (though the headlines are supposedly an approximate representation of these other factors).<\/p>\n\n\n\n<p>Trading an asset using only headlines when the asset is bombarded by many other factors is dangerous.<\/p>\n\n\n\n<p>As mentioned earlier in this article, we can alleviate this problem by hedging it with another asset. 
We then use the relative value of sentiment scores as our predictor.<\/p>\n\n\n\n<p class=\"has-small-font-size\">To read more on sandboxing: <a href=\"https:\/\/algotrading101.com\/learn\/how-to-use-hedging-as-a-trading-strategy\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\"How to use Hedging as a Trading Strategy (opens in a new tab)\">How to use Hedging as a Trading Strategy<\/a><\/p>\n\n\n\n<p><strong>High Impact Dates<\/strong><\/p>\n\n\n\n<p>Don&#8217;t trade on days where other variables have a huge impact. <\/p>\n\n\n\n<p>If you know that a presidential election result is being announced today, your SeekingAlpha Tesla headline is probably not going to have much impact. <\/p>\n\n\n\n<p>If Tesla is announcing its earnings, then non-earnings-related articles will not have much impact.<\/p>\n\n\n\n<p>To account for these in your analysis, remove these exogenous high-impact dates from your data set.<\/p>\n\n\n\n<p><strong>Accuracy of our sentiment analyser<\/strong><\/p>\n\n\n\n<p>The accuracy of the VADER sentiment analyser is nowhere near perfect. Just by eyeballing the output, you should be able to see this. 
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"296\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-vader-scores-1-1024x296.png\" alt=\"\" class=\"wp-image-770\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-vader-scores-1-1024x296.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-vader-scores-1-300x87.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-vader-scores-1-768x222.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/tsla-vader-scores-1.png 1963w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>I would say the scores for these 3 headlines are quite off the mark<\/figcaption><\/figure>\n\n\n\n<p>As mentioned earlier, we already know that these sentiment outputs have huge variance, and we rely on large numbers to squeeze out a slightly useful mean output value.<\/p>\n\n\n\n<p>That said, if you want to improve on this, the solution will be to build your own sentiment analyser by training it on the type of data you are testing on. <\/p>\n\n\n\n<p>E.g., if you are using SeekingAlpha&#8217;s headlines, train a lexicon-based analyser that is based only on SeekingAlpha&#8217;s headlines. But be aware that such an analyser will be overfitted to SeekingAlpha&#8217;s data and may not work well if applied to something different.<\/p>\n\n\n\n<p>The alternative is to wait 10 years for someone to develop a super accurate sentiment analyser (I&#8217;m sure quant funds have already done this) and open source it.<\/p>\n\n\n\n<p><strong>Use the delta of the score instead of the raw score<\/strong><\/p>\n\n\n\n<p>Think one step ahead. 
<\/p>\n\n\n\n<p>Compare the sentiment score with what the current expectations are.<\/p>\n\n\n\n<p>If you know that Tesla is viewed very negatively in the markets, a great score will be more impactful.<\/p>\n\n\n\n<p>If Tesla is already viewed optimistically, then a great score is not as impactful.<\/p>\n\n\n\n<p><strong>Look for headlines from more than one source<\/strong><\/p>\n\n\n\n<p>Currently we have only looked at headline data from SeekingAlpha. It might be safer to procure our data from different sources for different purposes.<\/p>\n\n\n\n<p>For sensational news, you would want headlines from the bigger news channels.<\/p>\n\n\n\n<p>For longer-term fundamental articles, you might want to procure them from more legitimate blogs or research firms.<\/p>\n\n\n\n<p>In both cases, you will want a mixture from different sources. This will increase the objectivity of the data, as some sources tend to be biased.<\/p>\n\n\n\n<p><strong>Complement headline data with other data<\/strong><\/p>\n\n\n\n<p>Headline data is just one aspect. <\/p>\n\n\n\n<p>Combine this data with other alternative data, such as satellite\/drone images of Tesla&#8217;s factories, the number of listings of second-hand Tesla cars, Tesla&#8217;s social media activity, etc., to get a better prediction.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Ending Note &#8211; <strong>Truly understand the trade<\/strong><\/h3>\n\n\n\n<a name=\"ending-note-understand-the-trade\"><\/a>\n\n\n\n<p>At the end of the day, you need to truly understand the reason for your trade. <\/p>\n\n\n\n<p>Come up with a hypothesis and test it appropriately. 
<\/p>\n\n\n\n<p>Isolate the variables you want to test, split your data into in-sample and out-of-sample pieces, and watch out for <a rel=\"noreferrer noopener\" aria-label=\"overfitting  (opens in a new tab)\" href=\"https:\/\/algotrading101.com\/learn\/what-is-overfitting-in-trading\/\" target=\"_blank\">overfitting<\/a> or <a rel=\"noreferrer noopener\" aria-label=\"p-hacking (opens in a new tab)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Data_dredging\" target=\"_blank\">p-hacking<\/a>.<\/p>\n\n\n\n<p>Here is an <a href=\"https:\/\/blog.quandl.com\/interview-with-a-quant-part-one\">interview<\/a> on the framework to design trading strategies that I find useful.<\/p>\n\n\n\n<p>Trading is a competitive sport. This article covers some basics for sentiment analysis. Think of it as teaching you how each chess piece moves.<\/p>\n\n\n\n<p>To win in trading, you need to learn strategies to outsmart others. Since everyone is trying to outwit one another all the time, you need to be creative and keep innovating to stay in the game.<\/p>\n\n\n\n<p>Trading is a hard way to make money. 
Good luck!<\/p>\n\n\n\n<p><em>You can download all the code used here:<\/em> <em><a href=\"https:\/\/github.com\/Lucas170\/Sentiment-Analysis-1-TSLA-Headlines\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">Github repo<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"pvc_clear\"><\/div>\n<p id=\"pvc_stats_570\" class=\"pvc_stats total_only  \" data-element-id=\"570\" style=\"\"><i class=\"pvc-stats-icon medium\" aria-hidden=\"true\"><svg aria-hidden=\"true\" focusable=\"false\" data-prefix=\"far\" data-icon=\"chart-bar\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\" class=\"svg-inline--fa fa-chart-bar fa-w-16 fa-2x\"><path fill=\"currentColor\" d=\"M396.8 352h22.4c6.4 0 12.8-6.4 12.8-12.8V108.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v230.4c0 6.4 6.4 12.8 12.8 12.8zm-192 0h22.4c6.4 0 12.8-6.4 12.8-12.8V140.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v198.4c0 6.4 6.4 12.8 12.8 12.8zm96 0h22.4c6.4 0 12.8-6.4 12.8-12.8V204.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v134.4c0 6.4 6.4 12.8 12.8 12.8zM496 400H48V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-16c0-8.84-7.16-16-16-16zm-387.2-48h22.4c6.4 0 12.8-6.4 12.8-12.8v-70.4c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v70.4c0 6.4 6.4 12.8 12.8 12.8z\" class=\"\"><\/path><\/svg><\/i> <img decoding=\"async\" width=\"16\" height=\"16\" alt=\"Loading\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/plugins\/page-views-count\/ajax-loader-2x.gif\" border=0 \/><\/p>\n<div class=\"pvc_clear\"><\/div>\n<p>Sentiment analysis in finance has become commonplace. In many cases, it has become ineffective as many market players understand it and have one-upped this technique. That said, just like machine learning or basic statistical analysis, sentiment analysis is just a tool. 
It is how we use it that determines its effectiveness. Here are the general [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":791,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[3,2],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Sentiment Analysis with Python - A Beginner&#039;s Guide - AlgoTrading101 Blog<\/title>\n<meta name=\"description\" content=\"Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance related.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Sentiment Analysis with Python - A Beginner&#039;s Guide - AlgoTrading101 Blog\" \/>\n<meta property=\"og:description\" content=\"Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance related.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/\" \/>\n<meta property=\"og:site_name\" content=\"Quantitative Trading Ideas and Guides - AlgoTrading101 Blog\" \/>\n<meta property=\"article:published_time\" 
content=\"2019-12-13T11:21:59+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-07-07T21:32:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1575\" \/>\n\t<meta property=\"og:image:height\" content=\"1432\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Lucas Liew\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Lucas Liew\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"29 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Sentiment Analysis with Python - A Beginner's Guide - AlgoTrading101 Blog","description":"Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance related.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/","og_locale":"en_US","og_type":"article","og_title":"Sentiment Analysis with Python - A Beginner's Guide - AlgoTrading101 Blog","og_description":"Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance related.","og_url":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/","og_site_name":"Quantitative Trading Ideas and Guides - AlgoTrading101 
Blog","article_published_time":"2019-12-13T11:21:59+00:00","article_modified_time":"2020-07-07T21:32:13+00:00","og_image":[{"width":1575,"height":1432,"url":"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2019\/12\/sa-tesla-page-3.png","type":"image\/png"}],"author":"Lucas Liew","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Lucas Liew","Est. reading time":"29 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/#article","isPartOf":{"@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/"},"author":{"name":"Lucas Liew","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/16c5231891a13283bf75ad8fe5240e19"},"headline":"Sentiment Analysis with Python &#8211; A Beginner&#8217;s Guide","datePublished":"2019-12-13T11:21:59+00:00","dateModified":"2020-07-07T21:32:13+00:00","mainEntityOfPage":{"@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/"},"wordCount":5271,"publisher":{"@id":"https:\/\/algotrading101.com\/learn\/#organization"},"articleSection":["Programming","Trading"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/","url":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/","name":"Sentiment Analysis with Python - A Beginner's Guide - AlgoTrading101 Blog","isPartOf":{"@id":"https:\/\/algotrading101.com\/learn\/#website"},"datePublished":"2019-12-13T11:21:59+00:00","dateModified":"2020-07-07T21:32:13+00:00","description":"Financial sentiment analysis is used to extract insights from news, social media, financial reports and alternative data for investment, trading, risk management, operations in financial institutions, and basically anything finance 
related.","breadcrumb":{"@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/algotrading101.com\/learn\/sentiment-analysis-python-guide\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/algotrading101.com\/learn\/"},{"@type":"ListItem","position":2,"name":"Sentiment Analysis with Python &#8211; A Beginner&#8217;s Guide"}]},{"@type":"WebSite","@id":"https:\/\/algotrading101.com\/learn\/#website","url":"https:\/\/algotrading101.com\/learn\/","name":"Quantitative Trading Ideas and Guides - AlgoTrading101 Blog","description":"Authentic Stories about Algorithmic trading, coding and life.","publisher":{"@id":"https:\/\/algotrading101.com\/learn\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/algotrading101.com\/learn\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/algotrading101.com\/learn\/#organization","name":"AlgoTrading101","url":"https:\/\/algotrading101.com\/learn\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/logo\/image\/","url":"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/11\/AlgoTrading101-Lucas-Liew.jpg","contentUrl":"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/11\/AlgoTrading101-Lucas-Liew.jpg","width":1200,"height":627,"caption":"AlgoTrading101"},"image":{"@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/16c5231891a13283bf75ad8fe5240e19","name":"Lucas 
Liew","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/a4da931db0e5587125985e7b134de8c1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/a4da931db0e5587125985e7b134de8c1?s=96&d=mm&r=g","caption":"Lucas Liew"},"description":"Founder at AlgoTrading101","sameAs":["https:\/\/algotrading101.com\/","https:\/\/www.linkedin.com\/in\/lucasliew"],"url":"https:\/\/algotrading101.com\/learn\/author\/learnadmin\/"}]}},"modified_by":"Lucas Liew","_links":{"self":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/570"}],"collection":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/comments?post=570"}],"version-history":[{"count":149,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/570\/revisions"}],"predecessor-version":[{"id":2821,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/570\/revisions\/2821"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/media\/791"}],"wp:attachment":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/media?parent=570"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/categories?post=570"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/tags?post=570"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}