{"id":9276,"date":"2021-05-11T17:22:24","date_gmt":"2021-05-11T17:22:24","guid":{"rendered":"http:\/\/algotrading101.com\/learn\/?p=9276"},"modified":"2022-07-16T17:16:23","modified_gmt":"2022-07-16T17:16:23","slug":"cluster-analysis-guide","status":"publish","type":"post","link":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/","title":{"rendered":"Cluster Analysis &#8211; Machine Learning for Pairs Trading"},"content":{"rendered":"<h3 class=\"wp-block-heading\">Table of contents:<\/h3>\n\n\n\n<ol><li><a href=\"#what-is-cluster-analysis\">What is Cluster Analysis?<\/a><\/li><li><a href=\"#ca-uml\">Is Cluster Analysis 
an Unsupervised Machine Learning task?<\/a><\/li><li><a href=\"#ca-in-finance\">How can Cluster Analysis be used for Finance?<\/a><\/li><li><a href=\"#pairs-trading\">What is the Pairs Trading strategy?<\/a><\/li><li><a href=\"#does-pair-trading-succeed\">Does Pair Trading succeed?<\/a><\/li><li><a href=\"#ml-steps\">What are the main steps of a Machine Learning project?<\/a><\/li><li><a href=\"#stock-data\">Where to find stock data and how to load it?<\/a><\/li><li><a href=\"#explore-stock-data\">How to explore stock data?<\/a><\/li><li><a href=\"#prepare-stock-data\">How to prepare stock data for clustering?<\/a><\/li><li><a href=\"#ml-model\">How to pick a good machine learning model?<\/a><\/li><li><a href=\"#kmeans\">What is k-Means Clustering?<\/a><\/li><li><a href=\"#kmeans-use\">How to use k-Means Clustering for Pairs Trading?<\/a><\/li><li><a href=\"#hierarchical\">What is Hierarchical Clustering?<\/a><\/li><li><a href=\"#hierarchical-use\">How to use Hierarchical Clustering for Pairs Trading?<\/a><\/li><li><a href=\"#affinity\">What is Affinity Propagation Clustering?<\/a><\/li><li><a href=\"#affinity-use\">How to use Affinity Propagation Clustering for Pairs Trading?<\/a><\/li><li><a href=\"#model-evaluation\">How to evaluate and compare clustering models?<\/a><\/li><li><a href=\"#pairs\">How to extract the trading pairs?<\/a><\/li><li><a href=\"#findings\">How to efficiently present your findings?<\/a><\/li><li><a href=\"#common-machine-learning-testing-mistakes\">What are the 3 Common Machine Learning Analysis\/Testing Mistakes?<\/a><\/li><li><a href=\"#full-code\">Full code<\/a><\/li><\/ol>\n\n\n\n<a name=\"what-is-cluster-analysis\">\n\n\n\n<h2 class=\"wp-block-heading\">What is Cluster Analysis?<\/h2>\n\n\n\n<p>Cluster Analysis is a family of methods used to sort observations into groups of similar items, known as clusters. 
<\/p>\n\n\n\n<p>Cluster Analysis starts without any prior information about which groups our observations belong to.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"685\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/1024px-Cluster-2.svg_.jpg\" alt=\"cluster analysis machine learning\" class=\"wp-image-16158\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/1024px-Cluster-2.svg_.jpg 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/1024px-Cluster-2.svg_-300x201.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/1024px-Cluster-2.svg_-768x514.jpg 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>The result of a cluster analysis shown as the coloring of the squares into three clusters.<\/figcaption><\/figure>\n\n\n\n<a name=\"ca-uml\">\n\n\n\n<h2 class=\"wp-block-heading\">Is Cluster Analysis an Unsupervised Machine Learning task?<\/h2>\n\n\n\n<p>Yes. In Unsupervised ML we feed the algorithm inputs, but there aren\u2019t any target outputs. This means that we don\u2019t tell the algorithm what the right answer is; it needs to discover some underlying structure in the data on its own.<\/p>\n\n\n\n<p>For example, imagine a website where people post pictures of dogs and cats. We want the website to sort those pictures into two categories, \u201cCats\u201d and \u201cDogs\u201d, without us needing to label each picture.<\/p>\n\n\n\n<p>Rather than teaching the clustering model what a cat or dog is, we simply tell the algorithm to split the pictures into two groups based on visual similarity. 
We will then get two unlabeled groups:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"576\" src=\"http:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/01\/UML-1024x576.png\" alt=\"\" class=\"wp-image-7557\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/01\/UML-1024x576.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/01\/UML-300x169.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/01\/UML-768x432.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/01\/UML.png 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<a name=\"ca-in-finance\">\n\n\n\n<h2 class=\"wp-block-heading\">How can Cluster Analysis be used for Finance?<\/h2>\n\n\n\n<p>When it comes to Finance, Cluster Analysis can help uncover the underlying structure of our dataset without us needing to bang our heads trying to figure it out for ourselves.<\/p>\n\n\n\n<p>For example, imagine you have financial data for 200 stocks and want the algorithm to divide them into groups. Suppose the algorithm gives us 4 groups (clusters) that we need to examine.<\/p>\n\n\n\n<p>After examination, we conclude that the groups are: \u201cBestselling\u201d, \u201cSelling but mediocre\u201d, \u201cWorst selling\u201d, and \u201cStagnating\u201d.<\/p>\n\n\n\n<p>In this article, we&#8217;ll explore how Cluster Analysis can help us in creating a Pairs Trading strategy. Let us first remind ourselves what a Pairs Trading strategy is.<\/p>\n\n\n\n<a name=\"pairs-trading\">\n\n\n\n<h2 class=\"wp-block-heading\">What is the Pairs Trading strategy?<\/h2>\n\n\n\n<p>Pairs trading is a strategy in which a trader buys one asset while shorting another. 
The main premise of the trade is that when the prices of the two assets diverge, they will likely converge again, resulting in profit for the trader.<\/p>\n\n\n\n<p>A visual representation of this strategy might help you understand it better:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"570\" src=\"http:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-1024x570.png\" alt=\"Pairs Trading. Chart of 2 prices mean reverting\" class=\"wp-image-4811\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-1024x570.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-300x167.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-768x427.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-1536x855.png 1536w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading-150x83.png 150w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/09\/pairs-trading.png 1709w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption>Pairs Trading<\/figcaption><\/figure>\n\n\n\n<a name=\"does-pair-trading-succeed\">\n\n\n\n<h2 class=\"wp-block-heading\">Does Pair Trading succeed?<\/h2>\n\n\n\n<p>Pairs trading can work if you choose the right assets to form a pair. Moreover, you should consider adding more parts to your pairs trading strategy, like utilizing more pairs or coupling it with sentiment analysis.<\/p>\n\n\n\n<p>As the fundamental building blocks of this strategy are the pairs we use, we don&#8217;t want to pick unreasonable ones. Finding them by hand might be too time-consuming and you might miss out on the underdogs.<\/p>\n\n\n\n<p>This is where Machine Learning (ML) comes into play. In this article, we will go step-by-step through the process of solving our problem with ML. 
We will define the problem as follows:<\/p>\n\n\n\n<p><strong>What trading assets work the best together to form a trading pair for the Pairs Trading strategy?<\/strong><\/p>\n\n\n\n<p>Let&#8217;s begin!<\/p>\n\n\n\n<hr class=\"wp-block-separator has-css-opacity\"\/>\n\n\n\n<p><em>Note that this article explores statistical machine learning methods to find assets that moved similarly historically. Pairs trading has been around for a long time and this strategy is commonplace among hedge funds and traders.<\/em><\/p>\n\n\n\n<p><em>To succeed with pairs trading, you need market knowledge in addition to the statistical tools that you learn here.<\/em><\/p>\n\n\n\n<p><em>For more information about implementing pairs trading in real life, check out the following article: <a href=\"https:\/\/algotrading101.com\/learn\/pairs-trading-guide\/\">https:\/\/algotrading101.com\/learn\/pairs-trading-guide\/<\/a><\/em><\/p>\n\n\n\n<a name=\"ml-steps\">\n\n\n\n<h2 class=\"wp-block-heading\">What are the main steps of a Machine Learning project?<\/h2>\n\n\n\n<p>Before tackling the main problem that we defined, we need to remind ourselves of what the main steps of an ML project are. These steps are often overlooked by novice practitioners, so be sure to keep them in mind:<\/p>\n\n\n\n<ol><li><strong>Define the Problem<\/strong> &#8211; be sure to deeply understand the problem you are trying to solve and state it in a concise and understandable way.<br><\/li><li><strong>Research the Problem<\/strong> &#8211; thoroughly research your problem by exploring whether there are any proposed solutions, reading papers, communicating with experts, etc.<br><\/li><li><strong>Obtain the Data<\/strong><em> &#8211; <\/em>you can&#8217;t expect your model to perform without obtaining quality data that fits the problem. Remember, garbage in = garbage out.<br><\/li><li><strong>Prepare the Data<\/strong> &#8211; things aren&#8217;t perfect in life, and the same goes for your data. 
It might have missing values, wrongly imputed values, undesirable values, and much more. Be sure to clean it!<br><\/li><li><strong>Pick the right Model<\/strong> &#8211; picking the right model is one of the most important steps when trying to solve the problem. Think of how each model works and if it can provide a reasonable solution. <br><br>Is your task to predict a value, cluster, or classify? Should you use supervised, unsupervised or reinforcement learning?<br><\/li><li><strong>Evaluate the Model<\/strong> &#8211; evaluating your model is a must. You need to see how it performs on the metrics that suit the task, such as precision and recall for classification.<br><\/li><li><strong>Tune the Model<\/strong> &#8211; depending on the model you choose and its performance, you might want to optimize it by tweaking its structure and hyperparameters.<br><\/li><li><strong>Present your findings<\/strong> &#8211; after the model is tuned, you are ready to deploy it and present your findings. This is where your communication skills come into play so be sure to practice them.<\/li><\/ol>\n\n\n\n<p>Keep in mind that we covered only the main steps; more sub-steps and overarching steps may, and should, arise. Be sure to think both wide and deep.<\/p>\n\n\n\n<a name=\"stock-data\">\n\n\n\n<h2 class=\"wp-block-heading\">Where to find stock data and how to load it?<\/h2>\n\n\n\n<p>Stock data can be easily obtained by using financial data providers like Quandl, Yahoo Finance, dxFeed, Bloomberg, or by utilizing online brokers like Interactive Brokers, Fidelity Investments, and more.<\/p>\n\n\n\n<p>For this article, we will obtain 3 years&#8217; worth of data for the S&amp;P 500 stocks by using Yahoo Finance. 
For more info on Yahoo Finance check out this article:<\/p>\n\n\n\n<figure class=\"wp-block-embed-wordpress wp-block-embed is-type-wp-embed is-provider-quantitative-trading-ideas-and-guides-algotrading-101-blog\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"wp-embedded-content\" data-secret=\"SbhSSmoNez\"><a href=\"https:\/\/algotrading101.com\/learn\/yahoo-finance-api-guide\/\">Yahoo Finance API &#8211; A Complete Guide<\/a><\/blockquote><iframe class=\"wp-embedded-content\" sandbox=\"allow-scripts\" security=\"restricted\" style=\"position: absolute; clip: rect(1px, 1px, 1px, 1px);\" title=\"&#8220;Yahoo Finance API &#8211; A Complete Guide&#8221; &#8212; Quantitative Trading Ideas and Guides - AlgoTrading101 Blog\" src=\"https:\/\/algotrading101.com\/learn\/yahoo-finance-api-guide\/embed\/#?secret=LBXqyGhBdn#?secret=SbhSSmoNez\" data-secret=\"SbhSSmoNez\" width=\"500\" height=\"282\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\"><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>The S&amp;P 500 is a stock market index that measures the stock performance of 500 large US companies. 
Let us start Python, check the number of tickers in the S&amp;P 500 index, and print the first five of them.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#pip install yahoo_fin\nimport yahoo_fin.stock_info as si\n\n#Get the list of tickers that make up the index\nsp500_list = si.tickers_sp500()\nprint(&quot;Number of Tickers in S&amp;P 500:&quot;, len(sp500_list))\nsp500_list[0:5]<\/code><\/pre><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\">Number of Tickers in S&amp;P 500: 505\n['A', 'AAL', 'AAP', 'AAPL', 'ABBV']<\/pre>\n\n\n\n<p>Now let us iterate through the list and obtain our data for each of the tickers:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#Download the daily price history of every ticker (one data frame per ticker)\nsp500_historical = {}\nfor ticker in sp500_list:\n    sp500_historical[ticker] = si.get_data(ticker, start_date=&quot;01\/01\/2018&quot;, index_as_date = False, interval=&quot;1d&quot;)\nsp500_historical<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"991\" height=\"484\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1-1.png\" alt=\"\" class=\"wp-image-8176\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1-1.png 991w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1-1-300x147.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1-1-768x375.png 768w\" sizes=\"(max-width: 991px) 100vw, 991px\" \/><\/figure>\n\n\n\n<p>As yahoo_fin returns a pandas data frame for each ticker, we have just obtained 505 data frames. 
Now we need to concatenate them:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import pandas as pd\n\n#Stack the per-ticker data frames into one long data frame\ndata = pd.concat(sp500_historical)\ndata.reset_index(drop=True, inplace=True)\ndata<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"971\" height=\"532\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2.png\" alt=\"\" class=\"wp-image-8177\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2.png 971w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2-300x164.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2-768x421.png 768w\" sizes=\"(max-width: 971px) 100vw, 971px\" \/><\/figure>\n\n\n\n<p>As you can see, the data is still unusable as all the tickers got grouped into a single column. Moreover, we only need the adjusted closing prices and the date columns. We can sort this out by pivoting the data table as follows:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#One row per date, one column per ticker, adjusted close as the value\ndata = data.pivot(index=&#39;date&#39;, columns=&#39;ticker&#39;, values = &#39;adjclose&#39;)\ndata.head(5)<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"370\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3-1024x370.png\" alt=\"\" class=\"wp-image-8178\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3-1024x370.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3-300x108.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3-768x277.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3.png 1202w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Perfect! 
Now we have our data arranged in the required way. Let us go ahead and save it as a CSV file for future use:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>data.to_csv(&#39;S&amp;P500_stock_data.csv&#39;)<\/code><\/pre><\/div>\n\n\n\n<a name=\"explore-stock-data\">\n\n\n\n<h2 class=\"wp-block-heading\">How to explore stock data?<\/h2>\n\n\n\n<p>Stock data can be explored in various ways and the most popular one is Exploratory Data Analysis, which consists of several descriptive statistical methods.<\/p>\n\n\n\n<p>Let&#8217;s just briefly look into some main statistics as we really want to explore the data after the clustering is done. We will call the pandas describe method and set the display precision to 3 decimal places:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#Show 3 decimal places when displaying data frames\npd.set_option(&#39;display.precision&#39;, 3)\ndata.describe().T.head(10)<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"751\" height=\"478\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4-13.jpg\" alt=\"\" class=\"wp-image-16160\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4-13.jpg 751w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4-13-300x191.jpg 300w\" sizes=\"(max-width: 751px) 100vw, 751px\" \/><\/figure>\n\n\n\n<a name=\"prepare-stock-data\">\n\n\n\n<h2 class=\"wp-block-heading\">How to prepare stock data for clustering?<\/h2>\n\n\n\n<p>The next step is to see if we have any missing values:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>data.isnull().values.any()\n\n#=&gt; True<\/code><\/pre><\/div>\n\n\n\n<p>As we have missing data, I&#8217;m interested in how much is missing. 
Let us use the missingno library to plot the missing values.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import missingno\nmissingno.matrix(data)<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"419\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/5-13-1024x419.jpg\" alt=\"\" class=\"wp-image-16161\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/5-13-1024x419.jpg 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/5-13-300x123.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/5-13-768x314.jpg 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/5-13.jpg 1476w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>As we have many stocks it looks a bit messy but you can still see some huge white lines that represent the missing data. This is a bad thing and we shall remove all the columns with more than 20% of missing data:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>print(&#39;Data Shape before cleaning =&#39;, data.shape)\n\nmissing_percentage = data.isnull().mean().sort_values(ascending=False)\nmissing_percentage.head(10)\ndropped_list = sorted(list(missing_percentage[missing_percentage &gt; 0.2].index))\ndata.drop(labels=dropped_list, axis=1, inplace=True)\n\nprint(&#39;Data Shape after cleaning =&#39;, data.shape)<\/code><\/pre><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\">Data Shape before cleaning = (799, 505)\nData Shape after cleaning = (799, 498)<\/pre>\n\n\n\n<p>We dropped only 7 columns which isn&#8217;t bad. What do we do with columns that have less than 20% of missing data? 
We can drop the columns or fill in the missing values with zeros, the column mean, or another method.<\/p>\n\n\n\n<p>I&#8217;ll fill the missing values with the last available value in the column:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#Forward-fill: carry the last available value forward\ndata = data.ffill()<\/code><\/pre><\/div>\n\n\n\n<p>For our clustering task, we are interested in the volatility and performance of stocks and thus we want to obtain the volatility and returns on an annualized basis. Keep in mind that we use a theoretical year of 266 periods (252 trading days is the more common convention):<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>import numpy as np\n\n#Calculate returns and create a data frame\nreturns = data.pct_change().mean()*266\nreturns = pd.DataFrame(returns)\nreturns.columns = [&#39;returns&#39;]\n\n#Calculate the volatility\nreturns[&#39;volatility&#39;] = data.pct_change().std()*np.sqrt(266)\n\ndata = returns\ndata.head()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"284\" height=\"242\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/7-10.jpg\" alt=\"\" class=\"wp-image-16162\"\/><\/figure>\n\n\n\n<p>If you pay attention to the values you can see that, for example, AAL has considerably higher volatility than A. 
If we pass the data into our models like this, the feature with the larger spread would dominate the distance calculations.<\/p>\n\n\n\n<p>This would hurt the algorithm&#8217;s performance, so we standardize the variables (mean = 0, variance = 1) by using the StandardScaler from sklearn.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.preprocessing import StandardScaler\n\n#Fit the scaler (learns the mean and standard deviation of each column)\nscale = StandardScaler().fit(data)\n\n#Apply the fitted scaler to the data\nscaled_data = pd.DataFrame(scale.transform(data),columns = data.columns, index = data.index)\nX = scaled_data\nX.head()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"293\" height=\"249\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/55.jpg\" alt=\"\" class=\"wp-image-16163\"\/><\/figure>\n\n\n\n<p>Now we are ready to decide which models to apply to our data.<\/p>\n\n\n\n<a name=\"ml-model\">\n\n\n\n<h2 class=\"wp-block-heading\">How to pick a good machine learning model?<\/h2>\n\n\n\n<p>When choosing a good machine learning model you need to know your data. By knowing your data I mean the distributions, missing values, features, labels, etc.<\/p>\n\n\n\n<p>Moreover, you need to know at least the theory behind each model and how and when it is used. All models have their pros and cons and some will perform better than others on your dataset.<\/p>\n\n\n\n<p>You should think about your problem in the following way: How can it be solved (prediction, clustering, classification)? Should\/can I use supervised, unsupervised or reinforcement learning?<\/p>\n\n\n\n<p>After you get the main idea of what solving the problem requires, you can move on to choose a few models. 
If you are a beginner you can simply Google the most used models in the category you have chosen.<\/p>\n\n\n\n<p>Be sure to pick at least 3 models and compare their outputs so you can go with the best-performing one.<\/p>\n\n\n\n<p>Now, let&#8217;s apply this to our problem. We have a clustering task, which calls for unsupervised learning, and the three models we will choose are:<\/p>\n\n\n\n<ul><li>k-Means Clustering<\/li><li>Hierarchical Clustering<\/li><li>Affinity Propagation Clustering<\/li><\/ul>\n\n\n\n<p>Now we shall go over each of the selected models, apply them to the data, explore their results and compare them to each other. After comparison, we shall pick the best one and extract the clusters.<\/p>\n\n\n\n<a name=\"kmeans\">\n\n\n\n<h2 class=\"wp-block-heading\">What is k-Means Clustering?<\/h2>\n\n\n\n<p>k-Means clustering is an unsupervised learning algorithm that partitions the data into k clusters, where k is specified in advance. A suitable value of k can be found by using either the silhouette or the elbow method.<\/p>\n\n\n\n<p>The way the k-Means algorithm works can be explained in 4 main steps:<\/p>\n\n\n\n<ol><li>After the user has specified the number of clusters (k), the algorithm randomly places k initial centroids among the data points, as shown in the picture below:<\/li><\/ol>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1-1024x576.png\" alt=\"\" class=\"wp-image-8195\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1-1024x576.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1-300x169.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1-768x432.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1-1536x864.png 1536w, 
https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/1_1-1.png 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>2. k clusters are created by associating every observation with the nearest mean (centroid), hence the name k-means.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1-1024x576.png\" alt=\"\" class=\"wp-image-8196\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1-1024x576.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1-300x169.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1-768x432.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1-1536x864.png 1536w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/2_1.png 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>3. The centroid (circle) of each cluster is recomputed as the mean of the observations assigned to it. You can picture this as the centroids moving, as represented below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1-1024x576.png\" alt=\"\" class=\"wp-image-8197\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1-1024x576.png 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1-300x169.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1-768x432.png 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1-1536x864.png 1536w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/3_1.png 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>4. 
The previous two steps are repeated until the cluster assignments stop changing, at which point the model has converged on a solution that may look like the following one:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1-1024x576.jpg\" alt=\"\" class=\"wp-image-16164\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1-1024x576.jpg 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1-300x169.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1-768x432.jpg 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1-1536x864.jpg 1536w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/4_1.jpg 1920w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<a name=\"kmeans-use\">\n\n\n\n<h2 class=\"wp-block-heading\">How to use k-Means Clustering for Pairs Trading?<\/h2>\n\n\n\n<p>Now that we know the basic idea of how the model works, let&#8217;s find the number of clusters (k) we should use for our Pairs Trading problem.<\/p>\n\n\n\n<p>We shall start with the elbow method, which can be summed up in the following way: iterate through a range of k values and calculate the distortion and inertia for each one.<\/p>\n\n\n\n<p>Distortion is the average of the squared distances of the samples from the center of their cluster, while inertia is the sum of the squared distances of each sample to its closest cluster center.<\/p>\n\n\n\n<p>Don&#8217;t worry if this sounds confusing; there are great tutorials out there that cover the math behind the algorithm. 
Let&#8217;s import our libraries and launch the elbow method:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.cluster import KMeans\nfrom sklearn import metrics\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\nK = range(1,15)\ndistortions = []\n\n#Fit a k-Means model for each k and record its inertia\n#(inertia_ is the sum of squared distances of samples to their closest centroid)\nfor k in K:\n    kmeans = KMeans(n_clusters = k)\n    kmeans.fit(X)\n    distortions.append(kmeans.inertia_)\n\n#Plot the results\nfig = plt.figure(figsize= (15,5))\nplt.plot(K, distortions, &#39;bx-&#39;)\nplt.xlabel(&#39;Values of K&#39;)\nplt.ylabel(&#39;Distortion&#39;)\nplt.title(&#39;Elbow Method&#39;)\nplt.grid(True)\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"897\" height=\"333\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4.jpg\" alt=\"\" class=\"wp-image-16165\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4.jpg 897w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-300x111.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-768x285.jpg 768w\" sizes=\"(max-width: 897px) 100vw, 897px\" \/><\/figure>\n\n\n\n<p>By observing the chart we can conclude that the optimal number of clusters is either 5 or 6. If you look at the iterations after 6, you can see that the curve flattens, so additional clusters add little information.<\/p>\n\n\n\n<p>If you aren&#8217;t sure about the number of clusters you can use the kneed library, which estimates the elbow point for you. 
Let&#8217;s try it out:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#pip install kneed\nfrom kneed import KneeLocator\nkl = KneeLocator(K, distortions, curve=&quot;convex&quot;, direction=&quot;decreasing&quot;)\nkl.elbow<\/code><\/pre><\/div>\n\n\n\n<p>Output = 5<\/p>\n\n\n\n<p>The silhouette method works by measuring how a particular instance is similar to the cluster it is put into. The values for this method are in a range between -1 and 1 where the higher values indicate a better match.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.metrics import silhouette_score\n\n#For the silhouette method k needs to start from 2\nK = range(2,15)\nsilhouettes = []\n\n#Fit the method\nfor k in K:\n    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10, init=&#39;random&#39;)\n    kmeans.fit(X)\n    silhouettes.append(silhouette_score(X, kmeans.labels_))\n\n#Plot the results\nfig = plt.figure(figsize= (15,5))\nplt.plot(K, silhouettes, &#39;bx-&#39;)\nplt.xlabel(&#39;Values of K&#39;)\nplt.ylabel(&#39;Silhouette score&#39;)\nplt.title(&#39;Silhouette Method&#39;)\nplt.grid(True)\nplt.show()\n\nkl = KneeLocator(K, silhouettes, curve=&quot;convex&quot;, direction=&quot;decreasing&quot;)\nprint(&#39;Suggested number of clusters: &#39;, kl.elbow)<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"399\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/8-12-1024x399.jpg\" alt=\"\" class=\"wp-image-16166\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/8-12-1024x399.jpg 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/8-12-300x117.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/8-12-768x300.jpg 768w, 
https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/8-12.jpg 1374w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Our two methods show a different optimal number of clusters and we will go with the number 6 as the Elbow Method has shown that it should also work. Let us go ahead and build our k-Means algorithm with 6 clusters.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>c = 6\n#Fit the model\nk_means = KMeans(n_clusters=c)\nk_means.fit(X)\nprediction = k_means.predict(X)\n\n#Plot the results\ncentroids = k_means.cluster_centers_\nfig = plt.figure(figsize = (18,10))\nax = fig.add_subplot(111)\nscatter = ax.scatter(X.iloc[:,0],X.iloc[:,1], c=k_means.labels_, cmap=&quot;rainbow&quot;, label = X.index)\nax.set_title(&#39;k-Means Cluster Analysis Results&#39;)\nax.set_xlabel(&#39;Mean Return&#39;)\nax.set_ylabel(&#39;Volatility&#39;)\nplt.colorbar(scatter)\nplt.plot(centroids[:,0],centroids[:,1],&#39;sg&#39;,markersize=10)\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"944\" height=\"604\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/9.png\" alt=\"\" class=\"wp-image-8201\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/9.png 944w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/9-300x192.png 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/9-768x491.png 768w\" sizes=\"(max-width: 944px) 100vw, 944px\" \/><\/figure>\n\n\n\n<p>Quite interesting! If you look at the orange cluster we can see that it is made out of outliers (volatile stocks with a large mean return). 
We can either remove the outliers, add them to the blue cluster or leave them be.<\/p>\n\n\n\n<p>In this Pairs Trading scenario, I&#8217;d prefer to leave them in so we know which stocks these are, as they would be interesting to explore further. To see how many instances each cluster has, we can write the following:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#Map each ticker to its cluster label\nclustered_series = pd.Series(index=X.index, data=k_means.labels_.flatten())\nclustered_series_all = pd.Series(index=X.index, data=k_means.labels_.flatten())\n#Drop noise points labeled -1 (k-Means assigns none, so this is a no-op here)\nclustered_series = clustered_series[clustered_series != -1]\n#Plot the number of stocks in each cluster\nplt.figure(figsize=(12,8))\nplt.barh(range(len(clustered_series.value_counts())),clustered_series.value_counts())\nplt.title(&#39;Clusters&#39;)\nplt.xlabel(&#39;Stocks per Cluster&#39;)\nplt.ylabel(&#39;Cluster Number&#39;)\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"711\" height=\"496\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/10-13.jpg\" alt=\"\" class=\"wp-image-16167\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/10-13.jpg 711w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/10-13-300x209.jpg 300w\" sizes=\"(max-width: 711px) 100vw, 711px\" \/><\/figure>\n\n\n\n<a name=\"hierarchical\">\n\n\n\n<h2 class=\"wp-block-heading\">What is Hierarchical Clustering?<\/h2>\n\n\n\n<p>Hierarchical Clustering is a method that groups instances into clusters based on their similarity. It can perform the grouping with an agglomerative (bottom-up) or divisive (top-down) approach.<\/p>\n\n\n\n<p>The main advantage of hierarchical clustering is that it doesn&#8217;t require us to specify the number of clusters in advance. <\/p>\n\n\n\n<p>The method builds a tree of clusters by merging or splitting groups of instances on each iteration. 
The product of the clustering process is visualized in a figure known as a &#8220;dendrogram&#8221;.<\/p>\n\n\n\n<a name=\"hierarchical-use\">\n\n\n\n<h2 class=\"wp-block-heading\">How to use Hierarchical Clustering for Pairs Trading?<\/h2>\n\n\n\n<p>When applying Hierarchical Clustering to our Pairs Trading problem, we need to know the main linkage methods that scikit-learn offers for measuring the distance between clusters:<\/p>\n\n\n\n<ul><li><strong>Ward linkage<\/strong> &#8211; minimizes the within-cluster variance of the clusters being merged.<br><\/li><li><strong>Average linkage<\/strong> &#8211; uses the average distance between the data points of two clusters.<br><\/li><li><strong>Complete linkage<\/strong> &#8211; uses the maximum distance between the data points of two clusters.<br><\/li><li><strong>Single linkage<\/strong> &#8211; uses the minimum distance between the data points of two clusters.<\/li><\/ul>\n\n\n\n<p>As we want to minimize the variance within our clusters, we shall go with Ward&#8217;s linkage. 
Let us jump into the coding part to calculate the linkage and plot a dendrogram:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.cluster import AgglomerativeClustering\nimport scipy.cluster.hierarchy as shc\n\nplt.figure(figsize=(15, 10))  \nplt.title(&quot;Dendrograms&quot;)  \ndend = shc.dendrogram(shc.linkage(X, method=&#39;ward&#39;))<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"872\" height=\"590\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-1-3.jpg\" alt=\"\" class=\"wp-image-16168\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-1-3.jpg 872w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-1-3-300x203.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-1-3-768x520.jpg 768w\" sizes=\"(max-width: 872px) 100vw, 872px\" \/><\/figure>\n\n\n\n<p>Here we can see the dendrogram where the x-axis is represented by our stocks and the y-axis represents the distance between them. The vertical line with maximum distance (blue) shows the cluster threshold. <\/p>\n\n\n\n<p>As we can see, a cut at 13.5 will give us 4 clusters. 
Let&#8217;s plot that:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>plt.figure(figsize=(15, 10))  \nplt.title(&quot;Dendrogram&quot;)  \ndend = shc.dendrogram(shc.linkage(X, method=&#39;ward&#39;))\nplt.axhline(y=13.5, color=&#39;purple&#39;, linestyle=&#39;--&#39;)<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"872\" height=\"590\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-2-2.jpg\" alt=\"\" class=\"wp-image-16169\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-2-2.jpg 872w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-2-2-300x203.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-2-2-768x520.jpg 768w\" sizes=\"(max-width: 872px) 100vw, 872px\" \/><\/figure>\n\n\n\n<p>Now that we know the number of clusters, we can fit the hierarchical clustering model to our data and obtain a scatter plot in which the resulting clusters can be clearly seen.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>#Fit the model\nclusters = 4\n#Ward linkage only supports Euclidean distance, which is the default\nhc = AgglomerativeClustering(n_clusters=clusters, linkage=&#39;ward&#39;)\nlabels = hc.fit_predict(X)\n\n#Plot the results\nfig = plt.figure(figsize=(15,10))\nax = fig.add_subplot(111)\nscatter = ax.scatter(X.iloc[:,0], X.iloc[:,1], c=labels, cmap=&#39;rainbow&#39;)\nax.set_title(&#39;Hierarchical Clustering Results&#39;)\nax.set_xlabel(&#39;Mean Return&#39;)\nax.set_ylabel(&#39;Volatility&#39;)\nplt.colorbar(scatter)\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"811\" height=\"604\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-3-1.jpg\" alt=\"\" 
class=\"wp-image-16170\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-3-1.jpg 811w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-3-1-300x223.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-3-1-768x572.jpg 768w\" sizes=\"(max-width: 811px) 100vw, 811px\" \/><\/figure>\n\n\n\n<p>Great! Now we move onto the last clustering algorithm.<\/p>\n\n\n\n<a name=\"affinity\">\n\n\n\n<h2 class=\"wp-block-heading\">What is Affinity Propagation Clustering?<\/h2>\n\n\n\n<p>Affinity Propagation Clustering is a method that creates clusters by a criterion of how well suited an instance is to be a representative of another one. Moreover, it doesn&#8217;t require a specified number of clusters in advance.<\/p>\n\n\n\n<p>You can imagine this by instances messaging each other on how much they suit one another. After that, an instance that received messages from multiple senders will send back the revised value of attractiveness to each sender.<\/p>\n\n\n\n<p>This messaging will proceed until an agreement is reached. When a sender gets associated with the receiver the receiver will become the exemplar. 
All data points with the same exemplar will then form a cluster.<\/p>\n\n\n\n<a name=\"affinity-use\">\n\n\n\n<h2 class=\"wp-block-heading\">How to use Affinity Propagation Clustering for Pairs Trading?<\/h2>\n\n\n\n<p>Now that we understand what the Affinity Propagation Clustering model does, we can go ahead and fit it to our data.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.cluster import AffinityPropagation\n\n#Fit the model\nap = AffinityPropagation()\nap.fit(X)\nlabels1 = ap.predict(X)\n\n#Plot the results\nfig = plt.figure(figsize=(15,10))\nax = fig.add_subplot(111)\nscatter = ax.scatter(X.iloc[:,0], X.iloc[:,1], c=labels1, cmap=&#39;rainbow&#39;)\nax.set_title(&#39;Affinity Propagation Clustering Results&#39;)\nax.set_xlabel(&#39;Mean Return&#39;)\nax.set_ylabel(&#39;Volatility&#39;)\nplt.colorbar(scatter)\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"808\" height=\"604\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-1.jpg\" alt=\"\" class=\"wp-image-16171\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-1.jpg 808w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-1-300x224.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-4-1-768x574.jpg 768w\" sizes=\"(max-width: 808px) 100vw, 808px\" \/><\/figure>\n\n\n\n<p>Wow, that&#8217;s quite a number of clusters. Let&#8217;s count them and arrange them for a better look. We will do this by taking the cluster center indices and labels and plotting them. 
We shall also transform our data into a NumPy array:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from itertools import cycle\nimport numpy as np\n\n#Extract the cluster centers and labels\ncci = ap.cluster_centers_indices_\nlabels2 = ap.labels_\n\n#Print their number\nclusters = len(cci)\nprint(&#39;Estimated number of clusters:&#39;, clusters)\n\n#Plot the results on a fresh figure\nX_ap = np.asarray(X)\nplt.close(&#39;all&#39;)\nfig = plt.figure(figsize=(15,10))\ncolors = cycle(&#39;cmykrgbcmykrgbcmykrgbcmykrgb&#39;)\nfor k, col in zip(range(clusters),colors):\n    cluster_members = labels2 == k\n    cluster_center = X_ap[cci[k]]\n    #Plot the members, then the exemplar, then a line from each member to the exemplar\n    plt.plot(X_ap[cluster_members, 0], X_ap[cluster_members, 1], col + &#39;.&#39;)\n    plt.plot(cluster_center[0], cluster_center[1], &#39;o&#39;, markerfacecolor=col, markeredgecolor=&#39;k&#39;, markersize=12)\n    for x in X_ap[cluster_members]:\n        plt.plot([cluster_center[0], x[0]], [cluster_center[1], x[1]], col)\n\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<p><br>Estimated number of clusters: 27<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"873\" height=\"575\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-5.jpg\" alt=\"\" class=\"wp-image-16172\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-5.jpg 873w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-5-300x198.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-5-768x506.jpg 768w\" sizes=\"(max-width: 873px) 100vw, 873px\" \/><\/figure>\n\n\n\n<p>For our main problem, a higher number of clusters would make more sense, as it would be easier to pick the trading pairs from each of them, but let&#8217;s see what the model comparison shows us.<\/p>\n\n\n\n<a name=\"model-evaluation\">\n\n\n\n<h2 class=\"wp-block-heading\">How to 
evaluate and compare clustering models?<\/h2>\n\n\n\n<p>As the clustering models are unsupervised, meaning that we don&#8217;t have the labels, we can compare the models by their silhouette score, which we introduced earlier in the article.<\/p>\n\n\n\n<p>So let&#8217;s see which method performs the best:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>print(&quot;k-Means Clustering&quot;, metrics.silhouette_score(X, k_means.labels_, metric=&#39;euclidean&#39;))\nprint(&quot;Hierarchical Clustering&quot;, metrics.silhouette_score(X, hc.fit_predict(X), metric=&#39;euclidean&#39;))\nprint(&quot;Affinity Propagation Clustering&quot;, metrics.silhouette_score(X, ap.labels_, metric=&#39;euclidean&#39;))<\/code><\/pre><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\">k-Means Clustering 0.3494916268886619\nHierarchical Clustering 0.3046193567096882\nAffinity Propagation Clustering 0.33752158556435613<\/pre>\n\n\n\n<p>It seems that the k-Means algorithm performed the best, so let&#8217;s go with it.<\/p>\n\n\n\n<a name=\"pairs\">\n\n\n\n<h2 class=\"wp-block-heading\">How to extract the trading pairs?<\/h2>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>cluster_size_limit = 1000\ncounts = clustered_series.value_counts()\n#Keep clusters with more than one stock, up to the size limit\nticker_count = counts[(counts&gt;1) & (counts&lt;=cluster_size_limit)]\nprint (&quot;Number of clusters: %d&quot; % len(ticker_count))\n#n*(n-1) counts every pair in both orders\nprint (&quot;Number of Pairs: %d&quot; % (ticker_count*(ticker_count-1)).sum())<\/code><\/pre><\/div>\n\n\n\n<p>To extract the trading pairs, we first need to check how many candidate pairs there are to evaluate.  
The evaluation performs a statistical test to find pairs that are cointegrated.<\/p>\n\n\n\n<p>Two price series are deemed cointegrated when each is non-stationary on its own, but a linear combination of them is stationary, so they tend to move together (recall the Pairs Trading definition from the beginning of the article).<\/p>\n\n\n\n<p>Let&#8217;s set up a function that finds the cointegrated pairs within a cluster. I salvaged this code from Quantopian, a platform that has since shut down.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>def find_cointegrated_pairs(data, significance=0.05):\n    n = data.shape[1]\n    score_matrix = np.zeros((n, n))\n    pvalue_matrix = np.ones((n, n))\n    keys = data.keys()\n    pairs = []\n    #Note: range(1) pairs only the first ticker with the rest; use range(n) to test every pair\n    for i in range(1):\n        for j in range(i+1, n):\n            S1 = data[keys[i]]\n            S2 = data[keys[j]]\n            result = coint(S1, S2)\n            score = result[0]\n            pvalue = result[1]\n            score_matrix[i, j] = score\n            pvalue_matrix[i, j] = pvalue\n            if pvalue &lt; significance:\n                pairs.append((keys[i], keys[j]))\n    return score_matrix, pvalue_matrix, pairs<\/code><\/pre><\/div>\n\n\n\n<p>Now we shall look for the cointegrated pairs within clusters and return them:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from statsmodels.tsa.stattools import coint\n\ncluster_dict = {}\n\n#Run the cointegration test on the price data of each cluster\nfor i, clust in enumerate(ticker_count.index):\n    tickers = clustered_series[clustered_series == clust].index\n    score_matrix, pvalue_matrix, pairs = find_cointegrated_pairs(data1[tickers])\n    cluster_dict[clust] = {}\n    cluster_dict[clust][&#39;score_matrix&#39;] = score_matrix\n    cluster_dict[clust][&#39;pvalue_matrix&#39;] = pvalue_matrix\n    cluster_dict[clust][&#39;pairs&#39;] = pairs\n\n#Collect the pairs from all clusters into one list\npairs = []\nfor cluster in cluster_dict.keys():\n    
pairs.extend(cluster_dict[cluster][&#39;pairs&#39;])\n    \nprint (&quot;Number of pairs:&quot;, len(pairs))\nprint (&quot;In those pairs, we found %d unique tickers.&quot; % len(np.unique(pairs)))\nprint(pairs)<\/code><\/pre><\/div>\n\n\n\n<pre class=\"wp-block-preformatted\">Number of pairs: 20\nIn those pairs, we found 25 unique tickers.<\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;('A', 'AVG0'), ('A', 'CMI'), ('A', 'DHI'), ('A', 'HOLX'), ('A', 'ISRG'), ('A', 'NKE'), ('A', 'ORCL'), ('A', 'TAT'), ('A', 'TMUS'), ('A', 'UNH'), ('ABBV', 'ABC'), ('ABBV', 'JBHT'), ('ABBV', 'NI'), ('AFL', 'HAS'), ('AFL', 'KIM'), ('AAPL', 'ADSK'), ('AAPL', 'CTLT'), ('AAPL', 'QRVO'), ('AAL', 'FANG'), ('AAL', 'UNM')]\r<\/code><\/pre>\n\n\n\n<p>Now that we see our trading pairs, let&#8217;s go ahead and visualize them by using TSNE (t-distributed stochastic neighbor embedding). TSNE is used for visualizing high-dimensional data by giving each instance a location in a 2d or 3d map.<\/p>\n\n\n\n<p>Let&#8217;s import the remaining two libraries and set up a data frame for our trading pairs.<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>from sklearn.manifold import TSNE\nimport matplotlib.cm as cm\n\nstocks = np.unique(pairs)\nX_data = pd.DataFrame(index=X.index, data=X).T\nin_pairs_series = clustered_series.loc[stocks]\nstocks = list(np.unique(pairs))\nX_pairs = X_data.T.loc[stocks]\nX_pairs.head()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"293\" height=\"249\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/55-1.jpg\" alt=\"\" class=\"wp-image-16174\"\/><\/figure>\n\n\n\n<p>Now we are ready to launch the TSNE algorithm and plot the results:<\/p>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>X_tsne = TSNE(learning_rate=30, perplexity=5, random_state=42, 
n_jobs=-1).fit_transform(X_pairs)\nX_tsne<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"594\" height=\"635\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tsne.jpg\" alt=\"\" class=\"wp-image-16175\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tsne.jpg 594w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tsne-281x300.jpg 281w\" sizes=\"(max-width: 594px) 100vw, 594px\" \/><\/figure>\n\n\n\n<div class=\"hcb_wrap\"><pre class=\"prism line-numbers lang-python\" data-lang=\"Python\"><code>plt.figure(1, facecolor=&#39;white&#39;,figsize=(15,10))\nplt.clf()\nplt.axis(&#39;off&#39;)\n\n# Draw a line between the two stocks of each pair\nfor pair in pairs:\n    ticker1 = pair[0]\n    loc1 = X_pairs.index.get_loc(ticker1)\n    x1, y1 = X_tsne[loc1, :]\n    ticker2 = pair[1]\n    loc2 = X_pairs.index.get_loc(ticker2)\n    x2, y2 = X_tsne[loc2, :]\n    plt.plot([x1, x2], [y1, y2], &#39;-&#39;, alpha=0.3, c=&#39;b&#39;)\n\n# Plot the stocks, colored by their cluster\nplt.scatter(X_tsne[:, 0], X_tsne[:, 1], s=215, alpha=0.8, c=in_pairs_series.values, cmap=cm.Paired)\nplt.title(&#39;TSNE Visualization of Pairs&#39;)\n\n# Annotate each point with its ticker\nfor x,y,name in zip(X_tsne[:,0],X_tsne[:,1],X_pairs.index):\n\n    label = name\n\n    plt.annotate(label,\n                 (x,y),\n                 textcoords=&quot;offset points&quot;,\n                 xytext=(0,10),\n                 ha=&#39;center&#39;)\n\nplt.show()<\/code><\/pre><\/div>\n\n\n\n<figure class=\"wp-block-image alignwide size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"851\" height=\"574\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-6.jpg\" alt=\"\" class=\"wp-image-16176\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-6.jpg 851w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-6-300x202.jpg 300w, 
https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/download-6-768x518.jpg 768w\" sizes=\"(max-width: 851px) 100vw, 851px\" \/><\/figure>\n\n\n\n<p>Once you&#8217;ve obtained the results from your project, it is time to present them to others.<\/p>\n\n\n\n<a name=\"findings\">\n\n\n\n<h2 class=\"wp-block-heading\">How to efficiently present your findings?<\/h2>\n\n\n\n<p>To present your findings efficiently, go over the main ML project steps and say a few words about each one: what your idea for it was and what you obtained from it.<\/p>\n\n\n\n<p>We did that along the way in this article and I hope that you&#8217;ve learned something interesting and, above all, useful. To drive the point home, let&#8217;s go to <a href=\"https:\/\/www.tipranks.com\/\">tipranks<\/a> and compare the stocks from the green cluster.<\/p>\n\n\n\n<figure class=\"wp-block-image alignwide size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"741\" src=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tipranks-1024x741.jpg\" alt=\"\" class=\"wp-image-16177\" srcset=\"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tipranks-1024x741.jpg 1024w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tipranks-300x217.jpg 300w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tipranks-768x556.jpg 768w, https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2022\/07\/tipranks.jpg 1081w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Now that you have the statistical tools to find similar assets, check out our article on how to use them in real-world trading:<\/p>\n\n\n\n<p><a href=\"https:\/\/algotrading101.com\/learn\/pairs-trading-guide\/\">https:\/\/algotrading101.com\/learn\/pairs-trading-guide\/<\/a><\/p>\n\n\n\n<a name=\"common-machine-learning-testing-mistakes\">\n\n\n\n<h2 class=\"wp-block-heading\">What are 
the 3 Common Machine Learning Analysis\/Testing Mistakes?<\/h2>\n\n\n\n<p>When you run your analysis, there are 3 common mistakes to take note of:<\/p>\n\n\n\n<ul><li>Overfitting<\/li><li>Look-ahead Bias<\/li><li>P-hacking<\/li><\/ul>\n\n\n\n<p>Do check out this lecture PDF to learn more: <a href=\"https:\/\/course.algotrading101.com\/courses\/pt101-practical-python-for-finance-trading-masterclass\/lectures\/27360454\">3 Big Mistakes of Backtesting &#8211; 1) Overfitting 2) Look-Ahead Bias 3) P-Hacking<\/a><\/p>\n\n\n\n<a name=\"full-code\">\n\n\n\n<h2 class=\"wp-block-heading\">Full code<\/h2>\n\n\n\n<p><a href=\"https:\/\/github.com\/IgorWounds\/Cluster-Analysis-Machine-Learning-for-Pairs-Trading\">GitHub Link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<div class=\"pvc_clear\"><\/div>\n<p id=\"pvc_stats_9276\" class=\"pvc_stats total_only  \" data-element-id=\"9276\" style=\"\"><i class=\"pvc-stats-icon medium\" aria-hidden=\"true\"><svg aria-hidden=\"true\" focusable=\"false\" data-prefix=\"far\" data-icon=\"chart-bar\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\" class=\"svg-inline--fa fa-chart-bar fa-w-16 fa-2x\"><path fill=\"currentColor\" d=\"M396.8 352h22.4c6.4 0 12.8-6.4 12.8-12.8V108.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v230.4c0 6.4 6.4 12.8 12.8 12.8zm-192 0h22.4c6.4 0 12.8-6.4 12.8-12.8V140.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v198.4c0 6.4 6.4 12.8 12.8 12.8zm96 0h22.4c6.4 0 12.8-6.4 12.8-12.8V204.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v134.4c0 6.4 6.4 12.8 12.8 12.8zM496 400H48V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-16c0-8.84-7.16-16-16-16zm-387.2-48h22.4c6.4 0 12.8-6.4 12.8-12.8v-70.4c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v70.4c0 6.4 6.4 12.8 12.8 12.8z\" class=\"\"><\/path><\/svg><\/i> <img decoding=\"async\" width=\"16\" height=\"16\" alt=\"Loading\" 
src=\"https:\/\/algotrading101.com\/learn\/wp-content\/plugins\/page-views-count\/ajax-loader-2x.gif\" border=0 \/><\/p>\n<div class=\"pvc_clear\"><\/div>\n<p>Table of contents: What is Cluster Analysis? Is Cluster Analysis an Unsupervised Machine Learning task? How can Cluster Analysis be used for Finance? What is the Pairs Trading strategy? Does Pair Trading succeed? What are the main steps of a Machine Learning project? Where to find stock data and how to load it? How to [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":8198,"comment_status":"closed","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_lmt_disableupdate":"no","_lmt_disable":"no","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[3,2],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Cluster Analysis - Machine Learning for Pairs Trading - AlgoTrading101 Blog<\/title>\n<meta name=\"description\" content=\"Cluster Analysis is a group of methods that are used to classify phenomena into relative groups known as clusters.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cluster Analysis - Machine Learning for Pairs Trading - AlgoTrading101 Blog\" \/>\n<meta property=\"og:description\" content=\"Cluster Analysis is a group of methods that are used to classify phenomena into relative groups known as clusters.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/\" \/>\n<meta 
property=\"og:site_name\" content=\"Quantitative Trading Ideas and Guides - AlgoTrading101 Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-05-11T17:22:24+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-07-16T17:16:23+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/4_1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Igor Radovanovic\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Igor Radovanovic\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"24 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Cluster Analysis - Machine Learning for Pairs Trading - AlgoTrading101 Blog","description":"Cluster Analysis is a group of methods that are used to classify phenomena into relative groups known as clusters.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/","og_locale":"en_US","og_type":"article","og_title":"Cluster Analysis - Machine Learning for Pairs Trading - AlgoTrading101 Blog","og_description":"Cluster Analysis is a group of methods that are used to classify phenomena into relative groups known as clusters.","og_url":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/","og_site_name":"Quantitative Trading Ideas and Guides - AlgoTrading101 
Blog","article_published_time":"2021-05-11T17:22:24+00:00","article_modified_time":"2022-07-16T17:16:23+00:00","og_image":[{"width":1920,"height":1080,"url":"http:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2021\/03\/4_1.png","type":"image\/png"}],"author":"Igor Radovanovic","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Igor Radovanovic","Est. reading time":"24 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/#article","isPartOf":{"@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/"},"author":{"name":"Igor Radovanovic","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/a7ae60c112a73b7c3fe14ac56726a0ae"},"headline":"Cluster Analysis &#8211; Machine Learning for Pairs Trading","datePublished":"2021-05-11T17:22:24+00:00","dateModified":"2022-07-16T17:16:23+00:00","mainEntityOfPage":{"@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/"},"wordCount":3237,"publisher":{"@id":"https:\/\/algotrading101.com\/learn\/#organization"},"articleSection":["Programming","Trading"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/","url":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/","name":"Cluster Analysis - Machine Learning for Pairs Trading - AlgoTrading101 Blog","isPartOf":{"@id":"https:\/\/algotrading101.com\/learn\/#website"},"datePublished":"2021-05-11T17:22:24+00:00","dateModified":"2022-07-16T17:16:23+00:00","description":"Cluster Analysis is a group of methods that are used to classify phenomena into relative groups known as 
clusters.","breadcrumb":{"@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/algotrading101.com\/learn\/cluster-analysis-guide\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/algotrading101.com\/learn\/"},{"@type":"ListItem","position":2,"name":"Cluster Analysis &#8211; Machine Learning for Pairs Trading"}]},{"@type":"WebSite","@id":"https:\/\/algotrading101.com\/learn\/#website","url":"https:\/\/algotrading101.com\/learn\/","name":"Quantitative Trading Ideas and Guides - AlgoTrading101 Blog","description":"Authentic Stories about Algorithmic trading, coding and life.","publisher":{"@id":"https:\/\/algotrading101.com\/learn\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/algotrading101.com\/learn\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/algotrading101.com\/learn\/#organization","name":"AlgoTrading101","url":"https:\/\/algotrading101.com\/learn\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/logo\/image\/","url":"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/11\/AlgoTrading101-Lucas-Liew.jpg","contentUrl":"https:\/\/algotrading101.com\/learn\/wp-content\/uploads\/2020\/11\/AlgoTrading101-Lucas-Liew.jpg","width":1200,"height":627,"caption":"AlgoTrading101"},"image":{"@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/a7ae60c112a73b7c3fe14ac56726a0ae","name":"Igor 
Radovanovic","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/algotrading101.com\/learn\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d46175c509b3ee240a1e2bbe735a4d1e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d46175c509b3ee240a1e2bbe735a4d1e?s=96&d=mm&r=g","caption":"Igor Radovanovic"},"sameAs":["https:\/\/igorradovanovic.com","https:\/\/www.linkedin.com\/in\/igor-radovanovic-profile"],"url":"https:\/\/algotrading101.com\/learn\/author\/igor\/"}]}},"modified_by":"Igor Radovanovic","_links":{"self":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/9276"}],"collection":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/comments?post=9276"}],"version-history":[{"count":8,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/9276\/revisions"}],"predecessor-version":[{"id":16178,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/posts\/9276\/revisions\/16178"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/media\/8198"}],"wp:attachment":[{"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/media?parent=9276"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/categories?post=9276"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/algotrading101.com\/learn\/wp-json\/wp\/v2\/tags?post=9276"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}