Last Updated on December 8, 2020
Table of Contents
- How do I get started with algorithmic trading?
- Trading Ideas and Strategies
- What kind of strategy should I trade?
- How do I come up with an alternative data strategy?
- How can I find an arbitrage strategy?
- How can I create a pairs trading strategy?
- How can I find the right pair for my strategy?
- How can I come up with a strategy based on technical analysis?
- Strategies to stay clear of when first starting out
- Coding Platforms
- Data Sources and Management
- Backtesting
- Live Trading
- Final Thoughts
How do I get started with algorithmic trading?
This guide will serve as a roadmap to getting your first automated strategy to a live environment.
There are several different and distinct parts to creating a successful strategy and we will discuss these components.
As a general guideline, these are the main steps that we will cover.
- Selecting and creating an appropriate strategy
- Setting up your programming environment
- Gathering and organizing data for your strategy
- Backtesting your strategy
- Deploying your strategy
Along the way, we will point out a few things that have historically led traders off the path.
We will also provide some tips on how to continue pushing forward, even when faced with a difficult task.
What makes algorithmic trading unique is that it requires knowledge in multiple areas such as programming and statistics, as well as familiarity with the markets.
It’s common to see someone strong in one area and not the others, which inevitably leads to delays and getting “stuck” on some tasks.
The important thing to remember is that strategy development can be, and usually is, an ongoing process.
This means you can always find an alternative solution for a difficult task, and then come back and improve on it later. This will ensure that the project continues to push forward.
As an example, say programming is not your strong suit. You can try to find existing code and copy and paste the parts you need with slight modifications.
This ensures the project moves forward while you’re still learning and getting familiar with coding.
If you then decide to rewrite the whole thing from scratch down the road, there is nothing wrong with that. The main thing is to not let any individual task stop you in your journey to becoming a successful algorithmic trader.
Trading Ideas and Strategies
What kind of strategy should I trade?
Before working on strategy development, it is a good idea to take a step back and determine how you want to approach quantitative trading.
There are many different types of strategies that you can trade. We will outline some of the common ones.
Alternative Data – This involves trading based on non-traditional information such as web traffic data, foot traffic data, surveys, credit card transactions and satellite images.
Arbitrage – There are different forms of arbitrage, but at its core, it involves simultaneously buying and selling an asset and/or its derivatives. The goal is to take advantage of mispricing while minimizing downside risk.
Pairs Trading – Similar to arbitrage, this involves buying one asset and selling another. The difference is that a pairs trade is built from two different assets, whereas arbitrage typically involves only one. A pairs trade could involve buying a strong stock while selling a weak stock in the same sector. This helps to offset risks that stem from overall market fluctuations.
Technical Analysis – This method relies heavily on past price movements to predict what an asset will do next.
If you’re interested in learning more about different types of strategies, check out 4 Quantitative Trading Strategies that Work in 2020. It talks about alternative data, trading in obscure or small markets, high-frequency trading, and machine learning.
It’s a good idea to pick a strategy that aligns with your strengths.
For example, if you have a lot of market experience, it might be easy to pick the right stocks for a pairs trading strategy.
On the other hand, if you’re a great coder, you might do well in optimizing your code to get the execution speed required to be successful in certain types of arbitrage.
Aside from your skillset, this is also a good time to assess your risk appetite.
Technical analysis, for example, involves directional trading which can be influenced by various market conditions.
If you’re risk-averse, a better strategy might be pairs trading, where there is a bit more control over the downside and potentially smaller swings in the strategy’s PnL.
How do I come up with an alternative data strategy?
Alternative data involves looking at data that typical investors don’t consider, or may not have access to.
Here are some examples of alternative data:
Satellite images – These can be used to provide insights into a business. For example, looking at images of a parking lot to determine how many customers a store has. Or checking images of oil tankers to try and calculate supply levels before the official numbers are released.
Credit card information – Credit card companies sell their data and it can show where consumers are spending their money and where they are not. This can provide valuable insights on stocks that cater to the retail public.
Social media posts – Trends on social media can offer an opportunity to invest in companies or products related to the trend.
Social media in particular is popular mainly because it is easy to access. Most platforms offer an API to automate the whole process and most of the time, there is no cost.
Google search results are also easy to come by. If you’d like to see an example of a strategy that uses Google search results, check out our Backtrader article.
Buying Alternative Data
Alternative data has become so popular that there are companies that specialize in it and make it available for sale.
A big advantage of purchasing such data is that there might be fewer investors looking to trade on the information.
The other advantage is that purchased data tends to be direct from the source, and at times more accurate.
Quandl is a popular source for alternative data. The following link will provide an idea of what they offer – https://www.quandl.com/resources?filterBy=alternative-data
Also check out – https://alternativedata.org/data-providers. This website offers a database of various alternative data providers.
How to act on the data is also an important consideration. There was a trader back in 2013 who recognized that the book series The Hunger Games was gaining popularity.
Rather than investing in the book, he invested in Lionsgate, which was looking to turn the book into a movie. At that time, Lionsgate was not well known and he was able to pick up the stock at a bargain.
As with anything in algorithmic trading, if a lot of people know about it, it may not be profitable. One distinct advantage of buying data is that there might not be a lot of other people using it.
Collecting Alternative Data using scheduled scripts
Once you have secured a data source, the next step is setting up an automated way to collect up to date data.
This is quite easy to do, either in Python or through your operating system.
One method is to run a Python script in a loop so that it continuously gathers data. If it only needs to collect data once per day, then it’s possible to put the script to sleep for 24 hours once it’s done, so that it will do the same again the next day.
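The loop-and-sleep approach can be sketched roughly as follows. The names collect_data and run_on_interval are our own placeholders, and the sleep function is passed in as a parameter only so the loop can be tried out without actually waiting a day.

```python
import time


def collect_data():
    """Hypothetical placeholder: replace with the real download logic."""
    print("Collecting data...")


def run_on_interval(job, interval_seconds=24 * 60 * 60, max_runs=None, sleep=time.sleep):
    """Run `job`, sleep, and repeat. `max_runs=None` means loop indefinitely."""
    completed = 0
    while max_runs is None or completed < max_runs:
        try:
            job()
        except Exception as exc:
            # Log and carry on so one failed download does not end the loop
            print(f"Collection failed: {exc}")
        completed += 1
        if max_runs is None or completed < max_runs:
            sleep(interval_seconds)
    return completed
```

Calling run_on_interval(collect_data) would then repeat the job once every 24 hours until the process is stopped.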
Another popular method is by using the operating system to schedule a task.
If you’re running your trading script on a virtual private server, chances are you’ll be running Linux.
Digital Ocean has a good tutorial on using cron in Linux to run scheduled tasks. This can be used to automate running a Python script at selected intervals.
If you’re a Windows user, datatofish.com has a tutorial that utilizes Windows Scheduler to automatically run a Python script.
How can I find an arbitrage strategy?
Arbitrage is attractive as the risk in these strategies is usually minimal.
Minimal, but not zero. Keep in mind there are always things that can go wrong. Maybe a server goes down mid-trade. Or there are execution problems. But aside from these rare cases, there isn’t much risk.
On the flip side, banks and big funds love these types of strategies for exactly this reason, making it difficult for retail investors to get involved.
One way to get around that is to trade in markets these players cannot participate in. This can mean trading in unregulated markets such as cryptocurrencies.
But remember that most cryptocurrency exchanges, where these types of opportunities are prevalent, are not insured. So if something goes wrong with the exchange, you could lose all of your money.
Another option is to find markets that are not very liquid. An easy way to spot one is to look for assets where the spread between the bid and the ask is very big.
Once you’ve found an asset that shows arbitrage potential, there are a few ways to confirm if there is an opportunity there.
One method is to download historical tick data for the two assets you’re considering. You can then compare how often, and by how much the price deviated between the two.
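As a rough sketch of that comparison, the function below counts how often the percentage gap between two aligned price series exceeds a chosen threshold and reports the largest gap. The function name, the threshold, and the sample prices are assumptions for illustration. Real tick data would first need to be aligned on timestamps.

```python
def deviation_stats(prices_a, prices_b, threshold_pct):
    """Summarise how often and by how much two aligned price series diverge."""
    if len(prices_a) != len(prices_b) or not prices_a:
        raise ValueError("price lists must be non-empty and the same length")
    # Percentage gap of A relative to B at each observation
    gaps = [(a - b) / b * 100 for a, b in zip(prices_a, prices_b)]
    breaches = [g for g in gaps if abs(g) >= threshold_pct]
    return {
        "ticks": len(gaps),
        "breaches": len(breaches),
        "breach_ratio": len(breaches) / len(gaps),
        "max_gap_pct": max(gaps, key=abs),
    }


# Illustrative prices only, not market data
exchange_a = [100.0, 100.5, 101.0, 103.0, 100.2]
exchange_b = [100.0, 100.4, 100.0, 100.0, 100.1]
stats = deviation_stats(exchange_a, exchange_b, threshold_pct=0.5)
print(stats)
```

A gap only matters if it is larger than the combined trading costs on both legs, so the threshold is a natural place to plug in commissions and fees.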
If you use a charting platform such as Tradingview, there is an easy way to check by creating a synthetic chart. We will show an example of this in the pairs trading section.
Arbitrage strategies often exist because of limitations. A few years back there was a well-known arbitrage opportunity in the cryptocurrency markets. Coins in countries like South Korea and India were trading well above what they were trading for in other parts of the world.
But these countries have strict capital controls that prevent large amounts of money leaving the country. While it was possible to sell coins on one of their exchanges, it was difficult to transfer that money back. This is a big reason for the opportunity to come up, and remain for a long period of time.
There are legal implications there so it’s a good idea to research what triggered the opportunity so that you’re not put in an adverse situation.
In certain parts of the world, automated stock trading is not popular and brokers have not made an API available. These types of markets are great as you’re not competing with many other bots.
But how do you trade these algorithmically? Well, you can create your own API that interfaces with the broker’s proprietary software.
This is not an easy task but that’s precisely why others don’t attempt to do it and why it gives an edge.
How can I create a pairs trading strategy?
A pair is simply two assets combined. There are several examples of this in the marketplace.
In Forex, you can trade EUR/USD which tracks the price of the euro against the dollar. Similarly, you can also trade GBP/USD. But if you want to stay clear of the dollar, you could trade EUR/GBP, which tracks the euro against the British pound.
The same can be seen in cryptocurrencies. You can trade BTCUSD, ETHUSD, or trade BTC vs ETH by trading BTC/ETH.
If you take the price of BTCUSD and divide it by ETHUSD, you will derive the current price of BTC/ETH.
In other words, going long BTC/ETH is the same as going long BTCUSD and short ETHUSD, in identical dollar amounts.
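The arithmetic can be sketched in a few lines. The prices below are made up for illustration and the helper names are our own.

```python
def synthetic_price(base_usd, quote_usd):
    """Price of the base asset expressed in units of the quote asset."""
    return base_usd / quote_usd


def equal_dollar_legs(notional_usd, long_price, short_price):
    """Units to buy and to sell so both legs carry the same dollar value."""
    return notional_usd / long_price, notional_usd / short_price


# Illustrative prices only, not market data
btc_usd = 20_000.0
eth_usd = 500.0

btc_eth = synthetic_price(btc_usd, eth_usd)
btc_qty, eth_qty = equal_dollar_legs(10_000, btc_usd, eth_usd)
print(f"BTC/ETH = {btc_eth}")
print(f"Long {btc_qty} BTC, short {eth_qty} ETH")
```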
We can apply this logic to stocks also. If we were to go long Apple (AAPL) and short Microsoft (MSFT), we’ve essentially created a pair – AAPL/MSFT.
Since our AAPL/MSFT pair does not exist in the marketplace, it is called a synthetic pair.
These types of pairs are typically created by matching dollar amounts. For example, going long $10,000 worth of AAPL and shorting $10,000 worth of MSFT.
But there are more advanced strategies where you can double your exposure in AAPL. This means going long $20,000 worth of AAPL and shorting $10,000 worth of MSFT.
This creates a different pair – 2AAPL/MSFT.
Although we use the term pair, these types of trades are not limited to only two assets. Synthetic assets can be composed of multiple assets, all potentially with different sizes as well.
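One way to think about differently sized legs is as signed dollar weights that get converted into share quantities. The sketch below assumes a sign convention of positive for long and negative for short, and uses made-up prices. The tickers are reused purely for illustration.

```python
def leg_sizes(weights, unit_notional, prices):
    """Convert signed dollar weights into share quantities per ticker.

    Positive weight = long, negative weight = short.
    """
    sizes = {}
    for ticker, weight in weights.items():
        dollars = weight * unit_notional
        sizes[ticker] = dollars / prices[ticker]
    return sizes


prices = {"AAPL": 125.0, "MSFT": 200.0}   # illustrative, not market data
weights = {"AAPL": 2, "MSFT": -1}         # one leg at twice the dollar weight
sizes = leg_sizes(weights, unit_notional=10_000, prices=prices)
print(sizes)
```

The same function handles three or more legs, since each ticker simply gets its own weight.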
Some brokers support the creation of synthetic pairs in their platform. This makes it easier and quicker to enter and exit the trade. It is also possible to create bracket orders to attach a stop loss and take profit to the synthetic asset.
Different Approaches in Pairs Trading
The reason pairs trading is attractive is that it offsets much of the typical market fluctuation.
If we are trading AAPL vs MSFT, and the market falls hard, both stocks are likely to fall. Our downside risk to these types of market events is greatly reduced.
Another advantage is that we can try to emulate certain price behaviors via our selection of assets.
If you have access to a charting platform like Tradingview, it is easy to create synthetic pairs and chart them.
Let’s create a synthetic pair in Gold futures. We will check to see how the front-month contract trades against the next contract out.
In this example, we are checking the Gold futures contract that will expire in September versus the contract set to expire in October.
We can create this in Tradingview by entering the symbol of the first asset – GC1! and dividing it by the second – GC2!.
This is what the chart looks like.
As expected, there aren’t significant fluctuations in the synthetic asset we have created on an hourly chart.
The mean of the synthetic pair is steady. While there are some deviations from the mean, the variance is fairly consistent.
If you have an asset that shows this type of behavior, a steady mean with relatively consistent deviations, it is known as a stationary time series.
In this case, we took two non-stationary time series (the gold futures contracts) and combined them to create a synthetic pair that is stationary.
If you combine two or more non-stationary assets and the result is a stationary time series, then you can say that the two original assets are cointegrated.
In other words, the September Gold futures contract was not stationary. But when combined with the October contract, it became stationary. Therefore, the two contracts are considered to be cointegrated.
Each time there is a small deviation up in our Gold pair, we can short this pair. And each time there is a deviation down, we can buy it.
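A common way to turn "a small deviation" into a rule is a z-score, which measures how many standard deviations the latest ratio sits from its recent mean. The sketch below is one possible version. The entry threshold of 2 and the sample ratios are assumptions, not values taken from the gold chart.

```python
from statistics import mean, stdev


def zscore_signal(ratios, entry_z=2.0):
    """Return 'short', 'long' or 'flat' for the latest value of a pair ratio."""
    history, latest = ratios[:-1], ratios[-1]
    if len(history) < 2:
        raise ValueError("need at least three ratio observations")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return "flat"
    z = (latest - mu) / sigma
    if z >= entry_z:
        return "short"   # ratio stretched above its mean: sell the pair
    if z <= -entry_z:
        return "long"    # ratio stretched below its mean: buy the pair
    return "flat"


# Illustrative ratio history, not market data
base = [1.000, 1.001, 0.999, 1.000, 1.001, 0.999]
print(zscore_signal(base + [1.004]))
print(zscore_signal(base + [0.996]))
```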
In theory, the gold pair presented here offers a great strategy. In reality, unfortunately, the profit margin is not large enough for us retail traders after taking into consideration commission costs.
At its core, this is a mean reversion strategy, although pairs trading can also be used to trade trends. This can be done by taking two similar stocks where one has a history of outperforming the other.
How can I find the right pair for my strategy?
Pair trades often work best when the two assets have similar characteristics.
We will go through a code example where we pick several stocks from the same sector to see if a suitable pairs strategy exists.
Note that stocks don’t need to be in the same sector to create a pairs trade.
For the last year or so, there has been a strong positive relationship between Gold and the S&P 500. Even though the two are very different assets, it is possible to create a pairs trade.
We start by importing the libraries that we will use. We will use the Alpha Vantage library to grab data.
If you’re not familiar with Alpha Vantage, check out our Alpha Vantage Introduction Guide.
import pandas as pd
from alpha_vantage.timeseries import TimeSeries
from time import sleep
# With no key argument, the library looks for the ALPHAVANTAGE_API_KEY environment variable
app = TimeSeries(output_format='pandas')
Next, we want to grab the list of S&P 500 companies. Wikipedia keeps an up to date list and we can use Pandas to scrape it.
stock_list = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
We now have a DataFrame of all the companies in the S&P 500.
The list can be filtered for bank stocks only as that is the sector we will focus on.
To find these stocks we will filter the GICS Sub Industry column for rows that contain the term Diversified Banks.
banks = stock_list[stock_list['GICS Sub Industry'].str.contains('Diversified Banks')]
Great, we now have a DataFrame of only bank stocks.
We don’t need all that information. The only thing we need is the ticker symbols, so let’s save all the tickers to a list.
tickers = banks.Symbol.to_list()
With our ticker symbols saved in a list, we can iterate through the list and query the Alpha Vantage API for daily price data.
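stocks_df = pd.DataFrame()
for ticker in tickers:
    # get_daily_adjusted returns a (data, metadata) tuple; keep the data
    alphav_df = app.get_daily_adjusted(ticker)
    alphav_df = alphav_df[0]
    # Shorten column names such as '5. adjusted close' to 'adjusted'
    alphav_df.columns = [i.split(' ')[1] for i in alphav_df.columns]
    stocks_df[ticker] = alphav_df['adjusted'].pct_change()
    # Pause between requests to stay within the API rate limit
    sleep(12)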
We will save data for all of the tickers into a DataFrame. Note that we are converting the data to show the daily gain or loss in percentage format using the pct_change() function from Pandas.
This will provide better accuracy in our next step where we calculate the correlation coefficient.
If you’re not familiar with correlations, our Python Correlation Guide explains this part in more detail.
This is what our stocks DataFrame looks like at this point.
We can create a correlation matrix from the data.
stocks_df.corr()
This is what it looks like.
From here we can try and pick out a pair with a strong correlation to see if the relationship is appropriate for pairs trading.
In this case, the matrix is small enough for us to eyeball the best pair. But let’s automate this part too, in case we have to deal with a much larger correlation matrix down the road.
cor = stocks_df.corr()
In the code above, we start by saving the correlation table to a variable. We then want the largest number in the correlation matrix.
We first want to remove any perfect correlations which are denoted by 1. This is simply the coefficient of the stock to itself, not very useful. We can change any instances of 1 to 0.
cor[cor==1] = 0
Now we can determine which coefficient is the largest. The idxmax function returns the location of the largest value, but it does so for each column. If we stack the DataFrame first, it returns only the pair with the strongest correlation.
cor.stack().idxmax()
In this case, the code returned a tuple containing BAC and JPM.
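As an alternative sketch for picking the top pair, the upper triangle of the matrix can be masked so that each pair appears once and the diagonal is skipped. This avoids comparing floating point values to exactly 1. The function name, tickers, and return values below are made up for illustration.

```python
import numpy as np
import pandas as pd


def strongest_pair(returns):
    """Return (ticker_a, ticker_b, correlation) for the most correlated pair."""
    cor = returns.corr()
    # Keep only the cells above the diagonal so each pair appears once
    # and self-correlations are excluded.
    upper = np.triu(np.ones(cor.shape, dtype=bool), k=1)
    pairs = cor.where(upper).stack()
    a, b = pairs.idxmax()
    return a, b, pairs.max()


# Illustrative daily returns, not market data
returns = pd.DataFrame({
    "AAA": [0.01, -0.02, 0.015, -0.005, 0.02],
    "BBB": [0.011, -0.019, 0.014, -0.006, 0.021],
    "CCC": [-0.01, 0.005, 0.02, -0.015, 0.0],
})
a, b, value = strongest_pair(returns)
print(a, b, round(value, 4))
```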
Now that we know that BAC and JPM have the strongest correlation in our table, let’s quickly plot a chart to visually confirm it.
%matplotlib inline
bac_jpm = pd.concat([stocks_df.BAC, stocks_df.JPM], axis=1)
bac_jpm.plot()
We’ve created another DataFrame with only the BAC and JPM data. We then used the plot() function from Pandas to display a chart in our Jupyter notebook.
Visually, it looks like a good correlation, just as our correlation coefficient had suggested.
But a good correlation alone does not imply that the two bank stocks are cointegrated.
In fact, correlation and cointegration are two very different things. We are only using correlation to try and narrow down our focus.
There could be a scenario where one stock consistently outperforms another. In that case, the two stocks will still have a fairly high correlation, but the synthetic pair will not be stationary, or cointegrated.
Let’s dig a little deeper to try and find out if this pair is suitable for a cointegration strategy.
The next step is to calculate the spread between BAC and JPM and plot it. After all, that is what we will be trading. If the two stocks are cointegrated, the chart should have some consistency in the variance from its mean.
spread = stocks_df.BAC - stocks_df.JPM
spread.cumsum().plot()
To calculate the spread, we just need to subtract one stock from another. This is because we are using percent change data. If we were using closing prices, we would have divided one by the other to arrive at the spread.
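A quick way to see why subtracting returns and dividing prices describe the same spread is to work with log returns. The sketch below uses illustrative prices and a helper name of our own. It accumulates the difference in log returns and compares it to the change in the log price ratio. With log returns the two match exactly. With the simple percentage changes from pct_change() the match is only approximate, and it is closer when daily moves are small.

```python
import math


def cumulative_log_spread(prices_a, prices_b):
    """Running sum of (log return of A - log return of B)."""
    total, path = 0.0, []
    for i in range(1, len(prices_a)):
        ret_a = math.log(prices_a[i] / prices_a[i - 1])
        ret_b = math.log(prices_b[i] / prices_b[i - 1])
        total += ret_a - ret_b
        path.append(total)
    return path


prices_a = [100.0, 102.0, 101.0, 105.0]   # illustrative prices
prices_b = [50.0, 50.5, 50.0, 51.0]

spread_path = cumulative_log_spread(prices_a, prices_b)
# Change in the log price ratio A/B since the first observation
ratio_path = [
    math.log((a / b) / (prices_a[0] / prices_b[0]))
    for a, b in zip(prices_a[1:], prices_b[1:])
]
print(spread_path)
print(ratio_path)
```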
The chart above does not show the consistency in variance that our gold futures example showed, but then again, that example was cherry-picked to illustrate the concept of cointegration. It is rare to see that kind of consistency.
There is enough evidence here that the spread is mean-reverting between BAC and JPM. There was a larger than usual dip in July, but overall, the spread seems to continuously revert back to the 0 level. This could be a good candidate for a cointegration strategy.
Using a stats library to find cointegrated pairs
We’ve focused a lot on charts to determine if a pair is cointegrated. Let’s take a look at a more statistical approach.
We can use the statsmodels library to statistically verify if a pair is cointegrated or not. The easiest way to install it is by using pip.
pip install statsmodels
There are several methods to test for cointegration. We will focus on two that are commonly used.
The first method involves the adfuller method from the statsmodels library. This is based on the Augmented Dickey-Fuller test.
The second is to use the coint method from statsmodels. This utilizes the Engle-Granger two-step method.
We will start by importing both methods.
from statsmodels.tsa.stattools import coint, adfuller
We will run our spread Series through the adfuller method. This will let us know if our data is stationary or not.
Recall that if two assets combined produce a stationary time series, then those two assets are considered to be cointegrated.
But before running the test, we need to clean up our data. The statsmodels library currently does not handle missing data.
We can use the dropna function from Pandas to delete any empty rows.
spread.dropna(inplace=True)
Next, we will save the results from the adfuller method to a variable called adf_results.
adf_results = adfuller(spread)
The output might seem a bit confusing, so the next part of code cleans it up a bit so it is easier to read.
print(f'ADF Test Statistic: {adf_results[0]}')
print(f'PValue: {adf_results[1]}')
print(f'Number of lags used: {adf_results[2]}')
print(f'Number of observations: {adf_results[3]}')
print('Critical Values:')
for k,v in adf_results[4].items():
    print(f'{k}: {v}')
And here are our results.
The first thing we will look at is the P-Value. It should be below a certain threshold. Some discretion can be used to determine exactly what that threshold is.
Commonly used thresholds are either 5% or 1% (0.05 or 0.01). In this case, the P-Value falls well below the threshold.
Next, we will look at the test statistic which is roughly -9.9. We can compare this to the Critical Values provided by the test.
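Since our test statistic is less than the 1% critical value of -3.5, we can reject the null hypothesis of a unit root at the 1% significance level and treat our data as stationary.
In other words, this test tells us that we have found a cointegrated pair based on the data we have supplied. Keep in mind that daily percentage changes tend to be stationary on their own, so running the same test on the cumulative spread we plotted earlier is a stricter check.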
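The reading of the ADF output described above can be wrapped in a small helper. The sketch below works on any tuple laid out like the adfuller result (statistic, p-value, lags, observations, critical values). The helper name is our own. In the sample tuple, the statistic, observation count, and 1% critical value echo the figures quoted in this section, while the p-value, lag count, and the 5% and 10% critical values are placeholders.

```python
def interpret_adf(adf_results, alpha=0.05):
    """Summarise a result tuple shaped like the output of adfuller()."""
    statistic, p_value, _lags, n_obs, critical = adf_results[:5]
    # Significance levels whose critical value the statistic falls below
    passed = [level for level, value in critical.items() if statistic < value]
    return {
        "stationary": p_value < alpha,
        "levels_passed": sorted(passed, key=lambda s: float(s.rstrip("%"))),
        "observations": n_obs,
    }


# Partly placeholder values; see the note above
sample = (-9.9, 1e-10, 0, 98, {"1%": -3.5, "5%": -2.9, "10%": -2.6})
summary = interpret_adf(sample, alpha=0.01)
print(summary)
```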
The next test we will run is the coint test from the statsmodels library.
In the last test, we used the spread Series that we created by subtracting one stock’s percentage change data from the other’s.
This test requires the percentage change data from both stocks. We have this available already in our stocks_df DataFrame.
Once again we will start by removing any empty rows. Then we will save the results from the test to a variable.
stocks_df.dropna(inplace=True)
coint_results = coint(stocks_df.BAC, stocks_df.JPM)
We can then output the results to the screen.
print(f'T-statistic of unit-root test on residuals {coint_results[0]}')
print(f'PValue: {coint_results[1]}')
print(f'Critical Values: {coint_results[2]}')
In this test, we are only focused on the P-Value. Just like the Augmented Dickey-Fuller test, if the value is below a certain threshold, we can consider the data to be cointegrated.
The P-Value, in this case, is well below 0.01 which lets us know that our two stocks are cointegrated.
We now have both visual confirmation from our graphs and statistical confirmation that this pair is suitable for a cointegration strategy.
However, there is one other important consideration. Recall that the ADF test showed us the number of observations was 98.
This means we tested 98 trading days’ worth of data. This is a rather small sample.
It may very well be that the pair shows cointegration during some periods and not others. We can run further tests on a larger sample to gain more insight.
We can also customize our data to an extent.
For example, let’s say it is determined that this spread moves well beyond the norm during earnings reports and it is not profitable to trade during that time.
We can exclude earnings days from our testing in this case. Just remember to disable the strategy in a live environment when either company reports earnings if you go this route!
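A minimal sketch of this idea is shown below, with one filter for the test data and a matching guard for live trading. The earnings dates and spread values are hypothetical, and the function names are our own.

```python
from datetime import date


def exclude_dates(observations, blocked_dates):
    """Drop (date, value) observations that fall on a blocked date."""
    blocked = set(blocked_dates)
    return [(d, v) for d, v in observations if d not in blocked]


def trading_allowed(today, blocked_dates):
    """Live-trading guard that mirrors the filter used in testing."""
    return today not in set(blocked_dates)


# Hypothetical earnings dates and spread values, for illustration only
earnings_days = [date(2020, 10, 13), date(2020, 10, 14)]
spread_obs = [
    (date(2020, 10, 12), 0.002),
    (date(2020, 10, 13), -0.031),
    (date(2020, 10, 14), 0.027),
    (date(2020, 10, 15), -0.001),
]
clean = exclude_dates(spread_obs, earnings_days)
print(clean)
```

Using the same list of dates for both functions keeps the live behavior in line with what was tested.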
Hypothesis Approach (Top-down)
So far, we’ve looked at a bottom-up approach where we scan many assets to find pairs that are cointegrated.
Another way to formulate a pairs trading strategy is to use a top-down approach. This involves starting with a hypothesis. We then look for assets that fit our hypothesis.
You can learn more about the top-down approach here: Pairs Trading – A Real-World Guide
How can I come up with a strategy based on technical analysis?
The difficulty with creating a strategy based on technical analysis is that manual traders often use it as a guide while algorithmic traders try to derive a signal from it.
Take a look at the following charts as an example.
It shows a few well-known tech companies all testing a moving average at the same time.
It’s not all the same timeframe and it’s not all the same moving average. But it is very relevant.
The important thing here is that there are several stocks testing a potential support area, all at the same time. This is called confluence.
A support confluence can be either like the example above, from multiple correlated stocks, or when the same stock tests support from various indicators at the same time.
For example, if a stock’s 100-day moving average and trendline support fall at the exact same spot, it would be considered a confluence of support.
A manual trader would perceive this scenario in one of two ways. If the trader was already bearish, they might consider a short position on a correlated drop below the moving averages in a few of these instruments.
If the trader was bullish, then a long position might be considered. Or, the trader may wait for further confirmation that the stocks are turning upward.
The underlying theme here is that the trader would have already had some kind of bias before even looking at these charts.
Perhaps that bias came from following intraday movements in the S&P 500 for the past week. Or maybe it’s something fundamental or news related.
A good automated technical strategy might follow the same premise. Meaning that it doesn’t solely rely on technical indicators to make its trading decisions.
This can mean scanning for stocks with unusual upward momentum and then using technical analysis to determine entry points.
Or maybe isolating assets that are in strong directional trends and then using technical analysis to position on corrections.
This works well for trend following strategies. For mean reversion, a few technical indicators can be combined to create a signal to try and increase the edge.
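The trend-then-correction idea could look something like the sketch below. A long moving average acts as the trend filter and a shorter one marks the pullback area. The window lengths, the 1% tolerance, and the sample prices are arbitrary assumptions rather than recommended settings.

```python
from statistics import mean


def sma(values, window):
    """Simple moving average of the last `window` values."""
    if len(values) < window:
        raise ValueError("not enough data for the requested window")
    return mean(values[-window:])


def pullback_entry(closes, trend_window=50, entry_window=10, tolerance=0.01):
    """True when price is in an uptrend and has pulled back to the short average.

    Uptrend: latest close above the long moving average.
    Pullback: latest close within `tolerance` (fractional) of the short average.
    """
    last = closes[-1]
    long_ma = sma(closes, trend_window)
    short_ma = sma(closes, entry_window)
    in_uptrend = last > long_ma
    near_short_ma = abs(last - short_ma) / short_ma <= tolerance
    return in_uptrend and near_short_ma


# Illustrative closes: a steady climb followed by a small dip
closes = [100 + i for i in range(50)] + [148, 147, 146]
print(pullback_entry(closes))
```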
A similar approach can be used when working with candlestick patterns.
The chart above uses a public library in Tradingview to identify bullish engulfing patterns. In this case, it’s spotted three patterns but all three have failed to trigger a reversal.
There are a few things that can improve the probability of a candlestick pattern working out.
One thing to look out for is the size of the bar. If the pattern occurs when bar sizes are average, the pattern is likely not all that important. But if a candlestick pattern occurs when the candle size is say double the average true range (ATR), then it will likely be much more relevant.
Another thing to watch for is a spike in volume. If there is a large spike in volume on the candle that forms a candlestick pattern, it’s more likely to play out.
Check out the bearish engulfing candle in the above example, which is marked with a blue box. Oddly enough, the indicator did not pick up on it.
Not only is there a big spike in volume, the candle size is several multiples larger than the prior candles.
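The two filters can be combined with a basic pattern check, as in the sketch below. It looks for a bullish engulfing pattern on the last bar and only accepts it if the bar's range is at least double the average true range and its volume is at least double the recent average. The ATR here is a plain average of true ranges rather than a smoothed version, and the lookback and multipliers are assumptions. Candles are plain dictionaries with made-up values.

```python
from statistics import mean


def is_bullish_engulfing(prev, curr):
    """Down candle followed by an up candle whose body covers the prior body."""
    return (
        prev["close"] < prev["open"]
        and curr["close"] > curr["open"]
        and curr["open"] <= prev["close"]
        and curr["close"] >= prev["open"]
    )


def true_range(candle, prev_close):
    """Largest of the bar's range and its gaps from the previous close."""
    return max(
        candle["high"] - candle["low"],
        abs(candle["high"] - prev_close),
        abs(candle["low"] - prev_close),
    )


def filtered_engulfing(candles, lookback=14, size_mult=2.0, volume_mult=2.0):
    """Bullish engulfing on the last bar, confirmed by bar size and volume."""
    if len(candles) < lookback + 2:
        raise ValueError("not enough candles for the lookback")
    prev, curr = candles[-2], candles[-1]
    history = candles[-(lookback + 1):-1]   # the `lookback` bars before the last bar
    before = candles[-(lookback + 2):-2]    # the bar preceding each history bar
    atr = mean(true_range(c, p["close"]) for c, p in zip(history, before))
    avg_volume = mean(c["volume"] for c in history)
    big_bar = (curr["high"] - curr["low"]) >= size_mult * atr
    volume_spike = curr["volume"] >= volume_mult * avg_volume
    return is_bullish_engulfing(prev, curr) and big_bar and volume_spike


# Illustrative candles, not market data
quiet = {"open": 100.0, "high": 100.5, "low": 99.5, "close": 100.0, "volume": 1000}
candles = [dict(quiet) for _ in range(15)]
candles.append({"open": 100.0, "high": 100.2, "low": 99.3, "close": 99.5, "volume": 1000})
candles.append({"open": 99.4, "high": 102.6, "low": 99.3, "close": 102.5, "volume": 3000})
print(filtered_engulfing(candles))
```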
Even with additional confirmation, the pattern can still fail. That’s a normal part of trading. But at least the odds should be a bit more in your favor. All it takes is a small edge.
What we’ve tried to do is remove some false signals from candlestick patterns. Each technical analysis tool has different ways of doing so.
If you’re new to technical analysis, there are a lot of courses on the subject, both paid and free.
You can also try talking to an experienced technical analyst. After all, traders usually love talking about trading and are often happy to share some strategies and answer questions.
In that sense, manual traders are the opposite of automated traders who usually like to keep things hush-hush.
Another option is to pay a technical analyst for an hour or two of their time. It could be a quick way of figuring out what works and what doesn’t.
We’ve mentioned Tradingview a few times in this guide – it is a great source.
Tradingview allows for the creation of both strategies and custom indicators with an option to publish them to its public library. That’s how we came across the candlestick pattern indicator.
While it might be difficult to find a profitable strategy published publicly, perusing posted strategies and indicators can inspire other ideas, so it’s definitely worth checking out.
Technical Analysis Summary
To summarize, we believe that TA can be useful when used in addition to other information inputs (qualitative analysis, alternative data, pairs trading elements, macro/company analysis, etc.).
However, it might be dangerous to rely solely on indicators and price-based analysis as this might lead to overfitting and p-hacking, i.e. seeing chart patterns that do not have value moving forward. Using TA alone might have been more profitable 20 years ago, but it is much tougher now as the markets are getting more competitive and sophisticated.
Strategies to stay clear of when first starting out
There are a few strategies to stay clear of, at least at the beginning. High-Frequency Trading (HFT) is one of them.
HFT is dominated by large firms that have deep pockets. This makes it hard to compete with them.
An HFT firm might spend five figures a month to locate their server as close to the exchange as possible. They will also invest heavily in hardware, even if it only shaves off a few nanoseconds on their strategy execution.
Some HFT firms have even reported creating their own hardware just to try and improve their speed.
So unless you are creating a strategy that you plan to sell to an HFT firm, or have a large infrastructure budget, it’s best to stay clear of this method.
Another thing to be careful about is strategies that utilize machine learning. This is an area where few have succeeded.
Machine learning has been known to improve parts of a strategy when used correctly. But it’s rare to see a strategy run successfully entirely on machine learning principles.
Coding Platforms
Which IDE should I use?
The first step in coding an algorithm is figuring out where you’re going to write your code.
There are three basic categories to choose from: An IDE, a code editor, or a text editor.
The difference between these options comes down to the features each provides.
You can write code in any text editor, even the ones built into the operating system like Notepad on Windows or nano on Linux.
The advantage is that a text editor is lightweight. The disadvantage is that it lacks features that can help speed up your coding.
Here is an overview of some of the features you can expect in an IDE.
Autocomplete – The IDE software will attempt to guess what you’re going to write. This feature can help speed up your coding time a great deal.
Execution – With an IDE you can run your script straight from the software.
Debugging – Finding bugs can be a lot easier with options to only run a part of the code or being able to view the contents of a variable while the script is running.
Libraries – Most IDEs have libraries that aim to make coding easier in some form or another.
The main disadvantage of IDEs is that they can be bulky, and the learning curve is often much steeper when getting to know a new one.
A code editor falls somewhere in between a text editor and an IDE. Code editors tend to have most of the sought-after features and are usually lightweight, but they still have a steeper learning curve than a plain text editor.
A big advantage of a code editor is that they can be used with other programming languages if you decide to do that down the road.
Popular IDEs for Python are PyCharm, Spyder, and IDLE.
Popular code editors include VS Code, Sublime Text, and Atom.
If you’re just starting out, IDLE might be a good option. It is relatively easy to learn and has most of the features required to code efficiently.
For those looking for a feature-packed platform, VS Code is a popular choice. While it is a bit bulky, it has a ton of features. In particular, it lets you code in both an interactive and a non-interactive environment.
Interactive vs Non-interactive environments
The typical method for coding is in a non-interactive environment, otherwise known as script mode.
Python offers an interactive environment that is advantageous to algo traders or anyone that deals with a lot of data.
In interactive mode, you can run sections of your code and get immediate feedback.
Why would you want to do that? Let’s say you’re running some tests on a strategy. Since it involves a lot of data, it takes 30 minutes to run.
Once the test completes, it gives you the results which you want to manipulate further to try and optimize the strategy.
In script mode, every optimization change would require rerunning the code from the beginning. That means waiting another 30 minutes for the first part of the test to complete.
In interactive mode, you can select portions of code you want to run without the need of running the entire script.
Jupyter Notebook is the most popular interactive environment for Python. It is a web-based application, although it runs locally.
One of the cool features of VS Code is that it has built-in support for Jupyter notebooks. That means you can run both your interactive and non-interactive scripts directly from it.
Further, VS Code makes it easy to switch from script mode to interactive mode. Check out the following code.
What it does is read a CSV of historical price data for Tesla’s stock (TSLA) and load it into memory.
If we type #%% in the first line, it will make this code interactive.
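As a rough illustration of what such a cell could look like, here is a minimal sketch. The column names and prices are made up, and an in-memory string stands in for the CSV file so the snippet runs on its own. In practice you would pass a file path (for example a hypothetical TSLA.csv) to read_csv instead.

```python
#%%
import io

import pandas as pd

# Made-up sample rows standing in for a TSLA price file on disk.
# In practice, replace sample_csv with a path such as 'TSLA.csv' (hypothetical name).
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close\n"
    "2020-12-01,100.0,105.0,99.0,104.0\n"
    "2020-12-02,104.0,106.0,101.0,102.0\n"
    "2020-12-03,102.0,108.0,102.0,107.0\n"
)

# Load the data into a DataFrame indexed by date
df = pd.read_csv(sample_csv, index_col='Date', parse_dates=True)
print(df.head())
```

Once the cell has run, df stays in memory, so later cells can work with it without reloading the data.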
From here we can either click Run Cell or hit shift + enter to execute the code.
A new interactive window will open to run the code. From this window, there is a toggle button to hide or show active variables. If you click on it, it will show the variables in memory. In this case, we have a Pandas DataFrame called df.
We can open this DataFrame and view it in a Data Viewer and even do some basic filtering without having to write any code.
If you’re not familiar with Pandas or DataFrames, we will cover that in more detail later on. The main takeaway here is that an interactive environment can be useful since we will be handling a lot of data.
One pitfall to watch out for is that code editors and IDEs have a lot of libraries and plug-ins. You could easily spend days customizing your development environment exactly the way you want it.
If you’re just starting, it’s better to pick something simple so that the focus remains on the algo itself. The coding environment can always be changed or customized later down the road.
Which libraries should I use?
Python is known for its extensive libraries and there are a lot of libraries available that can make your life easier.
If you’re not familiar with libraries, these are code packages that can help you in your task.
As an example, say you’re trying to connect to an exchange to buy and sell stocks.
You would have to figure out how to communicate with that exchange to submit your orders. You’ll also have to define how to handle any errors that may come up in communication.
Chances are, someone has already done this. And often, they will publish their code as open-source which you can use to cut down your coding time.
Popular Python Libraries for Trading
Whether you’re looking for a charting library or a machine learning library, chances are there is one available for Python.
While we can’t go over every library out there, we will discuss some of the ones most beneficial and broadly used.
Pandas
The Pandas library falls high on the list. If you’re not familiar with it, think of it as an advanced spreadsheet for Python.
It is essentially the Swiss Army knife of quantitative trading.
It might be easier to explain what it can’t do, rather than what it can, but here is an attempt.
- Hold time-series data such as historical prices with many options to manipulate the data.
- Built-in plotting functionality
- Read CSV files with a one-line command
- Easily combine multiple datasets.
- Calculate moving averages, correlation coefficients, standard deviation, or other things traders like to compute.
- Scrape table data from websites
- Convert various time formats and track time zones
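To make a few of the items above concrete, here is a small sketch using made-up daily prices for two hypothetical tickers, AAA and BBB. It computes a moving average, the correlation of daily returns, and the standard deviation of returns. The window length is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up closing prices for two hypothetical tickers
dates = pd.date_range('2020-01-01', periods=10, freq='D')
prices = pd.DataFrame(
    {
        'AAA': [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3, 11.6, 11.4, 11.9],
        'BBB': [20.0, 20.8, 20.5, 21.4, 21.9, 21.7, 22.4, 23.0, 22.7, 23.5],
    },
    index=dates,
)

# 3-day simple moving average (the first two values are NaN)
sma_3 = prices['AAA'].rolling(window=3).mean()

# Daily percentage returns, dropping the first row which has no prior price
returns = prices.pct_change().dropna()

# Correlation coefficient between the two return series
correlation = returns['AAA'].corr(returns['BBB'])

# Standard deviation of daily returns for each ticker
volatility = returns.std()

print(sma_3.tail(3))
print(correlation)
print(volatility)
```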
Just like anything with a lot of features, the learning curve for Pandas is steep. But as an algo trader, it is worthwhile getting to know this library.
As a side note, coders often refer to Google or Stack Overflow when they run into a coding problem.
Since Pandas is more than ten years old, this can lead to some outdated answers. Pandas has seen many revisions, so it’s a good idea to filter search results by date. There is often a better way to do something today than there was a few years ago.
Requests
If you’re looking for historical data or trying to submit a trade order, you will have to communicate with an external server somewhere.
The Requests library allows you to communicate with servers in an easy and straightforward way.
While this is a widely used library, you can get away with not using it in algo trading.
As an example, let’s say you want to connect to the Binance server to execute some crypto trades. You decide to use the python-binance library to do this.
The python-binance library will use Requests for communication to and from the Binance servers. In this case, you don’t have to worry about it.
But since so many libraries use this particular module, it’s worth getting to know it. And it gives you the ability to code your projects from scratch without having to rely on an external library.
Using GitHub to find libraries
A good source for libraries is GitHub. You can run a search on GitHub to narrow down what you’re looking for and you can filter by programming language.
You can also try and find strategies coded by other traders. The image below is a search for mean reversion, filtered to show scripts coded in Python.
You can gauge how popular a library is by the number of stars it has.
It might be difficult to find a profitable strategy with this method, but it can give you some insight as to how other coders approach various things and get some ideas flowing.
Advantages of using a library vs not using one.
The big advantage of using a library is that you can get up and running quickly. A lot of the code is already written for you and the error handling is taken care of for you.
But there are several disadvantages.
For starters, not all libraries are the same. As an example, we talked about the Pandas library. This library was first created in 2008 and has been improved on by a team of developers since.
There are libraries out there that might have been created say a month ago, by one person. Needless to say, there could still be some bugs that need to be worked out.
Sometimes developers start on a great library, only to find themselves still working on it several years later when their interest has moved on to other things.
A good thing about GitHub is it will show when code changes were last made. It will also show the number of open issues in the library. This can help you stay clear of libraries that are no longer actively managed.
Lastly, there is a learning curve for each library. As mentioned, you can use a library like requests to access most brokers. You only need to learn it once.
If you use a library specifically for your broker, you will have to relearn another library down the road if you decide to switch brokers.
Data Sources and Management
Where can I get data?
Both historical and live data can be obtained from two main sources, either your broker or an independent data provider.
AlgoTrading101 has published several guides that contain more information as well as code samples.
Independent Providers
Brokers/Exchanges
Most of the above sources offer a free plan while some offer a paid plan as well.
On free plans, sometimes there are limitations as to how much data you can obtain or how often you can submit requests. Data providers may also restrict how far back they will provide data on their free plan, especially for the smaller time frames like 1-minute candles and tick data.
A more important consideration is the quality of the data. Sometimes, on free plans, the data doesn’t match with other providers. Or worse, there is missing data.
Let’s go through an example of a provider with missing data.
from binance.client import Client
import pandas as pd
client = Client()
candles = client.get_historical_klines('BTCUSDT', '15m', '06-01-2020', '07-01-2020')
# delete unwanted data - just keep date, open, high, low, close
for line in candles:
    del line[5:]
In the code example above, we are placing a request for historical Bitcoin (BTCUSDT) candles with the Binance exchange.
Binance offers this for free and you don’t even need an API key.
We are using the python-binance library in this code example. If you’d like to know more about this library or the Binance API, check our Binance Python API guide.
In the first part of the code, we are importing the python-binance library as well as Pandas. We then request historical 15-minute bars starting from June 1 until July 1.
Once we have our candle data returned from Binance, we will create a DataFrame to hold the candles.
df = pd.DataFrame(candles, columns=['date', 'open', 'high', 'low', 'close'])
df.set_index(['date'], inplace=True)
df.index = pd.to_datetime(df.index, unit='ms')
# Binance returns prices as strings, so convert them to numbers
df = df.astype(float)
Binance utilizes Unix timestamps, so we’ve converted that into a different format that Pandas recognizes and is easier for us to read.
This is what our DataFrame looks like so far. The data is neatly organized and indexed by date.
Next, we will create another set of timestamps using a function built-in to Pandas.
time_index = pd.date_range('06-01-2020', '07-01-2020', freq='15min')
What the code above does is create timestamps with a 15-minute interval starting from June 1 until July 1 and saves it to a variable called time_index.
Theoretically, our DataFrame index and the newly created time_index should be identical.
len(time_index) == len(df)
However, when we compare the lengths of the two, the code returns False. This means they don’t match.
len(time_index) - len(df)
By subtracting the length of the DataFrame from the length of time_index, we can see that there is a difference of 14. In other words, our DataFrame is missing 14 rows of data.
Another way to check if the data matches is to do a resample in Pandas.
Resampling data involves changing your data from one time frame to another. For example, changing 15-minute candles to 1-hour candles.
In this case, we won’t be changing our time frame. This is an unconventional thing to do, but resampling to the same time frame will result in a new DataFrame with each time interval in the series, even if we don’t have data for it.
Therefore, it will not only show us if data is missing, it will also show which data is missing.
resampled_df = df.resample('15min').agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'})
The code above will create a second DataFrame called resampled_df.
resampled_df[pd.isna(resampled_df.close)]
We can then use the isna function to display lines where there is no data.
As you can see, several bars are missing on June 28.
This happened because Binance had a planned outage on June 28. Nevertheless, the missing data can cause incorrect values when trying to calculate indicators, correlations, or any other type of analysis.
The worst part is that if you perform some form of analysis on this data, Python for the most part won’t throw any errors. So it won’t even be known that analysis values are incorrect.
There are some cases where this method won’t work. For example, if you’re trading an asset with very low volume, there could be periods where no trades were conducted. This is a legitimate reason for having missing bars.
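The gap check described above can be tried end to end without an exchange connection. The sketch below uses synthetic 15-minute bars with two bars deliberately removed to simulate an outage. The prices are made up. It finds the missing timestamps both by resampling, as in the walkthrough, and by comparing the index against a complete date range with Index.difference, which is a swapped-in alternative rather than the method used above.

```python
import pandas as pd

# Synthetic 15-minute bars (made-up prices)
full_index = pd.date_range('2020-06-28', periods=8, freq='15min')
df = pd.DataFrame(
    {
        'open': [100.0 + i for i in range(8)],
        'high': [101.0 + i for i in range(8)],
        'low': [99.0 + i for i in range(8)],
        'close': [100.5 + i for i in range(8)],
    },
    index=full_index,
)

# Simulate an outage by dropping the 4th and 5th bars
df = df.drop(full_index[[3, 4]])

# Approach 1: resample to the same time frame and look for empty rows
resampled_df = df.resample('15min').agg(
    {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}
)
gaps = resampled_df[resampled_df['close'].isna()]

# Approach 2: compare against the timestamps we expect to see
expected = pd.date_range(df.index.min(), df.index.max(), freq='15min')
missing = expected.difference(df.index)

print(gaps)
print(missing)
```

Both approaches report the same two timestamps here. Neither can tell an outage apart from a period with no trades, so the caveat about low-volume assets still applies.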
The bottom line is having accurate data is extremely important. In most cases, paid providers will make the extra effort to ensure that their data is accurate and no values are missing.
While there may be some reluctance to pay for data, it is worth it. Imagine altering parameters on a profitable strategy because of feedback from a backtest run on bad data – Yikes!
On top of that, the process of fixing bad data is cumbersome and often extremely time-consuming.
What is the best way to store data?
Data can be stored using a simple format such as CSV or something more complicated like a time-series SQL database.
The approach to this depends on how much data you have. The biggest issue with data is that strategies often require large amounts of it.
If you’ve ever tried to open a CSV file that is 1GB in size, you’ll know that it can crash your system.
The benefit of using SQL is that you can query just the parts of the data that you need. It is generally fast and offers an organized way of storing your data.
The downside is that it can take some time to set up if you’re not familiar with SQL, especially if you plan to use a time-series plugin.
There are some libraries available for Python such as SQLAlchemy that can simplify things. With SQLAlchemy, you don’t necessarily need to know SQL but it is helpful if you do.
But if you’ve never worked with a database before, an easier way is to start out using CSV files. Algo traders often get hung up on this part and spend weeks trying to figure out SQL, which is not productive.
Start with something simple. It can always be upgraded later.
As mentioned in the prior section, data can be resampled, so there is no need to store data for each time frame. You only need to store the lowest time frame that you will be using.
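As a minimal sketch of that idea, the snippet below builds two hours of made-up 15-minute bars and resamples them into 1-hour bars. The variable names are only illustrative.

```python
import pandas as pd

# Two hours of made-up 15-minute bars
index = pd.date_range('2020-06-01', periods=8, freq='15min')
bars_15m = pd.DataFrame(
    {
        'open': [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 10.8],
        'high': [10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11.0],
        'low': [9.9, 10.0, 10.0, 10.2, 10.2, 10.4, 10.4, 10.6],
        'close': [10.2, 10.1, 10.4, 10.3, 10.6, 10.5, 10.8, 10.9],
    },
    index=index,
)

# Build 1-hour bars: first open, highest high, lowest low, last close
bars_1h = bars_15m.resample('60min').agg(
    {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last'}
)
print(bars_1h)
```

Going the other way is not possible, which is why the lowest time frame is the one worth storing.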
If you find that your file sizes are getting too large, they can be split up by year, or month, or day even.
If you’re using Pandas, storing your DataFrame to a CSV file is as simple as:
df.to_csv('data.csv')
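And retrieving it later is just as easy. The index_col and parse_dates arguments restore the date index, which would otherwise come back as a regular text column:
df = pd.read_csv('data.csv', index_col=0, parse_dates=True)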
One downside of CSV files is that they can get large and eat up the free space on your hard drive if you have a lot of data. Large CSV files can also take a long time to load.
Pandas supports many formats and some of them utilize compression. The table below shows all of the I/O tools available in Pandas.
We’ve done a comparison of some of the more popular methods from the above list. We started with a file that is 100MB in size when saved as a CSV. Then we checked the file size, how long it took to save, and how long it took to read using other file formats.
| Type | File Size | Save | Read |
|---|---|---|---|
| CSV | 100MB | 21.8s | 2.71s |
| parquet | 45MB | 949ms | 1.01s |
| hdf5 | 107MB | 1.44s | 860ms |
| feather | 50MB | 1.45s | 1.36s |
| pickle | 95MB | 1.23s | 475ms |
The results in the table above vary considerably by format. The parquet and feather formats produced files about half the size of the CSV. Both saved far faster than CSV and read roughly twice as fast or better.
The pickle and hdf5 formats were also fairly fast, but did not save much space in terms of file size. In fact, hdf5 was slightly larger than the CSV format.
These tests will produce various results depending on the system they are run on. Also, we noticed different results with smaller and larger datasets.
The parquet file, for example, stayed about the same size when we doubled the dataset, while the CSV file tripled in size.
With smaller datasets, the differences between the formats weren’t as pronounced. So much so that it might not be worthwhile switching to another format.
On balance, we liked the parquet format as an alternative to CSV for its combination of smaller file size and speed.
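If you’d like to run a comparison like this on your own machine, the sketch below shows one possible way to time a save and a read. It is not the script used for the table above. The time_roundtrip helper is hypothetical, the data is synthetic, and only CSV and pickle are compared because they need no extra packages. Other formats can be added in the same way, for example with to_parquet and read_parquet once pyarrow is installed.

```python
import os
import tempfile
import time

import pandas as pd


def time_roundtrip(df, path, save, load):
    """Hypothetical helper: time one save and one read, and record the file size."""
    start = time.perf_counter()
    save(df, path)
    save_seconds = time.perf_counter() - start

    start = time.perf_counter()
    loaded = load(path)
    read_seconds = time.perf_counter() - start

    return {
        'size_bytes': os.path.getsize(path),
        'save_s': save_seconds,
        'read_s': read_seconds,
        'rows': len(loaded),
    }


# Synthetic 1-minute data, far smaller than the 100MB file used in the table
df = pd.DataFrame(
    {'close': [float(i) for i in range(10_000)]},
    index=pd.date_range('2020-01-01', periods=10_000, freq='min'),
)

with tempfile.TemporaryDirectory() as tmp:
    results = {
        'csv': time_roundtrip(
            df,
            os.path.join(tmp, 'data.csv'),
            lambda d, p: d.to_csv(p),
            lambda p: pd.read_csv(p, index_col=0, parse_dates=True),
        ),
        'pickle': time_roundtrip(
            df,
            os.path.join(tmp, 'data.pkl'),
            lambda d, p: d.to_pickle(p),
            pd.read_pickle,
        ),
    }

for name, stats in results.items():
    print(name, stats)
```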
To use HDF with Pandas, you’ll need to install the tables library. It can be installed by running the following from your command line.
pip install tables
To use the parquet or feather formats, you will need to install pyarrow.
pip install pyarrow
Pyarrow only works with the 64-bit version of Python. Trying to install it on the 32-bit version in Windows will likely produce an error.
How can I create charts and visualizations?
In our pairs trading example we showed how a chart can be created in Pandas using the built-in plot function.
There are many libraries available for plotting charts in Python. The popular ones include Matplotlib, Plotly, and Bokeh.
Matplotlib is the most popular. One thing to note is that Matplotlib plans to discontinue its candlestick charts. A new library called mplfinance has been created for this purpose.
The easiest way to install it is by using pip.
pip install mplfinance
Here is an example of plotting a candlestick chart using data from a CSV file.
import mplfinance as mpf
import pandas as pd
df = pd.read_csv('data.csv', index_col='timestamp', parse_dates=True)
%matplotlib inline
mpf.plot(df, type='candle')
And here is the chart that it produced:
Most libraries will support adding indicators to the chart as well as certain overlays.
One thing missing from mplfinance is an easy way to move around the chart. You can zoom in and out, but it doesn’t have a cursor in the event you want to highlight a specific bar and check its timestamp.
Plotly offers this functionality. It produces charts in HTML format and is fairly easy to use.
To create the same chart in Plotly, start by installing via pip.
pip install plotly
The syntax to create a candlestick chart is as follows:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('data.csv', index_col='timestamp', parse_dates=True)
fig = go.Figure(data=[go.Candlestick(x=df.index,
                                     open=df.open,
                                     high=df.high,
                                     low=df.low,
                                     close=df.close)])
fig.show()
By default, Plotly will add a range slider at the bottom of the chart. It can be removed if not needed by calling fig.update_layout(xaxis_rangeslider_visible=False) before fig.show().
Both of these packages can do a whole lot more than just chart price data. They can be used to display statistical data like linear regression or tables like correlation coefficient matrices.
Seaborn is also a popular library. It is actually a wrapper for the matplotlib library to enhance visualizations primarily for statistical applications. We have some code examples of the Seaborn library in our Python Correlations guide.
It’s worth mentioning that TradingView has released an open-source library of its charting platform. There are a few different versions, and the basic version is completely free.
It requires some familiarity with JavaScript and HTML, and it doesn’t work out of the box with Python. However, a Python integration could be built using a library such as Jinja2, which lets you generate JavaScript and HTML dynamically from Python.
Backtesting
What is backtesting and what’s our goal?
Backtesting involves testing your strategy against historical data to see how it would have performed.
A backtest can reveal a lot about a strategy. It can provide insight into how profitable a strategy might be and point out areas where the strategy can be improved.
While it is natural to focus on profits, a backtest will also show how a strategy performs during bad times and how it handles drawdowns.
It’s normal for a strategy to go through periods where it doesn’t make money. If it doesn’t, it might even be a warning that something is not right.
A common pitfall in backtesting is making modifications to improve the PnL only to find the strategy underperforming once deployed live.
To avoid this, the historical data can be split into two parts, known as in-sample data and out-of-sample data. The strategy is tuned on the in-sample data and then checked against the out-of-sample data, which it has not seen before. To learn more about this, check out our walk-forward optimization tutorial, which further expands on best practices for backtesting.
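A simple chronological split can be sketched as follows. The split_in_out_of_sample function and the 75/25 ratio are assumptions for illustration, and the prices are synthetic. The important detail is that the split is by time rather than random, so the out-of-sample data always comes after the in-sample data.

```python
import pandas as pd


def split_in_out_of_sample(df, in_sample_fraction=0.75):
    """Hypothetical helper: split a time-ordered DataFrame chronologically."""
    cutoff = int(len(df) * in_sample_fraction)
    in_sample = df.iloc[:cutoff]
    out_of_sample = df.iloc[cutoff:]
    return in_sample, out_of_sample


# Synthetic daily closes
prices = pd.DataFrame(
    {'close': [100.0 + i for i in range(20)]},
    index=pd.date_range('2020-01-01', periods=20, freq='D'),
)

in_sample, out_of_sample = split_in_out_of_sample(prices, in_sample_fraction=0.75)

# Tune parameters on in_sample only, then run the final check on out_of_sample
print(len(in_sample), len(out_of_sample))
```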
What is the best backtesting platform?
There are several platforms out there and each has its own strengths and weaknesses.
Like many aspects of this process, picking the right backtesting framework comes down to preference to a degree.
This is because the developers that have created backtesting frameworks all have different approaches to the market.
Here is a list of the more popular backtesters:
Take Backtrader for example, which is a popular framework. The library will take a trade on the next bar after a signal is received.
Some people don’t see an issue with that while others will argue the trade needs to be taken at the close of the bar that produced the signal. Further, there is no option to change or set a custom price for when the trade is entered.
For certain strategies, these smaller aspects are important and some additional due diligence is needed to find the right framework.
But having said that, for most strategies, Backtrader is a good option.
If you’re interested to learn more about Backtrader, check out our guide – Backtrader for Backtesting.
Zipline is also a popular library, perhaps more so in the past than it is now. Part of the reason is that Quantopian, the company that created Zipline, was funding traders that produced good strategies.
However, earlier this year Quantopian announced that their user strategies were underperforming and stopped investing money into them.
Even though it’s not created in Python, it’s worth mentioning that TradingView offers a robust platform.
It utilizes a proprietary language called Pine Script which is relatively easy to pick up if you have some experience in another programming language.
The advantage of Pine Script is that it has nearly everything you need built-in like historical data and performance metrics.
It’s often possible to test simple strategies in fewer than 10 lines of code. But for more complex strategies, a Python framework is likely better suited for the job.
If you’d like to learn more about Pine Script, check out our Pine Script Guide.
Live Trading
What are some of the more popular brokers?
Here is a list of the more popular brokers that allow for automated trading:
- Interactive Brokers
- TD Ameritrade
- Robinhood (Unofficial API)
- TradeStation
- Cryptocurrency exchanges (E.g. Binance API)
- Alpaca (API Guide)
- E*Trade
- MetaTrader based Brokers
How do I pick a broker?
There are a few important considerations when it comes to choosing a broker.
Commissions and Fees
The first thing to look at is the commissions and fees that a broker charges. Commissions add up, especially for active strategies.
If you’ve been trading for a while, you can pull up reports to see how much you’ve paid in fees over the last year. You will probably be surprised at how much your broker receives.
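To put rough numbers on how commissions add up, here is a back-of-the-envelope sketch. Every figure in it is hypothetical (10 trades a day, $1 per trade, 252 trading days, a $25,000 account) and is not taken from any broker’s fee schedule.

```python
def annual_commission_cost(trades_per_day, commission_per_trade, trading_days=252):
    """Hypothetical helper: total commissions paid over one year."""
    return trades_per_day * commission_per_trade * trading_days


# Hypothetical figures for an active strategy
cost = annual_commission_cost(trades_per_day=10, commission_per_trade=1.0)

# Express the cost as a percentage of a hypothetical account size
account_size = 25_000
cost_pct = cost / account_size * 100

print(f'Annual commissions: ${cost:,.2f} ({cost_pct:.2f}% of the account)')
```

Under these assumptions, the strategy would need to earn about 10% a year just to cover commissions.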
Brokers usually advertise their rates but this can be deceiving. For example, Interactive Brokers has long been known as a reputable broker with low fees. It’s hard to beat their commissions for stock trading.
But what is often overlooked is that they charge various monthly fees for data. They also charge interest on margin and Forex trades. And depending on your account size, they may not pay you interest where you’d normally be entitled to receive it.
Also, brokers tend to specialize. Interactive Brokers is known for stocks but their rates for trading currencies are not competitive at all unless you’re trading large sizes.
Similarly, Forex brokers might advertise a really good EUR/USD spread, but their fees for CFDs or commodities could be unusually high.
API Reliability
Another important consideration is their API or platform reliability.
As an example, TD Ameritrade has had platform problems in the past, leaving traders unable to log in while in the middle of a trade. To make matters worse, this has happened before on days where the markets are more volatile than usual.
It’s also important to look at the type of connectivity the broker offers for algo trading.
These days, some type of WebSocket to stream live data is a must. The broker should also have a REST or FIX API to submit orders.
There are some brokers out there that offer platforms for backtesting which can be beneficial. But in most cases, brokers don’t offer this functionality, leaving the trader to find a platform or library on their own.
Broker Reliability
Check if your broker is regulated. Preferably, it should be regulated by a developed country’s monetary authority.
Look at the reviews of these brokers.
You want brokers that have been around for a while, are in a strong financial position and are used by many semi-professional traders and small funds. Interactive Brokers is a good start.
Asset Variety
And of course, the broker you pick must offer the asset that you want to trade.
Should I deploy my strategy to the cloud?
It’s always a good idea to host your strategy on a virtual private server (VPS). These servers aren’t as prone to things like power failures or internet outages.
Since Python does not require a lot of resources to run a typical strategy, you can get away with getting the smallest server size available.
Costs vary depending on your provider but are typically in the range of $5-$10 per month for the smallest server size available.
Several providers even offer free services, like Amazon AWS which offers certain servers free for one year for new users.
If you’re not quite ready to move to a VPS, a Raspberry Pi or similar device can be a good alternative.
It will still be prone to internet outages or power failures but it is much more practical than running a strategy on your laptop.
Which companies offer the best cloud solutions?
Several companies offer VPS services. A few names that stand out are Amazon AWS, Google Cloud Platform, Microsoft Azure, Digital Ocean, and Linode.
The right platform for your strategy depends on a few factors.
Most servers run on Linux. If you need to run Windows, check that the provider supports it. From the list above, Microsoft Azure and Amazon AWS both offer Windows servers, as does Google Cloud Platform.
If speed is a concern, you’ll want to position your server closest to your broker’s trading server.
There are a few ways to achieve this. You can use a Geo location finder such as https://tools.keycdn.com/geo to get the exact location of the server.
You can also contact your broker directly. In some cases, they may share the name of their service provider. In this case, it is a good idea to use the same one if possible.
How do I plan for outages, API problems, or script errors?
Outages or general issues with automated trading do come up and it’s good to be prepared.
Most brokers will accept orders over the phone. If there is a problem connecting and you need to close some orders, this can come in handy.
It’s a good idea to write down the phone number and any account details needed to place orders over the phone so that it is easily accessible in the event this happens.
Brokers may also have scheduled outages. These are often communicated via email but some brokers use other methods such as Telegram groups.
But not all outages are planned and a trading strategy should have that under consideration.
For example, if you’re doing API queries for data as part of your strategy, it is common to have a bad API call from time to time.
Similarly, WebSockets can disconnect from time to time. WebSockets are usually used to stream live data. If the connection goes down, your strategy could be making decisions based on stale data.
One way around this is to create a variable like last_message and assign it a timestamp every time a new message comes through the WebSocket.
The strategy can then compare the current time with the last_message timestamp to see if the last data point is fresh.
The same method can be applied to API calls.
Often, data from both REST APIs and WebSockets will already have a timestamp associated with it. Comparing that timestamp with the current time is another way to achieve the same thing.
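The last_message idea can be sketched as a small helper class. The class name, the 10-second default threshold, and the on_message hook are all assumptions for illustration. In a real strategy, on_message would be called from whatever callback your WebSocket library provides.

```python
import time


class FreshnessMonitor:
    """Hypothetical helper that tracks when the last message arrived."""

    def __init__(self, max_age_seconds=10.0, clock=time.monotonic):
        self.max_age_seconds = max_age_seconds
        self._clock = clock          # injectable so the logic can be tested
        self.last_message = None     # timestamp of the most recent message

    def on_message(self, message):
        # Call this every time a new message comes through the WebSocket
        self.last_message = self._clock()

    def is_fresh(self):
        # No message received yet counts as stale
        if self.last_message is None:
            return False
        return (self._clock() - self.last_message) <= self.max_age_seconds


monitor = FreshnessMonitor(max_age_seconds=10.0)
monitor.on_message({'price': 100.0})  # made-up message

if monitor.is_fresh():
    print('data is fresh, safe to evaluate signals')
else:
    print('data is stale, skip trading and try to reconnect')
```

A monotonic clock is used here so that changes to the system clock cannot make stale data look fresh.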
Since errors related to API calls rarely happen, it is not so easy to have your Python script ready for when something goes wrong.
Using a try/except block in Python is a good way to handle these types of issues. It will catch any errors but won’t stop your script from running.
The code example below shows how to use a try/except block and the logging library to log any errors that may come up.
import requests
import logging
logging.basicConfig(filename='example.log', level=logging.DEBUG)
try:
    resp = requests.get('https://www.alphavantage.co/q')
    resp.raise_for_status()
    print(resp.text)
except Exception as e:
    logging.debug(e)
The code is attempting to make an API call to Alpha Vantage but the URL is wrong.
When we call the raise_for_status() method, it raises an error that takes us into the except block.
From there, we can log the error to file. In our directory, we should now have an example.log file which will let us know that an API call produced a 404 error.
Alternatively, you can also use an alert system to notify you of any exceptions that come up in the script.
One thing to be mindful of is that strategies often run in infinite loops. A try/except block will prevent the script from terminating. But this could cause undesired operations.
Here is an example to illustrate.
The following script is an oversimplified strategy. It uses the check_signal() function to decide if we should enter a trade or not.
while True:
    try:
        signal = check_signal()
        if signal:
            execute_trade()
            prit('trade executed!!')
            time.sleep(300)
    except Exception as e:
        logging.debug(e)
Once we enter a trade, the script should wait for 5 minutes before looking for more trades. That’s what the time.sleep(300) does.
After taking a trade, the script should print trade executed!!
The problem is, we spelled print wrong. So the script will throw an exception and go straight to the except block to log the error, skipping the part where we put the script to sleep for 5 minutes.
Since we are running an infinite loop, the script will start from the top again after logging the error. In other words, it is now stuck in a loop where it is constantly sending buy orders!
Infinite loops are a bit dangerous to begin with and even more care should be taken when adding try/except blocks.
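One possible way to guard against this, sketched below, is to move the pause into a finally clause so that it runs even when something after the order fails. This is an assumption about how the loop could be restructured, not the only fix. The run_strategy function, its parameters, and the report callback are hypothetical, and max_iterations exists only so the loop can be exercised safely.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_strategy(check_signal, execute_trade, report=print, cooldown=300,
                 poll_interval=1, max_iterations=None, sleep=time.sleep):
    """Hypothetical loop that always pauses after a trade, even on errors."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        traded = False
        try:
            if check_signal():
                execute_trade()
                traded = True  # set before anything else that could fail
                report('trade executed!!')
        except Exception:
            # Log the full traceback but keep the loop alive
            logger.exception('error in strategy loop')
        finally:
            # The pause runs whether or not an exception was raised
            sleep(cooldown if traded else poll_interval)


# Demonstration with stand-in functions and a fake sleep that records durations
trades = []
pauses = []


def broken_report(message):
    raise NameError('simulated typo in reporting code')


run_strategy(
    check_signal=lambda: True,
    execute_trade=lambda: trades.append('buy'),
    report=broken_report,
    max_iterations=3,
    sleep=pauses.append,
)
print(trades, pauses)
```

Even though the reporting step fails on every pass, each trade is still followed by the full cooldown.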
How do I set up an alert system?
Alert systems are useful to pick up any errors that may arise. They can also be used to let you know when your strategy is entering or exiting trades if you’re interested in keeping an eye on the strategy.
There are several messaging APIs that can be used to send alerts directly to your phone. We’ve previously written on how to set up an alert system with Telegram.
If you don’t need an instant notification, but still like to keep track of things, Google Sheets is a good option as well.
The Python gsheets library provides an easy method to utilize the Google Sheets API. That way you can log things such as open or closed trades.
There are other options available as well. The Python smtplib module can be used to send email notifications. If you prefer SMS messages instead, Twilio provides a good solution.
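As a minimal sketch of the email option, the snippet below builds an alert message with the standard library and defines a function to send it with smtplib. The build_alert and send_alert helpers, the addresses, and the server details are all placeholders. Your email provider’s host, port, and authentication requirements may differ.

```python
import smtplib
from email.message import EmailMessage


def build_alert(subject, body, sender, recipient):
    """Hypothetical helper: create an email message for an alert."""
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(body)
    return msg


def send_alert(msg, host, port, username, password):
    """Hypothetical helper: send the message over an SSL connection."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)


# Placeholder addresses for illustration only
alert = build_alert(
    subject='Strategy error',
    body='check_signal() raised an exception, see example.log for details.',
    sender='bot@example.com',
    recipient='trader@example.com',
)

# Sending requires real SMTP details, for example:
# send_alert(alert, 'smtp.example.com', 465, 'bot@example.com', 'app-password')
```

Calling send_alert from inside an except block is one way to be notified as soon as an exception occurs.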
Final Thoughts
We’ve gone through various ways to generate ideas for a strategy and hopefully, you are now armed with the information needed to get your strategy off the ground.
If you have a strategy in mind, but you’re not sure if it will work, test it out! Often, just getting started on strategy creation will lead to different ideas in the process.
Once you have a strategy up and running, you might feel like you’ve crossed the finish line.
But it doesn’t end there. Strategies can stop being fruitful after some time and so quantitative traders spend a lot of time thinking of new ideas and strategies to add to their basket.
And while the strategy is automated, in a lot of cases it will need to be monitored to ensure everything is working the way it should.
There will also be extraordinary times in the markets where you may want to take your script offline. For example, you may want to do this ahead of an upcoming election, as market behavior is likely to change, which could lead to unpredictable results.
If you’re trading with a reputable broker, they will often warn you of times when things could get volatile.