What is Overfitting in Trading?

Last Updated on June 24, 2020

What is overfitting in trading? Overfitting in trading is the process of designing a trading system that adapts so closely to historical data that it becomes ineffective in the future.

Overfitting (also known as curve fitting) your strategy gives you false confidence that it will be profitable. In fact, if you overfit your backtests aggressively enough, you can produce strategies that appear to make thousands of percent per year.

How do we overfit?

We overfit by adapting our strategies to noise instead of signals.

Signals are useful, fundamental information. Noise is random variation that offers nothing useful for prediction.

Figure 1: Overfitting data points on a chart

In Figure 1, we have three charts showing the same data. We are trying to create a model that fits the shape of the data. This model will be used to predict future data points.

We can clearly see that the data forms a U shape. This U shape is our signal.

In the leftmost chart, our model is a straight line. This is not representative of the data at all. It will have poor predictive abilities. We describe this model as an underfitted model.

In the rightmost chart, our model passes through every data point. It fits the data perfectly. On paper, this seems like the perfect model.

And it is, as long as we do not use it to predict the future.

You see, unless future data points follow the past perfectly, this model will have very poor predictive value. This is an overfitted model.

In the middle chart, our model describes the general shape of the data points. This model has some error: it does not pass through every data point.

However, this is fine. We need our models to have a certain degree of error, because that means the model does not rigidly follow the past.

This model captures the signal in the data (the U shape). Thus, it should be able to cope with minor changes to the data's structure in the future.

This model is a good fit; in other words, it is robust.
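To make Figure 1 concrete, here is a minimal Python sketch (assuming NumPy is installed) that fits polynomials of degree 1, 2 and 9 to ten noisy points drawn from a U shape, then scores each fit on fresh "future" points from the same shape. The U shape (y = x²), the noise level, the seed and the degrees are illustrative assumptions, not the data behind the figure.

```python
import numpy as np


def u_shape(x):
    """The signal: an underlying U shape (assumed here to be y = x**2)."""
    return x ** 2


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on points (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


def compare_fits(degrees=(1, 2, 9), noise=0.05, seed=0):
    """Return {degree: (past_mse, future_mse)} for each polynomial degree."""
    rng = np.random.default_rng(seed)
    # "Past" data: 10 points on the U shape plus random noise.
    x_past = np.linspace(-1, 1, 10)
    y_past = u_shape(x_past) + rng.normal(0, noise, x_past.size)
    # "Future" data: new points from the same U shape with fresh noise.
    x_future = rng.uniform(-1, 1, 50)
    y_future = u_shape(x_future) + rng.normal(0, noise, x_future.size)

    results = {}
    for degree in degrees:
        # Degree 9 with 10 points interpolates: it hits every past point.
        coeffs = np.polyfit(x_past, y_past, degree)
        results[degree] = (mse(coeffs, x_past, y_past), mse(coeffs, x_future, y_future))
    return results


if __name__ == "__main__":
    labels = {1: "underfit (straight line)", 2: "good fit (U shape)", 9: "overfit (every point)"}
    for degree, (past_err, future_err) in compare_fits().items():
        print(f"degree {degree} {labels[degree]:<25} past MSE={past_err:.5f}  future MSE={future_err:.5f}")
```

The straight line has large errors everywhere. The degree-9 curve has essentially zero past error because it has memorised the noise, and it typically does worse than the quadratic on the future points; the exact numbers depend on the seed.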

Is overfitting bad?

The short answer is, yes.

The past does not predict the future perfectly, especially in financial markets.

Adapting strategies too closely to past data makes them unable to adapt to the future. Hence, overfitting leads to poor future performance.

How do we reduce overfitting?

Design strategies that exploit fundamental inefficiencies that make sense from a market or economic point of view.

Examples of fundamental inefficiencies:

Tracking live credit card expenditure data to see if Amazon’s sales will be up or down for the quarter.
Gold futures on one exchange are trading at a lower price than gold futures on another exchange. Buy the cheaper contract and sell the more expensive one.
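The gold futures example reduces to a simple rule: buy on the cheaper exchange, sell on the dearer one, and only when the price gap is larger than your trading costs. The sketch below is hypothetical; the function name, the per-contract cost and the assumption that both contracts are identical and can be traded at the same instant are all illustrative.

```python
def arbitrage_signal(price_a, price_b, cost_per_contract):
    """Return (buy_leg, sell_leg, net_profit) or None if the gap does not cover costs.

    Assumes the two contracts are identical and both legs fill at the quoted prices.
    """
    spread = abs(price_a - price_b)
    if spread <= cost_per_contract:
        return None  # the apparent gap is eaten by fees/slippage
    if price_a < price_b:
        return ("buy A", "sell B", spread - cost_per_contract)
    return ("buy B", "sell A", spread - cost_per_contract)


if __name__ == "__main__":
    # Hypothetical quotes for the same gold contract on two exchanges.
    print(arbitrage_signal(1800.0, 1802.5, 1.0))
    print(arbitrage_signal(1800.0, 1800.5, 1.0))
```

The edge here comes from a price discrepancy with an economic explanation, not from a parameter that happened to look good in a backtest.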

Examples of FAKE inefficiencies:

Buying a currency just because a magic number from a technical indicator passes a certain value.
Buying a stock because its price hits a level based on arbitrary logic devised by a mathematician who passed away 800 years ago.

Another way to reduce overfitting is to optimise on one period and then validate the result on out-of-sample data. For more on this, see the post “What is Walk Forward Optimization”.
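Walk forward optimisation repeatedly optimises on one window of data and then tests on the next, unseen window. The helper below is a hypothetical sketch that only generates the rolling in-sample/out-of-sample index ranges; the window lengths and the step size (one out-of-sample window per roll) are assumptions, and the linked post may structure its windows differently.

```python
def walk_forward_windows(n_bars, in_sample, out_of_sample):
    """Return a list of (train_range, test_range) pairs rolling forward through n_bars.

    Each test window starts immediately after its training window, and the
    whole pair then advances by one out-of-sample length.
    """
    windows = []
    start = 0
    while start + in_sample + out_of_sample <= n_bars:
        train = range(start, start + in_sample)
        test = range(start + in_sample, start + in_sample + out_of_sample)
        windows.append((train, test))
        start += out_of_sample
    return windows


if __name__ == "__main__":
    for train, test in walk_forward_windows(n_bars=10, in_sample=4, out_of_sample=2):
        print(f"optimise on bars {train.start}-{train.stop - 1}, test on bars {test.start}-{test.stop - 1}")
```

Only the test windows count towards the strategy's reported performance, so parameters never get judged on the data they were fitted to.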

Demonstrating Overfitting

Enough theory! Let’s see some action!

Let’s curve fit some stuff on purpose.

In this exercise, we will curve fit a basic trading robot we use in AlgoTrading101 called Belinda.

Disclaimer: This is the WRONG way to conduct your optimisation. Do not try this at home!

Step 1:

We run an optimisation for Belinda by varying 3 variables: sma_short, sma_long and atr_period.

Step 2:

We run the optimisation from 1st April 2014 to 1st January 2015.

Step 3:

We find the optimised parameter values, i.e. the parameter values that produce the best objective function value.
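Steps 1 to 3 amount to a brute-force grid search over the three variables. Below is a minimal sketch, assuming a caller-supplied `backtest` function that returns the objective value (for example net profit) for one parameter set. Belinda's actual logic and the parameter ranges used are not given here, so `optimise`, `backtest` and the rule that skips `sma_short >= sma_long` are hypothetical.

```python
from itertools import product


def optimise(backtest, sma_short_values, sma_long_values, atr_period_values):
    """Try every parameter combination and return (best_params, best_score)."""
    best_params, best_score = None, float("-inf")
    for sma_short, sma_long, atr_period in product(sma_short_values, sma_long_values, atr_period_values):
        if sma_short >= sma_long:
            continue  # assumption: the short SMA must be shorter than the long SMA
        score = backtest(sma_short=sma_short, sma_long=sma_long, atr_period=atr_period)
        if score > best_score:
            best_params, best_score = (sma_short, sma_long, atr_period), score
    return best_params, best_score
```

For example, with a toy objective that peaks at (10, 50, 14), such as `lambda sma_short, sma_long, atr_period: -(sma_short - 10)**2 - (sma_long - 50)**2 - (atr_period - 14)**2`, the call `optimise(toy, [5, 10, 20], [30, 50], [10, 14])` returns `((10, 50, 14), 0)`. On real prices, the winning combination is simply whichever one happened to suit that particular stretch of history best.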

Step 4:

Using the optimised parameter values, we run a backtest to see the performance and equity curve in detail. We use the same backtest dates as before: 1st April 2014 to 1st January 2015. We should see a profit of $3,549.18.

Now we test Belinda with the optimised parameter values using data from the future. As mentioned, the future rarely reflects the past perfectly. Thus, we do not expect this backtest to be profitable.

Let’s run the backtest in the future period: 1st January 2015 to 1st October 2015 (the 9 months following the previous period).

What a disaster! (Unsurprisingly)
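The whole experiment can be reproduced in miniature. The sketch below (assuming NumPy) is not Belinda: it swaps in a hypothetical long/flat SMA crossover, runs it on a synthetic random walk that by construction contains no signal, picks the best parameters on the first half of the data and then trades them on the second half. Because any in-sample "edge" can only come from noise, there is no reason to expect it to survive out of sample.

```python
from itertools import product

import numpy as np


def sma(prices, window):
    """Simple moving average; the first window - 1 values are NaN."""
    out = np.full(prices.size, np.nan)
    cumsum = np.cumsum(np.insert(prices, 0, 0.0))
    out[window - 1:] = (cumsum[window:] - cumsum[:-window]) / window
    return out


def crossover_pnl(prices, short_window, long_window):
    """P&L of holding 1 unit while SMA(short) > SMA(long), otherwise flat."""
    fast, slow = sma(prices, short_window), sma(prices, long_window)
    position = np.where(fast > slow, 1.0, 0.0)  # NaN comparisons are False, so flat
    # The position held at the close of bar t earns the price change to bar t + 1.
    return float(np.sum(position[:-1] * np.diff(prices)))


def run_demo(seed=42, n_bars=500):
    """Optimise on the first half of a random walk, then test on the second half."""
    rng = np.random.default_rng(seed)
    prices = 100 + np.cumsum(rng.normal(0, 1, n_bars))  # random walk: no real signal
    half = n_bars // 2
    in_sample, out_of_sample = prices[:half], prices[half:]
    grid = [(s, w) for s, w in product(range(2, 30, 2), range(10, 100, 5)) if s < w]
    best = max(grid, key=lambda p: crossover_pnl(in_sample, *p))
    return best, crossover_pnl(in_sample, *best), crossover_pnl(out_of_sample, *best)


if __name__ == "__main__":
    best, in_pnl, out_pnl = run_demo()
    print(f"best in-sample params: {best}")
    print(f"in-sample P&L:     {in_pnl:.2f}")
    print(f"out-of-sample P&L: {out_pnl:.2f}")
```

Try a few seeds with `run_demo(seed=...)`. The in-sample P&L is the best on the grid by construction, while the out-of-sample P&L behaves like a coin flip, which is the same pattern as Belinda's in-sample profit followed by the out-of-sample disaster.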

How to manipulate backtest performance?

Let me show you something called performance manipulation.

Occasionally, you may see people selling trading robots on the web. They claim that their robots can make 100% returns overnight and show backtest results to prove it.

So, is their robot legitimate? Well, I have never bought such a robot, so I can’t refute their claim.

However, I might have some insights into how they produced such an “incredible” performance.

Step 1:

Run an optimisation and find the parameter values that give the best results.

Step 2:

Run a backtest using these parameter values and the same dates as used in the optimisation.

The difference now is that we increase our bet size to 20 times what it was before!

Look at that performance! Did we just turn $10,000 into $269,086 over 9 months?!

Oops, it is not $269,086. We have just turned $10K into $2.5 million in 9 months!

I’m going to be a zillionaire!!! (And yes, zillionaire is a real word!)
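The trick works because a bigger bet multiplies every in-sample return, and compounding then inflates the final figure further. The sketch below is hypothetical: it assumes each trade's return scales linearly with bet size and that profits are reinvested, which the steps above do not specify, and the trade returns are made up rather than taken from Belinda.

```python
def equity_curve(trade_returns, leverage=1.0, starting_capital=10_000.0):
    """Compound per-trade returns (fractions of capital) at a given bet multiplier."""
    equity = starting_capital
    curve = [equity]
    for r in trade_returns:
        equity *= 1 + leverage * r  # assumption: returns scale linearly with bet size
        if equity <= 0:             # account wiped out; no further trading
            curve.append(0.0)
            break
        curve.append(equity)
    return curve


if __name__ == "__main__":
    # Hypothetical in-sample trade returns, not real backtest output.
    trades = [0.02, -0.01, 0.03, 0.01, -0.02, 0.025]
    for lev in (1, 20):
        print(f"bet x{lev}: final equity {equity_curve(trades, lev)[-1]:,.2f}")
```

The multiplier cuts both ways: a single 6% losing trade at 20 times the bet size takes the account to zero, a risk an overfitted in-sample backtest never reveals.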
