
Out-of-Sample Testing: Why Your Backtest Needs It
Here's a scenario I see weekly: a trader optimizes a strategy on all available data, gets spectacular results, and declares the strategy ready for live trading. Two months later, they're scratching their head wondering why real performance doesn't match the backtest. The answer is almost always the same — they never tested on data the strategy hadn't already seen.
Out-of-sample testing is the absolute minimum bar for strategy validation. Not the gold standard — that's walk-forward analysis. OOS is more like the entrance exam. If your strategy can't pass this basic test, it shouldn't get anywhere near real capital.
The Contamination Problem
When you develop a strategy, every decision you make is influenced by the data you're looking at. You chose RSI over MACD because RSI performed better on your data. You set the period to 14 because that looked optimal. You added a volume filter because it improved results on your data.
Every one of these decisions "uses up" information from your dataset. By the time you've made 20 development decisions, your strategy has been sculpted to fit your specific dataset even if you never explicitly "optimized" anything. This is called implicit overfitting, and it's far more common than the explicit parameter-cranking version.
The only cure is untouched data — a clean sample that had zero influence on any development decision.
How to Implement OOS Testing Correctly
Step 1: Split your data before you start developing. This is critical. If you split after development, you've already been influenced by the full dataset. A common and correct approach (a code sketch of the split follows the table):
| Portion | Use | Typical Size | When to Touch |
|---|---|---|---|
| Training set | Strategy development & optimization | 60-70% | Freely during development |
| Validation set | Tuning decisions (optional) | 10-15% | Sparingly during development |
| Test set | Final evaluation | 20-30% | Once only — the final exam |
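To make the split concrete, here's a minimal sketch in Python. It assumes your price history lives in a time-ordered pandas DataFrame (the `prices` name in the usage comment is a placeholder); the essential point is that the split is chronological, never shuffled, so the test set is always the most recent, never-touched portion.

```python
import pandas as pd

def chronological_split(df: pd.DataFrame,
                        train_frac: float = 0.60,
                        val_frac: float = 0.15):
    """Split time-ordered data into train / validation / test sets.

    Strictly chronological: no shuffling, so the test set is the most
    recent slice and never influences development decisions.
    """
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

# Example usage: a 60 / 15 / 25 split of a daily OHLCV DataFrame `prices`
# train, val, test = chronological_split(prices)
```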
Step 2: Develop your strategy using only the training set. All parameter optimization, indicator selection, filter testing — everything happens on the training data only. Pretend the test set doesn't exist.
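A hedged sketch of what that looks like in code, assuming a hypothetical `backtest` callable from your own framework that scores a data/parameter pair (for example by Sharpe ratio). Note that only the training set ever enters the loop:

```python
from typing import Callable, Iterable
import pandas as pd

def optimize_on_train(train: pd.DataFrame,
                      backtest: Callable[[pd.DataFrame, int], float],
                      periods: Iterable[int] = range(5, 31)) -> int:
    """Grid-search an indicator period using ONLY the training set.

    `backtest` is whatever your framework provides: it takes the data
    and a candidate period and returns a score (e.g. Sharpe ratio).
    The test set never appears here.
    """
    scores = {p: backtest(train, p) for p in periods}
    return max(scores, key=scores.get)  # best in-sample period
```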
Step 3: Use the validation set for intermediate checks. This is optional but valuable. After making significant changes, test on the validation set to check you're not drifting into overfitting. The validation set gets "used up" over time, which is why the final test set remains untouched.
Step 4: Run the test set exactly once. When you're satisfied with your strategy, run it on the test set. Don't modify anything afterward. If the results disappoint you, do not go back and "adjust" the strategy using test set feedback — that contaminates it.
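Putting Steps 2 and 4 together, a sketch of the one-shot final exam, again assuming the hypothetical `backtest` scorer from the previous block:

```python
def final_exam(train, test, backtest, best_period):
    """Run the held-out test set exactly once and report the OOS/IS ratio."""
    is_score = backtest(train, best_period)
    oos_score = backtest(test, best_period)
    if is_score <= 0:
        raise ValueError("In-sample score must be positive for the ratio to be meaningful")
    ratio = oos_score / is_score
    print(f"In-sample score:   {is_score:.2f}")
    print(f"Out-of-sample:     {oos_score:.2f}")
    print(f"OOS / IS ratio:    {ratio:.0%}")
    return ratio
```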
Interpreting OOS Results
The absolute performance on the OOS period matters less than the relative performance compared to in-sample. Here's how to interpret the ratio (a helper codifying these thresholds follows the table):
| OOS / IS Ratio | Interpretation |
|---|---|
| > 80% | Excellent — strategy is robust, minimal overfitting |
| 50-80% | Good — some performance degradation but strategy has real edge |
| 30-50% | Concerning — significant overfitting, simplify the strategy |
| < 30% | Failed — strategy is likely overfitted, reject or redesign |
| Negative | Clearly overfitted — the in-sample results were illusory |
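A small helper codifying the table above (boundary handling at exactly 30/50/80% is a judgment call; half-open intervals are used here):

```python
def interpret_oos_ratio(ratio: float) -> str:
    """Map an OOS/IS performance ratio to the verdicts in the table above."""
    if ratio < 0:
        return "Clearly overfitted: the in-sample results were illusory"
    if ratio < 0.30:
        return "Failed: likely overfitted, reject or redesign"
    if ratio < 0.50:
        return "Concerning: significant overfitting, simplify the strategy"
    if ratio < 0.80:
        return "Good: some degradation but the strategy has a real edge"
    return "Excellent: robust, minimal overfitting"
```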
Some degradation is expected and normal. Real markets have transaction costs, changing volatility, and evolving microstructure that backtest data may not perfectly capture. A 30-40% degradation is typical for decent strategies. If your OOS performance equals or exceeds IS performance, double-check your implementation — you might have a data leak.
Common OOS Mistakes
Peeking at the test set. The most common and most destructive mistake. If you look at how your strategy performs on the test set, then modify the strategy, then test again — you've just turned your test set into a second training set. Every peek contaminates the sample.
Choosing the split point to favor results. If you try multiple split points and use the one where OOS looks best, you've optimized the split point — which is another form of overfitting. Choose your split before development and commit to it.
OOS period too similar to IS period. If your in-sample period is a bull market and your OOS period is the continuation of the same bull market, you're not testing robustness — you're testing within the same regime. Ideally, the OOS period should contain at least one market condition that differs from the IS period.
Not enough trades in OOS. If your OOS period only contains 15 trades, the results are statistically meaningless regardless of whether they're positive or negative. You need at least 50 trades, preferably 100+, for reliable OOS conclusions.
"In God we trust. All others must bring data — out-of-sample data." — Adapted from W. Edwards Deming. The original quote applies to manufacturing quality, but it's equally valid for strategy quality.
Beyond Simple OOS: Anchored Walk-Forward
A more sophisticated approach is anchored walk-forward: keep the start date fixed but progressively extend the in-sample period, re-optimizing at each step. This gives you multiple OOS tests while maintaining a growing training dataset.
The progression looks like this: Year 1-2 train / Year 3 test → Year 1-3 train / Year 4 test → Year 1-4 train / Year 5 test. Each test period is completely fresh, and the training set grows with each iteration.
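The schedule is easy to generate programmatically. A tiny illustration in pure Python, with years labeled 1 through 5 as in the example above:

```python
def anchored_walkforward_windows(years: list[int], min_train: int = 2):
    """Yield (train_years, test_year) pairs with a fixed (anchored) start.

    E.g. years=[1, 2, 3, 4, 5] with min_train=2 yields:
      ([1, 2], 3), ([1, 2, 3], 4), ([1, 2, 3, 4], 5)
    """
    for i in range(min_train, len(years)):
        yield years[:i], years[i]

for train_years, test_year in anchored_walkforward_windows([1, 2, 3, 4, 5]):
    print(f"train years {train_years} -> test year {test_year}")
```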
This method is less common than rolling walk-forward but has the advantage of never "forgetting" older data that might contain valuable patterns from rare market events.
For the complete validation toolkit, combine OOS testing with anti-overfitting techniques and results interpretation.
Validate your strategies properly. StratBase.ai makes it easy to split data into in-sample and out-of-sample periods and compare performance — ensuring your backtest results aren't just historical artifacts.
FAQ
What is out-of-sample testing?
Out-of-sample (OOS) testing means holding back a portion of historical data that you never use during strategy development. After optimizing your strategy on the in-sample portion, you test it once on the held-out data, simulating how the strategy would perform on unseen data.
What percentage of data should be out-of-sample?
The standard split is 70% in-sample and 30% out-of-sample; some practitioners use 60/40 for more conservative validation. The out-of-sample period should contain at least 50-100 trades and ideally cover different market conditions than the in-sample period.
About the Author
Quantitative researcher with 8+ years in algorithmic trading and strategy backtesting. Specializes in technical indicator analysis and risk-adjusted performance metrics.

