Backtested: 16 years, $320k invested. The Magic Formula still has an edge.

Buffett can’t be copied. His edge is intuition built over decades. But what if you could formalize “good business, cheap price” into an algorithm and let it run? That’s what this post is about.

I’ll walk through how I built a Magic Formula value investing pipeline in Python, what I found after a 16-year backtest, and what I’d do differently. Along the way I’ll cover the data engineering side: Medallion Architecture, DuckDB audit trails, and Optuna for parameter optimization. If you’re not a software person, don’t worry. I’ll keep the technical parts light.

The Magic Formula comes from the book “The Little Book That Still Beats the Market.” My first issue with it: I’d rather work from a scientific paper where tables and charts are reproducible. The book is written for intuition, not the mathematical rigour I needed to build a concrete strategy.

LLMs are great for learning, but generating code with them caused a lot of bugs. The model was quietly using proxy values and fallbacks that I only caught after inspecting the full calculation pipeline. Be careful. I recommend splitting the strategy into smaller, self-contained pieces that are easy to test and reason about independently.

1. Value Investing

Some context first. There are different styles of investing. The one used here is value investing. The premise is simple in theory and hard in practice: buy a good business at a cheap price.

Buffett, Peter Lynch, David Dodd, and Benjamin Graham spent their lives turning that sentence into usable frameworks. The sad truth: we can’t use their frameworks. Any method that reliably generates excess returns gets traded away. That is the core claim of the Efficient Market Hypothesis (EMH).

Early value investing focused on book values. It was mostly manual calculation, with the edge coming from doing the work others wouldn’t. As EMH predicts, the edge disappears once enough participants adopt the same method, so new techniques had to be invented. The core assumption stayed the same: good business, cheap price. But the definitions evolved.

The DCF method appeared. You could discount future cash flows, compare them to the current price, and decide if it was a bargain. Layer in qualitative checks, like whether the business has a moat or good management, and you had a workable value investing approach for the modern era.

None of that worked for me though. It all sounds reasonable when you read it, but I couldn’t formalize it into an algorithm without years of experience I don’t have. Buffett spent decades building that intuition. I needed a different proxy.

2. Greenblatt’s Magic Formula - What, Model, Intuition

The What

The method builds a ranking of stocks and rebalances a portfolio of 20-30 positions every year. Each company’s rank is based on two factors: cheapness and quality, both defined mathematically by the formula. There are also a few exclusions, because the accounting values the formula relies on are meaningless for certain types of businesses.

The method is described in “The Little Book That Still Beats the Market.” It’s short and gives good intuition behind it.

Model

What is cheap: Earnings Yield

Earnings Yield = EBIT / Enterprise Value

EBIT

EBIT (earnings before interest and taxes) is operating profit: what remains of revenue after operating costs, before interest expense and tax are deducted.

Revenue
- Operating costs
= EBIT                ← stops here
- Interest expense    ← only hits Company B hard
= EBT (pre-tax profit)
- Tax
= Net income

Companies have different financial structures. This formula says we don’t care whether a company funds itself through equity or debt. Using net income instead of EBIT would penalize debt-funded companies unfairly.

Example:

	Company A	Company B
EBIT	£10M	£10M
Interest	£0	£4M
Net income	£10M	£6M

On net income, B looks 40% worse. On EBIT, they’re identical. All else being equal, the Magic Formula treats these two companies the same.

Enterprise Value (EV)

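There are two ways to value a company: top-down (what the market says the whole business is worth, which is what EV measures) and bottom-up (what it would cost to rebuild the company from scratch). In its simplified form, EV is market capitalization plus total debt, minus cash: the price of buying the entire business, debt included.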

Earnings Yield (EY)

Earnings Yield is EBIT divided by Enterprise Value. It answers: for every dollar you pay to own this business, how much profit does it generate per year?

EY measures cheapness: the higher the yield, the less you pay for each dollar of operating profit.

Example: if a company’s EY is 10%, then for every $1 you pay for the business, it generates $0.10 of EBIT per year.
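To make the arithmetic concrete, here is a minimal sketch. It assumes the common simplified definition of EV (market cap plus total debt minus cash). The function names and the numbers are hypothetical and not taken from my pipeline.

```python
def enterprise_value(market_cap: float, total_debt: float, cash: float) -> float:
    """Simplified EV (an assumption): the cost of buying the whole business, debt included."""
    return market_cap + total_debt - cash


def earnings_yield(ebit: float, ev: float) -> float:
    """EBIT / EV. Raises on a non-positive EV instead of silently returning a fallback."""
    if ev <= 0:
        raise ValueError(f"enterprise value must be positive, got {ev}")
    return ebit / ev


# Illustrative numbers only: 100M market cap, 30M debt, 10M cash, 12M EBIT.
ev = enterprise_value(100e6, 30e6, 10e6)
ey = earnings_yield(12e6, ev)
print(f"EV = {ev:,.0f}, EY = {ey:.1%}")
```

An EY of 10% here means the business earns 10 cents of EBIT per year for every dollar of enterprise value.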

What is good: Return on Capital

Good means the business uses its capital efficiently:

Return on Capital = EBIT / (Net Fixed Assets + Net Working Capital)

Net Fixed Assets + Net Working Capital is the minimum capital you would need to restart the business from zero and earn the first dollar.

Return on Capital tells you how much profit is generated for every dollar permanently tied up in assets. Low ROC: sponge. High ROC: machine. Spending a lot on assets to generate a little profit is much worse than using less capital to generate the same profit.
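A small sketch of the sponge-versus-machine comparison, using made-up numbers. The function name is hypothetical, and raising on non-positive capital is my own choice for the example rather than something the book prescribes.

```python
def return_on_capital(ebit: float, net_fixed_assets: float, net_working_capital: float) -> float:
    """EBIT / (Net Fixed Assets + Net Working Capital)."""
    capital = net_fixed_assets + net_working_capital
    if capital <= 0:
        # Assumption for this sketch: force a conscious decision instead of guessing.
        raise ValueError(f"tangible capital must be positive, got {capital}")
    return ebit / capital


# Two hypothetical businesses with the same EBIT but different capital needs.
sponge = return_on_capital(ebit=10e6, net_fixed_assets=80e6, net_working_capital=20e6)
machine = return_on_capital(ebit=10e6, net_fixed_assets=15e6, net_working_capital=5e6)
print(f"sponge ROC = {sponge:.0%}, machine ROC = {machine:.0%}")
```

Both companies earn the same profit, but the second one needs a fifth of the capital to do it.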

Tweaks

The formula is just a model. As the saying goes: all models are wrong, but some are useful. A few more tweaks make the model less wrong.

Filter out small caps (below $50M market cap): they are riskier and less liquid. Greenblatt also recommends filtering out utilities and financial stocks. Some business categories rely heavily on interest income rather than operating earnings, so EBIT is a poor measure of their value. We also focus on US companies: the market is more mature and has historically performed better. These are reasonable choices for backtesting.
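The exclusions above boil down to a simple predicate. This is only a sketch: the `Company` class, the field names, and the tickers are hypothetical, and the sector list is the short version from this section (Appendix 1 lists the fuller set I actually used).

```python
from dataclasses import dataclass

MIN_MARKET_CAP = 50_000_000                   # $50M floor from the text
EXCLUDED_SECTORS = {"Financials", "Utilities"}


@dataclass(frozen=True)
class Company:
    ticker: str
    sector: str
    market_cap: float
    country: str


def passes_screens(c: Company) -> bool:
    """True if the company is a US, non-excluded-sector stock above the size floor."""
    return (
        c.country == "US"
        and c.market_cap >= MIN_MARKET_CAP
        and c.sector not in EXCLUDED_SECTORS
    )


# Hypothetical universe: only AAA should survive all three screens.
universe = [
    Company("AAA", "Industrials", 400e6, "US"),
    Company("BBB", "Financials", 900e6, "US"),   # excluded sector
    Company("CCC", "Technology", 20e6, "US"),    # too small
    Company("DDD", "Industrials", 600e6, "DE"),  # not US
]
eligible = [c.ticker for c in universe if passes_screens(c)]
print(eligible)
```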

Investment horizon: 5-10 years, standard for a value investing strategy.

Intuition

If we buy something good at a cheap price, the odds of profit are higher than with most other approaches. The model defines good and cheap in a simple, algorithmic way, then filters out the obvious cases where the formula breaks down. We want companies with good fundamentals: efficient operations, not heavy spending on assets relative to the profit they generate. By using EBIT, we ignore financing tricks that make a company look better on the bottom line than it actually is.

The biggest advantage of this method is taking your own behavior out of the equation. A fixed rebalance date stops you from torpedoing your own portfolio. More importantly, it defines clear buy and sell signals. This protects against gambler’s ruin: if you blindly keep buying potential winners, even a winning streak can end with a single bet that wipes everything out. Without a defined exit, the odds of ending up at zero go up.

This also gives you systematic exposure to value and quality stocks, which tend to hold up better in volatile conditions. Good companies tend to survive market storms. What makes a company “good” can mean many things: management quality, competitive moat. The great value investors wrote about all of it, but none of them defined it in a way you can run as an algorithm.

3. Data Architecture: Medallion Pipeline for Financial Data

Medallion Architecture

To keep the pipeline testable and easy to debug, I use a data engineering best practice: the Medallion Architecture. It breaks the ranking pipeline into three layers: Bronze, Silver, and Gold.

Bronze

Download raw data and store it in the Bronze layer. We use Sharadar data to avoid backtesting biases such as survivorship bias. Our stock universe is drawn from the Russell 2000 (IWM) holdings.

Silver

This layer contains processed data, standardized column names, and intermediate calculations. The pipeline can start from any layer, which speeds up iteration. If you find a mistake in a calculation, fix it and re-run from Silver. No need to re-download the raw data; it’s already cached in Bronze.

Silver contains all the columns needed to calculate ROC and EY: ebit, market cap, total_debt, cash_and_cash_equivalent, and so on.

Gold

With all components in place, we calculate ROC and EY and produce the ranking. The result goes into the Gold layer: the ordered list the portfolio acts on in live trading.

It contains ey_rank, roc_rank, and combined_rank (their sum). Tickers are sorted by combined_rank. We buy the top 20.
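As a rough sketch of the Gold-layer ranking in plain Python: the column names ey_rank, roc_rank, and combined_rank come from the pipeline, while the function names, the tickers, and the tie-breaking rule (alphabetical by ticker) are assumptions made for this example.

```python
def rank_descending(values: dict[str, float]) -> dict[str, int]:
    """Rank 1 = highest value. Ties are broken by ticker (an assumption, for determinism)."""
    ordered = sorted(values, key=lambda t: (-values[t], t))
    return {ticker: i for i, ticker in enumerate(ordered, start=1)}


def magic_formula_ranking(ey: dict[str, float], roc: dict[str, float], top_n: int) -> list[dict]:
    """Sum the two ranks and return the top_n rows with the lowest combined rank."""
    if ey.keys() != roc.keys():
        raise ValueError("EY and ROC must cover the same tickers")
    ey_rank = rank_descending(ey)
    roc_rank = rank_descending(roc)
    rows = [
        {
            "ticker": t,
            "ey_rank": ey_rank[t],
            "roc_rank": roc_rank[t],
            "combined_rank": ey_rank[t] + roc_rank[t],
        }
        for t in ey
    ]
    rows.sort(key=lambda r: (r["combined_rank"], r["ticker"]))
    return rows[:top_n]


# Hypothetical inputs: AAA is cheapest, BBB has the highest return on capital.
ey = {"AAA": 0.15, "BBB": 0.08, "CCC": 0.12, "DDD": 0.05}
roc = {"AAA": 0.35, "BBB": 0.45, "CCC": 0.30, "DDD": 0.10}
top = magic_formula_ranking(ey, roc, top_n=2)
for row in top:
    print(row)
```

A lower combined rank is better: a stock does not need to be the cheapest or the highest quality, just good on both at once.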

Splitting into 3 layers makes debugging fast. If a value looks wrong, you know which layer produced it and which code to check.
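The idea that the pipeline can start from any layer can be sketched like this. The `run_pipeline` function and the placeholder steps are hypothetical; the real steps read and write files in the Bronze, Silver, and Gold folders.

```python
from typing import Callable

LAYERS = ("bronze", "silver", "gold")


def run_pipeline(steps: dict[str, Callable[[], None]], start_from: str = "bronze") -> list[str]:
    """Run the layers in order, beginning at start_from and reusing cached earlier layers."""
    if start_from not in LAYERS:
        raise ValueError(f"unknown layer: {start_from!r}")
    executed = []
    for layer in LAYERS[LAYERS.index(start_from):]:
        steps[layer]()
        executed.append(layer)
    return executed


# Placeholder steps that only record what they would do.
log: list[str] = []
steps = {
    "bronze": lambda: log.append("download raw data"),
    "silver": lambda: log.append("standardize columns"),
    "gold": lambda: log.append("rank tickers"),
}

# After fixing a calculation bug, re-run from Silver without touching the network.
executed = run_pipeline(steps, start_from="silver")
print(executed, log)
```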

Coding and Data Principles

Do not default to null or 0. If something unexpected happens, fail fast. That forces a conscious choice about what the fallback should be.

On top of fail-fast, I added automatic data quality checks using Pydantic. You define a schema for your values and catch mistakes in the algorithm early.
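To show the fail-fast idea without extra dependencies, here is a sketch that uses a standard-library dataclass as a stand-in for a Pydantic schema. The `SilverRow` class and its rules are hypothetical; in Pydantic the same checks would be expressed as field types and validators.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SilverRow:
    ticker: str
    ebit: float
    market_cap: float
    total_debt: float
    cash_and_cash_equivalent: float

    def __post_init__(self) -> None:
        # Reject missing or NaN inputs instead of defaulting them to 0.
        for name in ("ebit", "market_cap", "total_debt", "cash_and_cash_equivalent"):
            value = getattr(self, name)
            if not isinstance(value, (int, float)) or math.isnan(value):
                raise ValueError(f"{self.ticker}: {name} is missing or not a number")
        if self.market_cap <= 0:
            raise ValueError(f"{self.ticker}: market_cap must be positive")
        if self.total_debt < 0 or self.cash_and_cash_equivalent < 0:
            raise ValueError(f"{self.ticker}: debt and cash cannot be negative")


ok = SilverRow("AAA", 12e6, 100e6, 30e6, 10e6)

try:
    SilverRow("BBB", float("nan"), 100e6, 0.0, 5e6)  # NaN EBIT must not pass silently
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"rejected: {exc}")
```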

Single Responsibility and Separation of Concerns are non-negotiable here. Small, focused modules are easy to test and easy to reason about.

Keep the code simple. Simpler code has fewer places for bugs to hide. Initially I planned to use Sharadar for backtesting and yfinance for live runs. That turned out to be a mistake. Two code branches meant two sets of assumptions, and Yahoo Finance makes different adjustments than Sharadar. The backtest was testing a different method than what I was using to invest. The fix was obvious: use the same data source for both. Fewer lines of code, more reliable results.

Risk tracker

Any improvements and code changes will require assumptions that might be wrong. I track those in README.md for future review.

Risk 1 and 2: Data quality and bugs in code

Risk 1: the data source has bugs. Risk 2: my code has bugs.

For Risk 1, research showed retail investors are generally satisfied with Sharadar’s quality. Nasdaq acquiring Sharadar is also a reasonable signal.

For Risk 2, the mitigation is the engineering practices described above, plus the DuckDB audit database: every ticker, date, and value is queryable, with a JSON log showing exactly how each number was calculated.

Risk 3: Overfitting

My configuration file for backtesting looks like this:

[backtest]
# Year range (inclusive) for annual rebalancing.
start_year = 2010
end_year = 2025

# Rebalance date in each year.
rebalance_month = 7
rebalance_day = 1

# Hold top N names by combined rank.
top_n = 20

# Amount invested at the start of each year (same currency as your prices).
annual_contribution = 20_000

# One-way transaction fee applied at each rebalance leg (sell + buy = 2x).
# 0.001 = 0.1% per leg, typical for a discount broker.
# Set to 0 to disable.
transaction_fee_pct = 0.001

# Annual risk-free rate used for Sharpe ratio calculation.
# Use the average short-term T-bill rate for your backtest period.
# 0.04 = 4% (reasonable for 2000-2024 average), set to 0 to get raw return/vol.
risk_free_rate = 0.04

# Size of annual universe before Magic Formula ranking (approx Russell 1000).
universe_n = 2000

# Skip the top-N largest companies before sampling universe_n.
# e.g. universe_offset = 1000 → exclude the 1 000 biggest companies,
# then take the next universe_n (ranks 1 001–3 000 when universe_n = 2000).
# Set to 0 to include all sizes from the top.
#universe_offset = 200
universe_offset = 0

# Benchmark ticker used for yearly comparison.
#benchmark_ticker = "IWB"
benchmark_ticker = "IWM"

# Backtest bronze controls (same meaning as pipeline).
skip_bronze = false
force = false

# Write signal audit rows to audit.duckdb after each backtest year.
# Disable for faster iteration when you don't need to query the audit DB.
# Old audit rows are always cleared at the start of a run regardless of this setting.
write_audit = false

[backtest.screens]
# Sortino filter — require trailing-12-month Sortino ratio >= threshold
# 0.0 means disabled. Typical enabled value: 0.4
sortino_min = 0.2
sortino_window_days = 252

# Momentum filter — require price above SMA and/or positive 12-1 momentum
# 0 means disabled for each leg independently
momentum_sma_days = 0
momentum_lookback_months = 0

# Piotroski-style quality filter — require quality score >= threshold (0-5 scale).
# Criteria use only the SF1 columns available in the Sharadar extract:
#   Q1 EBIT > 0  Q2 EBIT improved YoY  Q3 debt decreased  Q4 current ratio improved  Q5 no dilution
# 0 means disabled. Typical enabled value: 3 (require 3 out of 5).
piotroski_min = 3

# Composite value filter — keep only tickers in the cheapest top_pct% by a multi-ratio value score.
# Three signals are percentile-ranked and averaged (Gray / Alpha Architect approach):
#   V1 EBIT/EV   (earnings yield — Magic Formula's own metric)
#   V2 EBIT/P    (P/E proxy using EBIT)
#   V3 BV/P      (book-to-price proxy: (current_assets - current_liabilities + net_PP&E) / mktcap)
# 0 means disabled. Typical enabled value: 50 (keep the cheaper half of the ranked universe).
composite_value_top_pct = 0

Tuning those parameters until the backtest looks good is parameter hacking. To catch overfitting, I ran the full backtest on different rebalance dates. If a parameter set is truly overfit, good results on one date will collapse on another.

composite_value_top_pct and momentum_lookback_months are the two I learned to disable. They improved results on one specific rebalance date but fell apart when the date changed. The momentum filter was effectively turning the strategy into a momentum strategy, not a Magic Formula strategy. That’s scope creep, not an improvement.

4. Performance Engineering: Vectorisation and Parallelism

Why this matters: naively screening 2000 tickers with a row-by-row Python loop takes minutes; the vectorised + parallel version takes seconds.

Medallion Architecture separates network-heavy from compute-heavy operations

It separates slow network operations (downloading stock data) from offline computation. The goal is fast iteration on the backtest, since you tweak it constantly.

Vectorisation beats for-loop

Vectorised operations are much faster than row-by-row loops because the work runs in compiled code over whole columns at once, instead of paying Python interpreter overhead on every row of a dataframe.

Parallelize everything you can

Some computations can run in parallel: matrix operations, and preparing the Bronze layer (with rate-limit protection and exponential backoff).

Parallelism also helped with Optuna, which searches for the best threshold values for the Sortino and Piotroski filters.

One caveat: parallel threads share the same Bronze/Silver/Gold folders. Make sure threads do not overwrite each other’s output.
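One way to avoid that collision is to give each parallel run its own output folder. The sketch below uses a temporary directory and a hypothetical `run_trial` function. My actual folder layout differs, so treat the paths as placeholders.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_trial(trial_id: int, root: Path) -> Path:
    """Write this trial's Gold output into a folder no other trial uses."""
    out_dir = root / f"trial_{trial_id:03d}" / "gold"
    # exist_ok=False makes an accidental folder collision fail loudly.
    out_dir.mkdir(parents=True, exist_ok=False)
    out_file = out_dir / "ranking.csv"
    out_file.write_text(f"trial,{trial_id}\n")
    return out_file


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    with ThreadPoolExecutor(max_workers=4) as pool:
        outputs = list(pool.map(lambda i: run_trial(i, root), range(8)))
    distinct_dirs = len({p.parent for p in outputs})
    contents = [p.read_text() for p in outputs]

print(f"{distinct_dirs} trials wrote to {distinct_dirs} separate folders")
```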

5. Define the Experiment

Setup

Before running an experiment, define it.

On each rebalance date we sell all positions, add $20,000 of new capital, and buy the top 20 stocks from the updated ranking.

Transaction fee: 0.1% per leg. Risk-free rate: 4%. Backtest period: 2010-2025. The period covers COVID, is recent enough to be relevant, and includes about 5 years of data after the method became widely known.
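The yearly loop can be sketched in a few lines. The order in which fees are applied (sell leg on the existing value, buy leg on everything that gets reinvested) is an assumption of this sketch, so it will not reproduce the appendix figures to the dollar. The returns in the example are made up.

```python
def simulate(annual_returns: list[float], annual_contribution: float, fee_pct: float) -> list[float]:
    """Portfolio value at the end of each year under annual rebalancing."""
    value = 0.0
    history = []
    for r in annual_returns:
        value *= 1.0 - fee_pct        # sell leg: liquidate last year's holdings
        value += annual_contribution  # add fresh capital on the rebalance date
        value *= 1.0 - fee_pct        # buy leg: reinvest everything in the new top N
        value *= 1.0 + r              # hold for one year
        history.append(value)
    return history


# Hypothetical two-year run: +10%, then -5%.
history = simulate([0.10, -0.05], annual_contribution=20_000, fee_pct=0.001)
print([round(v, 2) for v in history])
```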

The benchmark is the whole universe we are investing in: Russell 2000 (IWM). We also compare against IWB (Russell 1000, large caps) as a broader market reference, a harder bar since large caps are more efficiently priced.

Metrics

CAGR (Compound Annual Growth Rate): the constant annual rate that would produce the same end result. It normalizes growth over time.
MWR (Money-Weighted Return): weights returns by how much capital was deployed at each point in time. Large deposits before good periods amplify returns; large deposits before bad periods drag them down.
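Example for CAGR and MWR:
- Start: £100, drops 50% → £50. You add £100 → £150. Then rises 100% → £300.
- CAGR: measures strategy performance. What was the annual growth rate, independent of your deposits? Here the strategy lost 50% and then gained 100%, so it compounds to 0.5 × 2.0 = 1.0: a time-weighted return of 0%.
- MWR: in practice, it matters whether the strategy performs better when your capital is small versus when it’s large. MWR captures this. It’s higher when the strategy performs well while more capital is deployed. Here you deposited £200 in total and ended with £300, an MWR of roughly 30% per period, because more capital was invested during the gain than during the loss.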
Sharpe: return above the risk-free rate per unit of volatility. Higher is better. We want at least as much risk-adjusted return as the benchmark.
Min Survivors: some filter configurations remove too many tickers. This shows the lowest count that remained in any single year.
Max Drawdown: the largest loss from peak to trough.
Win Rate: the percentage of years the portfolio beat the benchmark.
Final Profit: final portfolio value after transaction fees, minus total capital contributed.
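The £100 example can be checked numerically. This sketch computes the time-weighted rate as a geometric mean of period returns and the money-weighted rate as an internal rate of return found by bisection. The function names are hypothetical, and the cash-flow sign convention (deposits negative, final value positive) is a choice made for this example.

```python
def time_weighted_return(period_returns: list[float]) -> float:
    """Geometric mean return per period, independent of deposits."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(period_returns)) - 1.0


def money_weighted_return(cash_flows: list[float], lo: float = -0.99, hi: float = 10.0) -> float:
    """IRR by bisection. cash_flows[t] happens at the end of period t."""
    def npv(rate: float) -> float:
        return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

    if npv(lo) * npv(hi) > 0:
        raise ValueError("no sign change in NPV: IRR is not bracketed")
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        if npv(lo) * npv(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# -50% then +100%; deposits of 100 at t=0 and t=1, final value 300 at t=2.
twr = time_weighted_return([-0.50, 1.00])
mwr = money_weighted_return([-100.0, -100.0, 300.0])
print(f"time-weighted: {twr:.1%} per period, money-weighted: {mwr:.1%} per period")
```

The strategy itself went nowhere, yet the investor earned about 30% per period, purely because of when the money went in.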

Variants

The vanilla Magic Formula is getting old. A few improvements have been published over the years that I wanted to optionally add, at least in a limited form. I turned them on lightly, just enough to filter out the most obvious value traps.

Improvements tested:

Sortino: return per unit of downside volatility. Better than Sharpe because it only penalizes losses, not gains.
Piotroski F-score: scores companies on financial health. I use a small subset of it to filter out the clearly bad ones.
Momentum: did not help, disabled.
Composite Value Filter: same result, disabled.
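Here is a sketch of the two filters I kept. The five quality criteria follow the comment in my config (EBIT positive, EBIT improved, debt decreased, current ratio improved, no dilution). The function names, the dictionary keys, and the sample numbers are hypothetical, and the Sortino ratio shown is per period, not annualized.

```python
import math


def sortino_ratio(returns: list[float], target: float = 0.0) -> float:
    """Mean return above target divided by downside deviation."""
    excess = [r - target for r in returns]
    downside_dev = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside_dev == 0:
        return math.inf  # no losing periods in the window
    return (sum(excess) / len(excess)) / downside_dev


def quality_score(cur: dict, prev: dict) -> int:
    """Count how many of the five Piotroski-style checks pass (0 to 5)."""
    checks = [
        cur["ebit"] > 0,                                          # Q1
        cur["ebit"] > prev["ebit"],                               # Q2
        cur["total_debt"] < prev["total_debt"],                   # Q3
        cur["current_ratio"] > prev["current_ratio"],             # Q4
        cur["shares_outstanding"] <= prev["shares_outstanding"],  # Q5
    ]
    return sum(checks)


sortino = sortino_ratio([0.02, -0.01, 0.03, -0.02])

prev = {"ebit": 10e6, "total_debt": 35e6, "current_ratio": 1.5, "shares_outstanding": 100e6}
cur = {"ebit": 12e6, "total_debt": 30e6, "current_ratio": 1.4, "shares_outstanding": 100e6}
score = quality_score(cur, prev)

# Thresholds from the config: sortino_min = 0.2, piotroski_min = 3.
keep = sortino >= 0.2 and score >= 3
print(f"sortino={sortino:.3f}, quality={score}/5, keep={keep}")
```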

6. Audit Trail: Verifying Every Single Calculation

DuckDB is an in-process analytical database with a Python API. Think SQLite, but built for analytical queries: aggregations, window functions, large scans. No server needed; it runs in-process.

Adding DuckDB changed everything. Auditing every calculation helped me understand the model and cross-check results against alternative data sources. Every pipeline run writes to audit.duckdb, a full ledger with an explanation for every number.

When I see a suspicious value in the report, I can trace its full lineage: what formula produced it, what inputs went in. I can also feed the audit output to an LLM and ask if anything looks off. This caught dozens of bugs, particularly differences between yfinance and Sharadar data.

Example: FLR audit trail:

Every value is traceable. For example: earnings_yield for FLR on 2012-07-01, built from ebit / ev, where silver.ebit=916321000.0 and gold.ev=5878382636.0.
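A sketch of what one audit record might look like, using the FLR numbers above. The `AuditRecord` class and its field names are hypothetical, not my actual schema. The sketch only builds the JSON line. Writing it to audit.duckdb is left out to keep it dependency-free.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditRecord:
    ticker: str
    as_of_date: str
    metric: str
    formula: str
    inputs: dict
    value: float


def audit_earnings_yield(ticker: str, as_of_date: str, ebit: float, ev: float) -> AuditRecord:
    """Compute EY and keep the formula and inputs next to the result."""
    return AuditRecord(
        ticker=ticker,
        as_of_date=as_of_date,
        metric="earnings_yield",
        formula="ebit / ev",
        inputs={"silver.ebit": ebit, "gold.ev": ev},
        value=ebit / ev,
    )


record = audit_earnings_yield("FLR", "2012-07-01", 916321000.0, 5878382636.0)
line = json.dumps(asdict(record), sort_keys=True)
print(line)
```

With the inputs stored alongside the result, anyone can recompute the number by hand: 916,321,000 / 5,878,382,636 is an earnings yield of roughly 15.6%.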

7. Automatic Parameter Optimisation: Optuna

Optuna is an open-source Python hyperparameter optimization framework. I used it to find the best values for the Sortino threshold and Piotroski minimum score. It’s much faster than brute force because it uses Bayesian search (TPE sampler) rather than testing every combination.

To guard against overfitting, I ran the strategy with the tuned parameters on different rebalance dates. First manually, checking each result. Then automatically across all 12 months (1st day of each month), generating a probability distribution.

_adv = advantage. Positive means the Magic Formula portfolio beat the IWB benchmark.

  [ 1/12]  Jan-01 ...  cagr_adv=-2.1%  mwr_adv=-4.3%  win_rate=38%
  [ 2/12]  Feb-01 ...  cagr_adv=-0.8%  mwr_adv=-1.6%  win_rate=25%
  [ 3/12]  Mar-01 ...  cagr_adv=-2.5%  mwr_adv=-3.6%  win_rate=38%
  [ 4/12]  Apr-01 ...  cagr_adv=+1.2%  mwr_adv=+0.8%  win_rate=50%
  [ 5/12]  May-01 ...  cagr_adv=-1.3%  mwr_adv=-1.7%  win_rate=44%
  [ 6/12]  Jun-01 ...  cagr_adv=+1.0%  mwr_adv=+1.2%  win_rate=50%
  [ 7/12]  Jul-01 ...  cagr_adv=+1.5%  mwr_adv=+1.9%  win_rate=50%
  [ 8/12]  Aug-01 ...  cagr_adv=+0.7%  mwr_adv=+0.6%  win_rate=44%
  [ 9/12]  Sep-01 ...  cagr_adv=-1.4%  mwr_adv=-2.6%  win_rate=56%
  [10/12]  Oct-01 ...  cagr_adv=-4.7%  mwr_adv=-5.1%  win_rate=38%
  [11/12]  Nov-01 ...  cagr_adv=-4.9%  mwr_adv=-5.8%  win_rate=31%
  [12/12]  Dec-01 ...  cagr_adv=-2.3%  mwr_adv=-4.0%  win_rate=44%

This particular run still had momentum enabled. The sweep revealed it was the culprit: momentum parameters that looked great on July rebalance collapsed on most other dates. That sensitivity to entry timing is the classic overfitting signal.
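The sweep output above can be reduced to a few summary numbers. The values are copied from the cagr_adv column of the log. The variable names are mine.

```python
from statistics import mean, median

# cagr_adv per rebalance month, in percentage points, from the 12-month sweep.
cagr_adv_pct = {
    "Jan": -2.1, "Feb": -0.8, "Mar": -2.5, "Apr": 1.2,
    "May": -1.3, "Jun": 1.0, "Jul": 1.5, "Aug": 0.7,
    "Sep": -1.4, "Oct": -4.7, "Nov": -4.9, "Dec": -2.3,
}

positive_months = [m for m, adv in cagr_adv_pct.items() if adv > 0]
share_positive = len(positive_months) / len(cagr_adv_pct)
avg_adv = mean(cagr_adv_pct.values())
med_adv = median(cagr_adv_pct.values())

print(f"positive in {len(positive_months)}/12 months: {positive_months}")
print(f"mean advantage {avg_adv:+.2f} pp, median {med_adv:+.2f} pp")
```

Only a third of the start dates beat the benchmark and the average advantage is negative, which is what an overfit parameter set looks like once you stop cherry-picking the date.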

Once momentum was disabled and only Sortino and Piotroski were kept, the results stabilised. April, June, July, and August were the strongest months, which may reflect seasonal effects around earnings season. The final configuration used in Appendix 2 and 3 uses the July rebalance, which sits in that better-performing window.

8. Summary

Beating the benchmark is hard.
Beating it on historical data says nothing about the future. But it does show whether the method ever had an edge.
Backtesting is hard. It involves a lot of assumptions and a lot of data. But once built properly, the framework can be reused for other strategies.
Know what kind of investor you are. If my method slightly underperforms the benchmark but is more volatile, I might still prefer it. Consistently betting on value stocks in a volatile market is the kind of bet I want for my riskier portfolio.
Benchmarks are powerful. Given that, it makes sense to hold multiple portfolios: one that simply tracks the benchmark, avoids all the risks mentioned here, and bets on a long horizon with no behavioral interference. And a separate portfolio for systematic stock picking, which works for me. It satisfies my need to do the math. It also solves the behavioral problem by delegating decisions to an algorithm, which means I’m less likely to sabotage my other portfolios where I use different strategies.
Auditability is non-negotiable. A final ranking means nothing if you can’t trace how each number was calculated and spot-check a sample.
Vanilla Magic Formula did not outperform Russell 1000 over the 16-year backtest. The method has been widely published; the edge is gone. With Sortino and Piotroski filters, the edge comes back: +4.8% CAGR over IWM (same universe) and +1.5% over IWB (large caps).
The clearest lesson: regular investing works. Contributing $320,000 over 16 years grew to $1.40M with the filtered Magic Formula, $1.17M with IWB, and $794k with IWM.

Appendix 1. Magic Formula Backtesting config


start_year	2010
end_year	2025
rebalance_month	7
rebalance_day	1
top_n	20
benchmark_ticker	IWM
— Universe —
universe_n	2000
universe_offset	0
— Filters —
min_market_cap	50000000
max_market_cap	0
excluded_sectors	Financials, Utilities, Real Estate, Financial Services
excluded_industries	Insurance, Managed Care, Healthcare Plans, Insurance—Life, Insurance—Property & Casualty, Insurance—Specialty, Insurance—Diversified, Insurance Brokers
— Contributions & Fees —
annual_contribution	20000
transaction_fee_pct	0.001
risk_free_rate	0.04
— Overlays —
screens.sortino_min	0.2
screens.sortino_window_days	252
screens.momentum_sma_days	0
screens.momentum_lookback_months	0
screens.piotroski_min	3
screens.composite_value_top_pct	0

Appendix 2. Magic Formula backtesting 2010 - 2025

IWB = Russell 1000 (large caps). IWM = Russell 2000 (small caps, same universe as portfolio).

year	as_of_date	n_holdings	n_survivors	portfolio_return	IWB return	IWM return	annual_contribution	total_contributed	portfolio_value	IWB value	IWM value	portfolio_profit	IWB profit	IWM profit
2010	2010-07-01	20	804	44.8%	34.0%	40.3%	20,000	20,000	28,933	26,800	28,064	8,933	6,800	8,064
2011	2011-07-01	20	925	-6.7%	3.0%	-2.4%	20,000	40,000	45,551	48,202	46,911	5,551	8,202	6,911
2012	2012-07-01	20	472	26.5%	21.6%	24.4%	20,000	60,000	82,780	82,943	83,224	22,780	22,943	23,224
2013	2013-07-01	20	776	23.8%	24.7%	22.9%	20,000	80,000	127,029	128,340	126,882	47,029	48,340	46,882
2014	2014-07-01	20	790	12.9%	7.7%	6.2%	20,000	100,000	165,666	159,825	155,923	65,666	59,825	55,923
2015	2015-07-01	20	536	-1.6%	2.3%	-6.5%	20,000	120,000	182,380	183,937	164,474	62,380	63,937	44,474
2016	2016-07-01	20	449	19.3%	18.0%	25.0%	20,000	140,000	241,050	240,546	230,665	101,050	100,546	90,665
2017	2017-07-01	20	681	8.5%	14.4%	17.6%	20,000	160,000	282,556	298,073	294,664	122,556	138,073	134,664
2018	2018-07-01	20	629	6.6%	10.4%	-3.8%	20,000	180,000	321,818	351,227	302,637	141,818	171,227	122,637
2019	2019-07-01	20	465	-7.5%	7.1%	-7.8%	20,000	200,000	315,461	397,688	297,347	115,461	197,688	97,347
2020	2020-07-01	20	360	55.6%	42.7%	64.9%	20,000	220,000	521,027	596,022	523,182	301,027	376,022	303,182
2021	2021-07-01	20	700	-13.8%	-12.6%	-25.1%	20,000	240,000	465,234	538,518	406,727	225,234	298,518	166,727
2022	2022-07-01	20	286	37.0%	18.0%	11.3%	20,000	260,000	663,441	658,876	474,789	403,441	398,876	214,789
2023	2023-07-01	20	713	23.4%	23.9%	8.7%	20,000	280,000	841,503	841,016	537,852	561,503	561,016	257,852
2024	2024-07-01	20	518	21.9%	15.2%	9.6%	20,000	300,000	1,048,224	991,602	611,433	748,224	691,602	311,433
2025	2025-07-01	20	524	31.4%	15.2%	25.8%	20,000	320,000	1,401,079	1,165,683	794,343	1,081,079	845,683	474,343
Average			601.75	17.6%	15.3%	13.2%

Appendix 3. Magic Formula backtesting metrics

	Portfolio	vs IWB (Russell 1000)	vs IWM (Russell 2000)
Annual Contribution	20,000
Transaction Fee (per leg)	0.1%
Risk-Free Rate (annual)	4.0%
Total Years	16
Total Contributed	320,000

CAGR (time-weighted)	16.1%	14.6%	11.3%
CAGR Advantage		+1.5%	+4.8%

MWR / IRR (money-weighted)	15.9%	14.0%	10.0%
MWR Advantage		+1.9%	+5.9%

Sharpe (annual)	0.697	0.868	0.429
Sharpe Advantage		-0.171	+0.268

Max Drawdown	-13.8%	-12.6%	-25.1%

Win Rate	—	50% (8/16)	75% (12/16)

Final Value	1,401,079	1,165,683	794,343
Profit	1,081,079	845,683	474,343

Avg Survivors / Year	601.75
Min Survivors / Year	286

Positive CAGR and MWR advantage against both benchmarks. Against IWM (same universe): +4.8% CAGR, 75% win rate. Against IWB (large caps): +1.5% CAGR, 50% win rate.
Sharpe is better than IWM (+0.268) but worse than IWB (-0.171). The portfolio takes on more volatility than large caps but less than small caps.
Max Drawdown: much better than IWM (-13.8% vs -25.1%). The filters removed most of the worst small-cap crashes.
Win rate 75% vs IWM. The strategy consistently picks better small caps than the index.
Portfolio profit: $1,081,079 vs $845,683 (IWB) and $474,343 (IWM). That’s $235,396 more than large caps and $606,736 more than the small-cap index over 16 years.

References

Sources:

Magic formula investing - Wikipedia
The Little Book That Still Beats the Market by Joel Greenblatt
My personal backtesting and ETL pipeline framework