Paper 04 Statistics PSR Backtest

Probabilistic Sharpe Ratio (PSR) and Backtest Overfitting

A statistically rigorous alternative to the raw Sharpe ratio that adjusts for sample length and non-normality (skewness and kurtosis), helping to detect backtest overfitting.

Abstract

This paper examines the Probabilistic Sharpe Ratio (PSR), a statistically rigorous extension of the classical Sharpe ratio that accounts for sample length, skewness, and kurtosis. PSR estimates the probability that an observed Sharpe ratio exceeds a benchmark, providing an inferential framework that penalizes short histories and non-normal return distributions. We present the mathematical formulation, a Python implementation, and discuss its application in detecting backtest overfitting.

Key Takeaways

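  • A raw Sharpe ratio says nothing about statistical confidence, non-normality, or multiple testing. PSR addresses the first two directly; multiple testing is handled only by raising the benchmark SR* to reflect the number of strategies tried (as the Deflated Sharpe Ratio does).
  • PSR widens the uncertainty around a positive Sharpe estimate when returns are negatively skewed or fat-tailed, penalizing strategies whose Sharpe may owe more to favorable sampling noise than to skill.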
  • Ranking strategies by PSR instead of raw Sharpe forces a stronger evidentiary standard: short histories and ugly tail behavior are penalized.
  • Standard Sharpe is descriptive; PSR is inferential -- it lets you test whether estimated skill exceeds a benchmark such as SR* = 1.

Introduction

The Sharpe ratio is one of the most abused statistics in quantitative finance. Two strategies can have the same Sharpe ratio even if one is estimated from a short, skewed, fat-tailed sample and the other from a long, well-behaved history. A raw Sharpe number says nothing about statistical confidence, non-normality, or multiple testing.

Standard Sharpe Ratio
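$$\widehat{SR} = \frac{\hat{\mu}}{\hat{\sigma}}$$

where \(\hat{\mu}\) and \(\hat{\sigma}\) are the sample mean and standard deviation of per-period excess returns.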
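As a minimal sketch of the estimator above, the snippet below computes \(\widehat{SR}\) from a short series of per-period returns. The helper name `sharpe_ratio`, the zero risk-free default, and the \(\sqrt{252}\) annualization factor (which assumes i.i.d. daily returns) are illustrative assumptions, not part of the original text.

```python
import numpy as np


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Per-period and annualized Sharpe ratio (hypothetical helper).

    `risk_free` is a per-period rate; annualizing by sqrt(periods_per_year)
    assumes returns are i.i.d. across periods.
    """
    excess = np.asarray(returns, dtype=float) - risk_free
    sr_period = excess.mean() / excess.std(ddof=1)  # mu_hat / sigma_hat
    return sr_period, sr_period * np.sqrt(periods_per_year)


daily = [0.01, -0.005, 0.02, 0.0, 0.005]  # made-up daily returns
sr_d, sr_a = sharpe_ratio(daily)
print(f"per-period SR = {sr_d:.4f}, annualized SR = {sr_a:.4f}")
```

Only the per-period value belongs in the PSR formula; the annualized figure is for reporting.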

The Probabilistic Sharpe Ratio (PSR) estimates the probability that an observed Sharpe ratio exceeds a benchmark \(SR^*\), while adjusting for sample length, skewness, and kurtosis:

Probabilistic Sharpe Ratio
$$PSR(SR^*) = \Phi\left( \frac{(\widehat{SR} - SR^*)\sqrt{T-1}} {\sqrt{1 - \gamma_3 \widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2}} \right)$$

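Here, \(T\) is the number of return observations, \(\gamma_3\) is the skewness and \(\gamma_4\) the Pearson (non-excess) kurtosis of those returns (3 for a normal distribution), and \(\Phi\) is the standard normal CDF. \(\widehat{SR}\) and \(SR^*\) must be measured at the same frequency as the returns; for daily data, an annualized benchmark is typically divided by \(\sqrt{252}\). With normal returns (\(\gamma_3 = 0, \gamma_4 = 3\)) the denominator reduces to \(\sqrt{1 + \widehat{SR}^2/2}\). For a positive \(\widehat{SR}\), it grows when returns are negatively skewed or fat-tailed, which is exactly what the classical Sharpe ratio ignores.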
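To make the denominator concrete, the sketch below evaluates the PSR formula directly from summary statistics. The inputs are hypothetical: a per-period \(\widehat{SR} = 0.1\) over \(T = 253\) observations tested against \(SR^* = 0\), once with normal moments (\(\gamma_3 = 0, \gamma_4 = 3\)) and once with negatively skewed, fat-tailed moments (\(\gamma_3 = -1, \gamma_4 = 8\)). The helper name `psr_from_stats` is an assumption for illustration.

```python
import numpy as np
from scipy.stats import norm


def psr_from_stats(sr_hat, sr_star, T, g3, g4):
    """PSR from per-period Sharpe, benchmark, sample length, skewness, and
    Pearson (non-excess) kurtosis. All Sharpe values share one frequency."""
    var_term = 1.0 - g3 * sr_hat + ((g4 - 1.0) / 4.0) * sr_hat**2
    z = (sr_hat - sr_star) * np.sqrt(T - 1) / np.sqrt(var_term)
    return norm.cdf(z)


psr_normal = psr_from_stats(0.1, 0.0, 253, g3=0.0, g4=3.0)
psr_skewed = psr_from_stats(0.1, 0.0, 253, g3=-1.0, g4=8.0)
print(f"normal moments:   PSR = {psr_normal:.4f}")
print(f"skewed/fat tails: PSR = {psr_skewed:.4f}")
```

Same Sharpe, same sample length; only the higher moments change. The term under the square root grows from 1.005 to 1.1175, and the PSR drops from roughly 0.943 to roughly 0.933.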

Python Implementation

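psr.py

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis, norm


def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0, periods_per_year=252):
    """PSR of a per-period return series against an annualized benchmark SR*.

    T, skewness and kurtosis describe per-period returns, so the formula is
    evaluated on per-period Sharpe ratios; the benchmark is converted down.
    """
    r = pd.Series(returns, dtype=float).dropna().to_numpy()
    T = len(r)

    sr_hat = r.mean() / r.std(ddof=1)                     # per-period Sharpe
    sr_star = sr_benchmark / np.sqrt(periods_per_year)    # benchmark at the same frequency

    g3 = skew(r, bias=False)                              # sample skewness
    g4 = kurtosis(r, fisher=False, bias=False)            # Pearson kurtosis (3 for a normal)

    var_term = 1.0 - g3 * sr_hat + ((g4 - 1.0) / 4.0) * sr_hat**2
    if var_term <= 0:
        raise ValueError("non-positive variance term; moment estimates are unreliable")

    z = (sr_hat - sr_star) * np.sqrt(T - 1) / np.sqrt(var_term)

    return {
        "sharpe":   sr_hat * np.sqrt(periods_per_year),   # annualized, for reporting only
        "psr":      norm.cdf(z),
        "skew":     g3,
        "kurtosis": g4,
        "z_score":  z,
    }


# Example: 500 simulated daily returns tested against an annualized SR* = 1
np.random.seed(42)
rets = np.random.normal(0.0005, 0.01, 500)
print(probabilistic_sharpe_ratio(rets, sr_benchmark=1.0))
```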
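Because \(T\) counts per-period observations and \(\gamma_3, \gamma_4\) describe per-period returns, \(\widehat{SR}\) and \(SR^*\) must be expressed at that same frequency before entering the formula. The sketch below uses hypothetical numbers (a daily \(\widehat{SR} = 0.05\), \(T = 500\), normal moments, an annualized benchmark \(SR^* = 1\), and an assumed 252-day year) to contrast a consistent per-period calculation with the common slip of plugging annualized Sharpe ratios into the same expression.

```python
import numpy as np
from scipy.stats import norm

T, g3, g4 = 500, 0.0, 3.0        # hypothetical daily sample, normal moments
periods = 252                    # assumed trading days per year
sr_daily = 0.05                  # hypothetical per-period Sharpe estimate
sr_star_annual = 1.0             # benchmark quoted in annual units


def z_stat(sr_hat, sr_star):
    """z-score inside Phi(.) of the PSR formula, using the moments above."""
    var_term = 1.0 - g3 * sr_hat + ((g4 - 1.0) / 4.0) * sr_hat**2
    return (sr_hat - sr_star) * np.sqrt(T - 1) / np.sqrt(var_term)


# Consistent: convert the annual benchmark down to the daily frequency.
psr_ok = norm.cdf(z_stat(sr_daily, sr_star_annual / np.sqrt(periods)))

# Inconsistent: annualized Sharpe values combined with a daily T.
psr_bad = norm.cdf(z_stat(sr_daily * np.sqrt(periods), sr_star_annual))

print(f"consistent frequency: PSR = {psr_ok:.3f}")
print(f"mixed frequency:      PSR = {psr_bad:.6f}")
```

Mixing frequencies inflates the numerator by exactly \(\sqrt{252}\), so the PSR is pushed toward 0 or 1 (here from about 0.39 to almost zero) regardless of how much evidence the sample actually contains.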

Ranking strategies by PSR instead of raw Sharpe forces the strategy to "earn" its Sharpe under a stronger evidentiary standard. It penalizes short histories, punishes ugly tail behavior, and gives you a way to compare estimated skill against a benchmark such as \(SR^* = 1\). Standard Sharpe is descriptive. PSR is inferential.
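The ranking argument can be sketched with two hypothetical strategies described only by per-period summary statistics (the numbers are invented for illustration): strategy A has the higher Sharpe but just 60 observations with negatively skewed, fat-tailed returns; strategy B has a lower Sharpe over 1,000 observations with roughly normal returns. Both are tested against \(SR^* = 0\), and the helper `psr_from_stats` is a hypothetical name.

```python
import numpy as np
from scipy.stats import norm


def psr_from_stats(sr_hat, sr_star, T, g3, g4):
    """PSR from per-period Sharpe, benchmark, length, skewness, Pearson kurtosis."""
    var_term = 1.0 - g3 * sr_hat + ((g4 - 1.0) / 4.0) * sr_hat**2
    z = (sr_hat - sr_star) * np.sqrt(T - 1) / np.sqrt(var_term)
    return norm.cdf(z)


# Hypothetical per-period summaries: (Sharpe, T, skewness, Pearson kurtosis)
strategies = {
    "A": (0.15, 60, -2.0, 10.0),   # short, skewed, fat-tailed history
    "B": (0.08, 1000, 0.0, 3.0),   # long, well-behaved history
}

by_sharpe = sorted(strategies, key=lambda k: strategies[k][0], reverse=True)
psr = {k: psr_from_stats(sr, 0.0, T, g3, g4) for k, (sr, T, g3, g4) in strategies.items()}
by_psr = sorted(psr, key=psr.get, reverse=True)

print("ranked by raw Sharpe:", by_sharpe)
print("ranked by PSR:       ", by_psr, {k: round(float(v), 3) for k, v in psr.items()})
```

The order flips: A's higher point estimate is outweighed by its short sample and heavy left tail, giving a PSR of roughly 0.84 against roughly 0.99 for B.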
