A statistically rigorous alternative to the raw Sharpe ratio that adjusts for sample length and for non-normality (skewness and kurtosis) to help detect backtest overfitting.
This paper examines the Probabilistic Sharpe Ratio (PSR), a statistically rigorous extension of the classical Sharpe ratio that accounts for sample length, skewness, and kurtosis. PSR estimates the probability that an observed Sharpe ratio exceeds a benchmark, providing an inferential framework that penalizes short histories and non-normal return distributions. We present the mathematical formulation and a Python implementation, and discuss its application in detecting backtest overfitting.
The Sharpe ratio is one of the most abused statistics in quantitative finance. Two strategies can have the same Sharpe ratio even if one is estimated from a short, skewed, fat-tailed sample and the other from a long, well-behaved history. A raw Sharpe number says nothing about statistical confidence, non-normality, or multiple testing.
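The Probabilistic Sharpe Ratio (PSR) estimates the probability that an observed Sharpe ratio \(\widehat{SR}\) exceeds a benchmark \(SR^*\), while adjusting for skewness and kurtosis:
\[
\widehat{PSR}(SR^*) = \Phi\!\left( \frac{(\widehat{SR} - SR^*)\,\sqrt{T - 1}}{\sqrt{1 - \gamma_3\,\widehat{SR} + \frac{\gamma_4 - 1}{4}\,\widehat{SR}^2}} \right)
\]
Here, \(\widehat{SR}\) and \(SR^*\) are Sharpe ratios measured at the sampling frequency of the returns (not annualized), \(T\) is the sample length in observations, \(\gamma_3\) is skewness, \(\gamma_4\) is Pearson kurtosis (3 for a normal distribution), and \(\Phi\) is the standard normal CDF. The denominator inflates uncertainty when returns are negatively skewed (for a positive \(\widehat{SR}\)) or fat-tailed, which is exactly what the classical Sharpe ratio ignores.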
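import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis, norm


def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0, periods_per_year=252):
    """PSR of a return series against an annualized benchmark Sharpe ratio."""
    r = pd.Series(returns).dropna()
    T = len(r)

    # The PSR formula uses the Sharpe ratio at the sampling frequency of the
    # returns, so work per period and convert the annualized benchmark to match.
    sr_hat = r.mean() / r.std(ddof=1)
    sr_star = sr_benchmark / np.sqrt(periods_per_year)

    g3 = skew(r, bias=False)
    g4 = kurtosis(r, fisher=False, bias=False)  # Pearson kurtosis (normal = 3)

    numerator = (sr_hat - sr_star) * np.sqrt(T - 1)
    denominator = np.sqrt(1 - g3 * sr_hat + ((g4 - 1) / 4.0) * sr_hat**2)
    z = numerator / denominator

    return {
        "sharpe": np.sqrt(periods_per_year) * sr_hat,  # annualized, for reporting
        "psr": norm.cdf(z),
        "skew": g3,
        "kurtosis": g4,
        "z_score": z,
    }


# Example: 500 simulated daily returns tested against an annualized benchmark of 1.0
np.random.seed(42)
rets = np.random.normal(0.0005, 0.01, 500)
print(probabilistic_sharpe_ratio(rets, sr_benchmark=1.0))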
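To see how each term in the PSR formula moves the result, it can help to evaluate the formula directly from summary statistics instead of a return series. The sketch below is a minimal illustration: the helper name `psr_from_moments` and all input values are hypothetical, and every Sharpe ratio is assumed to be per period (daily here), not annualized.

```python
import math


def psr_from_moments(sr_hat, sr_star, T, skewness, kurt):
    """Evaluate the PSR formula from summary statistics (hypothetical helper).

    sr_hat and sr_star are per-period Sharpe ratios; kurt is Pearson
    kurtosis, so a normal distribution has kurt == 3.
    """
    variance_term = 1.0 - skewness * sr_hat + (kurt - 1.0) / 4.0 * sr_hat**2
    z = (sr_hat - sr_star) * math.sqrt(T - 1) / math.sqrt(variance_term)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative inputs: a daily Sharpe ratio of 0.1 tested against zero.
normal_1y = psr_from_moments(0.1, 0.0, T=252, skewness=0.0, kurt=3.0)
ugly_1y = psr_from_moments(0.1, 0.0, T=252, skewness=-2.0, kurt=12.0)
normal_3m = psr_from_moments(0.1, 0.0, T=63, skewness=0.0, kurt=3.0)

print(f"1 year, normal returns:           {normal_1y:.3f}")
print(f"1 year, negative skew, fat tails: {ugly_1y:.3f}")
print(f"3 months, normal returns:         {normal_3m:.3f}")
```

With the point estimate held fixed, both the negatively skewed, fat-tailed case and the shorter sample produce a lower PSR than the one-year normal case.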
Ranking strategies by PSR instead of raw Sharpe forces each strategy to "earn" its Sharpe under a stronger evidentiary standard. PSR penalizes short histories, punishes ugly tail behavior, and gives you a way to compare estimated skill against a benchmark such as an annualized \(SR^* = 1\). Standard Sharpe is descriptive. PSR is inferential.
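The ranking idea can be sketched on synthetic data. The example below is an illustration under stated assumptions, not a backtest: the four "strategies" are deterministic series built from distribution quantiles and rescaled to share the same sample mean and standard deviation, so their raw Sharpe ratios are identical by construction. The helper names `psr`, `rescale`, and `grid` are hypothetical, and the benchmark is an annualized Sharpe ratio of 1 converted to daily units assuming 252 periods per year.

```python
import numpy as np
from scipy.stats import expon, kurtosis, norm, skew


def psr(returns, sr_star=0.0):
    """Return (per-period Sharpe, PSR) against a per-period benchmark sr_star."""
    r = np.asarray(returns, dtype=float)
    sr_hat = r.mean() / r.std(ddof=1)
    g3 = skew(r, bias=False)
    g4 = kurtosis(r, fisher=False, bias=False)  # Pearson kurtosis
    z = (sr_hat - sr_star) * np.sqrt(len(r) - 1)
    z /= np.sqrt(1 - g3 * sr_hat + (g4 - 1) / 4.0 * sr_hat**2)
    return sr_hat, norm.cdf(z)


def rescale(shape, mean=0.001, std=0.01):
    """Shift and scale a shape vector to a chosen sample mean and std."""
    s = np.asarray(shape, dtype=float)
    return mean + std * (s - s.mean()) / s.std(ddof=1)


def grid(n):
    """Evenly spaced probabilities strictly inside (0, 1)."""
    return (np.arange(n) + 0.5) / n


# Synthetic, deterministic "daily" series: quantiles of a normal distribution
# (symmetric) and of a negated exponential distribution (long left tail).
strategies = {
    "long, normal-like": rescale(norm.ppf(grid(1000))),
    "short, normal-like": rescale(norm.ppf(grid(60))),
    "long, crash-prone": rescale(-expon.ppf(grid(1000))),
    "short, crash-prone": rescale(-expon.ppf(grid(60))),
}

sr_star = 1.0 / np.sqrt(252)  # annualized benchmark of 1, in daily units

# Sort by PSR, highest first; each row is (name, daily Sharpe, PSR).
ranking = sorted(
    ((name, *psr(r, sr_star)) for name, r in strategies.items()),
    key=lambda row: row[2],
    reverse=True,
)
for name, sr_hat, p in ranking:
    print(f"{name:<20} daily Sharpe = {sr_hat:.3f}  PSR = {p:.3f}")
```

Because all four series have the same sample Sharpe ratio, a raw-Sharpe ranking cannot separate them. In this construction the longer histories rank above the shorter ones, and within each length the normal-like series ranks above the crash-prone one.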