Probabilistic Sharpe Ratio (PSR) and Backtest Overfitting

A statistically rigorous alternative to the raw Sharpe ratio that accounts for sample length, skewness, and kurtosis, providing an inferential tool for detecting backtest overfitting.

Abstract

This paper examines the Probabilistic Sharpe Ratio (PSR), a statistically rigorous extension of the classical Sharpe ratio that accounts for sample length, skewness, and kurtosis. PSR estimates the probability that an observed Sharpe ratio exceeds a benchmark, providing an inferential framework that penalizes short histories and non-normal return distributions. We present the mathematical formulation, a Python implementation, and discuss its application in detecting backtest overfitting.

Key Takeaways

- PSR converts an observed Sharpe ratio into the probability that it exceeds a benchmark \(SR^*\).
- Short samples, negative skew, and fat tails all reduce PSR, even when the raw Sharpe ratio looks attractive.
- Raw Sharpe is descriptive; PSR is inferential, which makes it better suited to screening backtests for overfitting.

Introduction

The Sharpe ratio is one of the most abused statistics in quantitative finance. Two strategies can have the same Sharpe ratio even if one is estimated from a short, skewed, fat-tailed sample and the other from a long, well-behaved history. A raw Sharpe number says nothing about statistical confidence, non-normality, or multiple testing.

Standard Sharpe Ratio
$$\widehat{SR} = \frac{\hat{\mu}}{\hat{\sigma}}$$

The Probabilistic Sharpe Ratio (PSR) estimates the probability that an observed Sharpe ratio exceeds a benchmark \(SR^*\), while adjusting for skewness and kurtosis:

Probabilistic Sharpe Ratio
$$PSR(SR^*) = \Phi\left( \frac{(\widehat{SR} - SR^*)\sqrt{T-1}} {\sqrt{1 - \gamma_3 \widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2}} \right)$$

Here, \(T\) is the sample length, \(\gamma_3\) is skewness, \(\gamma_4\) is kurtosis, and \(\Phi\) is the standard normal CDF. The denominator inflates uncertainty when returns are asymmetric or fat-tailed — exactly what classical Sharpe ignores.
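To make that inflation concrete, consider an illustrative case (numbers chosen for exposition, not taken from any dataset): a per-period \(\widehat{SR} = 0.1\) with left skew \(\gamma_3 = -1\) and fat tails \(\gamma_4 = 6\). The denominator becomes

$$\sqrt{1 - (-1)(0.1) + \tfrac{6-1}{4}(0.1)^2} = \sqrt{1.1125} \approx 1.055,$$

versus \(\sqrt{1 + \tfrac{3-1}{4}(0.1)^2} \approx 1.002\) under normality. The same Sharpe gap therefore earns a z-score about 5% smaller, and a correspondingly lower PSR.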

Python Implementation

psr.py

import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis, norm

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0, periods_per_year=252):
    """Probability that the true Sharpe ratio exceeds sr_benchmark
    (benchmark given in annualized terms)."""
    r = pd.Series(returns).dropna()

    # The PSR statistic is defined on the per-period Sharpe ratio;
    # plugging an annualized Sharpe into the formula with sqrt(T - 1)
    # would overstate confidence. Convert the benchmark to match.
    sr_per  = r.mean() / r.std(ddof=1)
    sr_star = sr_benchmark / np.sqrt(periods_per_year)

    T  = len(r)
    g3 = skew(r, bias=False)
    g4 = kurtosis(r, fisher=False, bias=False)  # Pearson (non-excess) kurtosis

    numerator   = (sr_per - sr_star) * np.sqrt(T - 1)
    denominator = np.sqrt(1 - g3 * sr_per + ((g4 - 1) / 4.0) * sr_per**2)
    z = numerator / denominator

    return {
        "sharpe":   np.sqrt(periods_per_year) * sr_per,  # annualized, for reporting
        "psr":      norm.cdf(z),
        "skew":     g3,
        "kurtosis": g4,
        "z_score":  z,
    }

# Example
np.random.seed(42)
rets = np.random.normal(0.0005, 0.01, 500)
print(probabilistic_sharpe_ratio(rets, sr_benchmark=1.0))

Ranking strategies by PSR instead of raw Sharpe forces the strategy to "earn" its Sharpe under a stronger evidentiary standard. It penalizes short histories, punishes ugly tail behavior, and gives you a way to compare estimated skill against a benchmark such as \(SR^* = 1\). Standard Sharpe is descriptive. PSR is inferential.
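A minimal sketch of what such a ranking might look like in practice. The strategy names and return series below are synthetic, and the compact psr helper simply restates the formula above in per-period units:

```python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis, norm

def psr(r, sr_star=0.0):
    # PSR = Phi((SR - SR*) * sqrt(T - 1) / sqrt(1 - g3*SR + (g4 - 1)/4 * SR^2)),
    # with SR and SR* in per-period units.
    r = pd.Series(r).dropna()
    sr = r.mean() / r.std(ddof=1)
    g3 = skew(r, bias=False)
    g4 = kurtosis(r, fisher=False, bias=False)  # Pearson kurtosis
    z = (sr - sr_star) * np.sqrt(len(r) - 1) / np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr**2)
    return norm.cdf(z)

rng = np.random.default_rng(0)
# Synthetic contrast: a long, well-behaved history vs. a short, fat-tailed one.
strategies = {
    "long_normal":     rng.normal(0.0004, 0.01, 2000),
    "short_fat_tails": rng.standard_t(3, 120) * 0.01 + 0.0008,
}

table = pd.DataFrame({
    "sharpe_ann": {k: np.sqrt(252) * np.mean(v) / np.std(v, ddof=1)
                   for k, v in strategies.items()},
    "psr_vs_0":   {k: psr(v) for k, v in strategies.items()},
}).sort_values("psr_vs_0", ascending=False)
print(table)
```

Sorting on the PSR column rather than the annualized Sharpe column is the point: the short, fat-tailed sample has to clear a higher evidentiary bar before it ranks ahead of the long, well-behaved one.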