
The Role of Alternative Data in Quantitative Finance

Satellite imagery, transaction panels, and e-commerce pricing data as nowcasting signals, built on rigorous data governance.

Abstract

This paper examines the role of alternative (non-traditional) data sources in quantitative finance, including satellite imagery, transaction panels, and e-commerce pricing data. We present a nowcasting framework, a machine learning pipeline for alternative data features, and a data governance checklist addressing timestamp integrity, survivorship bias, and legal rights. The core thesis is that the real edge lies in the translation layer that converts messy external traces into clean, point-in-time, economically interpretable variables.

Key Takeaways

- Alternative data earns its keep when it is earlier, orthogonal, or structured in a way the market has not fully absorbed.
- A nowcasting regression is only as good as the point-in-time integrity of its features.
- Governance (timestamp integrity, survivorship bias, legal rights) is a research problem, not a compliance afterthought.
- The translation layer that converts messy external traces into clean, economically interpretable variables is where most of the alpha lives.

Introduction

Traditional market data tells you what prices did. Alternative data aims to tell you why they might move next. If public prices already summarize common information, a quant edge must often come from signals that are earlier, orthogonal, or simply structured in a way that the market has not fully absorbed.

A useful framing: for a news item or data point arriving at time \(t\), the research task is:

Nowcasting Model
$$y_{t+1} = \beta^\top x_t + \epsilon_{t+1}$$

The challenge is not writing that equation; it is making sure \(x_t\) is truly available at time \(t\), properly normalized, and economically linked to the target. Alternative data examples include:

- Satellite imagery, used as a physical-activity proxy (e.g. retail parking-lot counts)
- Transaction panels of aggregated consumer spending
- E-commerce pricing and web-traffic data
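
The availability requirement can be enforced mechanically with an as-of join rather than trusted by convention. A minimal sketch using pandas' merge_asof; the column names (publish_time, decision_time, value) and timestamps are illustrative, not from the paper:

```python
import pandas as pd

# Raw alternative-data feed: each record carries the time it became public.
feed = pd.DataFrame({
    "publish_time": pd.to_datetime(
        ["2024-01-02 18:00", "2024-01-04 09:00", "2024-01-05 22:00"]),
    "value": [1.0, 1.4, 0.9],
})

# Decision times: suppose we trade at each day's 16:00 close.
decisions = pd.DataFrame({
    "decision_time": pd.to_datetime(
        ["2024-01-03 16:00", "2024-01-04 16:00", "2024-01-05 16:00"]),
})

# For each decision time, merge_asof picks the latest record whose
# publish_time is <= decision_time, so x_t is truly available at time t.
# Both frames must be sorted on their join keys.
aligned = pd.merge_asof(decisions, feed,
                        left_on="decision_time", right_on="publish_time")
print(aligned)
```

Note that the 2024-01-05 decision still sees the 2024-01-04 value, because the later record was published after the close; a naive join on calendar date would have leaked it.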

Machine Learning Pipeline

alt_data_model.py
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import r2_score

# Toy alternative-data features; real features must be point-in-time aligned.
df = pd.DataFrame({
    "web_price_change":    [0.01, -0.02,  0.00,  0.03,  0.01, -0.01],
    "traffic_index":       [102,    98,   101,   110,   108,    99 ],
    "transaction_growth":  [0.05,  0.01,  0.02,  0.06,  0.04,  0.00],
    "target":              [0.02, -0.01,  0.00,  0.03,  0.01, -0.02],
})

X = df.drop(columns=["target"])
y = df["target"]

# Walk-forward validation: each fold trains on the past and tests on the
# future. With only 6 rows, n_splits=2 yields 2-sample test folds; r2_score
# is undefined on single-sample folds, so n_splits=3 would produce NaN here.
tscv = TimeSeriesSplit(n_splits=2)
scores = []

for train_idx, test_idx in tscv.split(X):
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    scores.append(r2_score(y.iloc[test_idx], pred))

print("Mean OOS R²:", sum(scores) / len(scores))
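
TimeSeriesSplit prevents training on the future, but features built from trailing windows can still straddle the train/test boundary. A common remedy is an embargo gap between the two sets. A minimal hand-rolled sketch (embargoed_splits is a hypothetical helper, not part of scikit-learn):

```python
def embargoed_splits(n, n_splits=3, embargo=1):
    """Walk-forward splits that skip `embargo` rows between train and test,
    so trailing-window features cannot leak future information into training."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_start = train_end + embargo
        test_end = min(test_start + fold, n)
        if test_start >= test_end:
            break
        yield list(range(train_end)), list(range(test_start, test_end))

for train_idx, test_idx in embargoed_splits(12, n_splits=3, embargo=2):
    print(len(train_idx), test_idx)
```

In practice, scikit-learn's TimeSeriesSplit exposes a gap parameter that serves the same purpose; the point of the sketch is that the embargo length should match the longest lookback window in the feature set.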

Data Governance Checklist

- Timestamp integrity: store both the observation time and the availability (publication) time of every record, and backtest only on what was available.
- Survivorship bias: keep delisted securities, discontinued products, and departed panel members in the history rather than only today's survivors.
- Legal rights: verify licensing terms, personal-data compliance, and that collection methods do not convey material non-public information.

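The timestamp-integrity item can be enforced in code before any record reaches a backtest. A minimal sketch, assuming each record carries an observation time and a publication time; check_timestamps and the column names are illustrative, not from the paper:

```python
import pandas as pd

def check_timestamps(df, obs_col="obs_time", pub_col="publish_time", as_of=None):
    """Return a list of point-in-time violations found in the panel."""
    issues = []
    # A record cannot be published before the event it describes.
    if (df[pub_col] < df[obs_col]).any():
        issues.append("publish_time earlier than obs_time (impossible ordering)")
    # Nothing published after the backtest's as-of date may be visible.
    if as_of is not None and (df[pub_col] > pd.Timestamp(as_of)).any():
        issues.append("rows published after the as-of date (lookahead risk)")
    return issues

panel = pd.DataFrame({
    "obs_time":     pd.to_datetime(["2024-01-02", "2024-01-03"]),
    "publish_time": pd.to_datetime(["2024-01-03", "2024-01-02"]),  # 2nd row broken
})
print(check_timestamps(panel, as_of="2024-01-02"))
```

A check like this is cheap to run on every vendor delivery, and it catches the most common failure mode: a vendor restating history with today's knowledge.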
Conclusion

The best alternative-data teams behave less like headline chasers and more like measurement scientists. They spend as much time on ontology, joins, timestamp integrity, and missing-data behavior as they do on modeling. The real edge comes from converting messy external traces into clean, point-in-time, economically interpretable variables. That translation layer is where most of the alpha lives.