Sentiment Analysis in the Turkish Stock Market (BIST): Generating Signals from Financial News with Qwen and Llama
Building a time-aware, Turkish-native NLP pipeline using Qwen and Llama for financial news signal extraction on Borsa Istanbul.
Sentiment analysis in equities is easy to oversell and hard to do well. This paper presents a practical framework for building a time-aware, Turkish-native financial NLP pipeline using open-weight LLMs (Qwen and Llama) for signal extraction on Borsa Istanbul. We address label design, forward return targets, local inference prototyping, and the critical pitfalls of leakage, non-stationarity, and Turkish morphology in financial text.
Key Takeaways
- Useful sentiment treats the signal as a conditional forecast variable aligned to event timestamps and trading horizons -- not a binary positive/negative label.
- Open-weight models (Qwen3, Llama 4) make local, Turkish-language financial NLP stacks increasingly practical without external API dependencies.
- Label design is the central problem: forward return targets must be time-aligned to avoid leakage contamination in backtests.
- Off-the-shelf English sentiment models misread Turkish financial context, especially KAP-style disclosures -- Turkish-native pipelines are essential.
- The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing, not merely from using an LLM.
Sentiment as a Forecast Variable
Sentiment analysis in equities is easy to oversell and hard to do well. The useful version treats sentiment as a conditional forecast variable: a noisy signal that may explain cross-sectional returns, volatility, or volume once aligned to the correct event timestamp and trading horizon.
Open-weight LLM ecosystems have matured significantly. Alibaba's Qwen team has publicly released Qwen3-family weights, and Meta promotes Llama 4-family models and Llama Stack distributions for self-hosted workflows. That makes a local, Turkish-language financial NLP stack increasingly practical, with no external API dependencies.
The central problem is label design. For a news item arriving at time \(t\), a common target is the forward return over horizon \(h\):

\[ r_{t,\,t+h} = \frac{P_{t+h}}{P_t} - 1, \]

where \(P_t\) is the price at the first tradable moment at or after the news arrives.
A practical sentiment score can be defined from the classifier's three-way softmax output as:

\[ s = p_{\text{bullish}} - p_{\text{bearish}} \in [-1, 1]. \]
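The forward-return target and the sentiment score above can be sketched in a few lines of pandas. This is a minimal illustration with hypothetical prices and probabilities, not production alignment logic; the ticker, timestamps, and values are invented for the example.

```python
import pandas as pd

# Hypothetical intraday prices for a single BIST ticker (illustrative values).
prices = pd.Series(
    [100.0, 101.5, 99.8, 102.0, 103.1],
    index=pd.to_datetime([
        "2024-05-02 10:00", "2024-05-02 11:00", "2024-05-02 12:00",
        "2024-05-02 13:00", "2024-05-02 14:00",
    ]),
)

def forward_return(prices: pd.Series, t: pd.Timestamp, h: pd.Timedelta) -> float:
    """r_{t,t+h} = P_{t+h} / P_t - 1, entering at the first observed price
    at or after t and exiting at the first price at or after t + h."""
    i = prices.index.searchsorted(t)
    j = prices.index.searchsorted(t + h)
    if j >= len(prices):
        return float("nan")  # horizon extends past the available data
    return prices.iloc[j] / prices.iloc[i] - 1.0

# News arriving at 10:30 is labeled against the 11:00 entry and 13:00 exit.
r = forward_return(prices, pd.Timestamp("2024-05-02 10:30"), pd.Timedelta("2h"))

# Hypothetical classifier output in [bearish, neutral, bullish] order.
probs = [0.15, 0.25, 0.60]
score = probs[2] - probs[0]  # s = p_bullish - p_bearish = 0.45
```

Using the first price *at or after* each timestamp (rather than the most recent prior print) is what keeps the label causally downstream of the news.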
Local Inference Prototype
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Qwen/Qwen3-8B"  # replace with a local checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: the 3-way classification head is randomly initialized here; it must
# be fine-tuned on labeled Turkish financial text before the outputs mean anything.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

texts = [
    # "The company reported net profit above expectations and announced a new investment plan."
    "Şirket, beklentilerin üzerinde net kar açıkladı ve yeni yatırım planı duyurdu.",
    # "Selling pressure on bank stocks is rising after the interest rate decision."
    "Faiz kararı sonrası banka hisselerinde satış baskısı artıyor.",
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # columns: [bearish, neutral, bullish]
```
Critical Pitfalls
- Leakage: if you label a news article with end-of-day return even though it arrived after the close, your backtest is contaminated
- Non-stationarity: macro regime changes, regulation, inflation cycles, and sector narratives can all alter the meaning of the same words over time
- Turkish morphology: off-the-shelf English finance sentiment models often misread Turkish context, especially in KAP-style disclosures
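The leakage pitfall above comes down to timestamp alignment: news that arrives after the close must be labeled against the *next* session's prices. A minimal sketch, assuming a simplified 10:00-18:00 continuous session for Borsa Istanbul and ignoring exchange holidays (both assumptions for illustration):

```python
import pandas as pd

# Simplified session bounds (assumption for this sketch; holidays omitted).
MARKET_OPEN = pd.Timedelta(hours=10)
MARKET_CLOSE = pd.Timedelta(hours=18)

def effective_event_time(ts: pd.Timestamp) -> pd.Timestamp:
    """Map a news timestamp to the first moment it is actually tradable.

    Pre-open news maps to the same day's open; news at or after the close
    maps to the next day's open; weekends roll forward to Monday. Labeling
    against any earlier price contaminates the backtest.
    """
    day = ts.normalize()
    if ts < day + MARKET_OPEN:
        candidate = day + MARKET_OPEN                          # pre-open -> same-day open
    elif ts >= day + MARKET_CLOSE:
        candidate = day + pd.Timedelta(days=1) + MARKET_OPEN   # after close -> next open
    else:
        candidate = ts                                         # arrived mid-session
    while candidate.weekday() >= 5:                            # roll Saturday/Sunday forward
        candidate = candidate + pd.Timedelta(days=1)
    return candidate
```

A Friday-evening KAP disclosure, for example, should only ever be scored against Monday's open, however tempting Friday's close looks in the data.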
The edge comes not from "using an LLM," but from building a time-aware, Turkish-native, market-aligned inference pipeline. The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing.
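Proper out-of-sample testing, for a signal like this, usually means chronological walk-forward splits with an embargo gap so that overlapping forward-return labels cannot leak from train into test. A sketch with illustrative window lengths (all parameters here are assumptions, not recommendations):

```python
from datetime import timedelta

import pandas as pd

def walk_forward_splits(timestamps, train_days=180, test_days=30, embargo_days=5):
    """Yield chronological (train, test) index pairs with an embargo gap.

    The embargo keeps labels whose forward-return horizon overlaps the
    train/test boundary out of both sets. Window sizes are illustrative.
    """
    ts = pd.DatetimeIndex(sorted(timestamps))
    start = ts.min()
    while True:
        train_end = start + timedelta(days=train_days)
        test_start = train_end + timedelta(days=embargo_days)
        test_end = test_start + timedelta(days=test_days)
        if test_start >= ts.max():
            break
        train = ts[(ts >= start) & (ts < train_end)]
        test = ts[(ts >= test_start) & (ts < test_end)]
        if len(train) and len(test):
            yield train, test
        start = start + timedelta(days=test_days)  # roll the window forward
```

Each fold retrains (or recalibrates) the sentiment model on the train window only, which is also where non-stationarity shows up: a model whose fold-by-fold performance decays is telling you the word-to-return mapping has drifted.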
Related Research
- Market Microstructure: Bid-Ask Spread Dynamics — Decomposing the cost of immediacy for execution models
- Sovereign AI: Local LLMs for Quant Research — Why self-hosted models are structurally superior for investment research
- Automating Alpha Discovery with Genetic Algorithms — Evolutionary search for automated signal generation
- All Research Papers — Full paper collection on QuantMedia