Sentiment Analysis in the Turkish Stock Market (BIST): Generating Signals from Financial News with Qwen and Llama
Building a time-aware, Turkish-native NLP pipeline using Qwen and Llama for financial news signal extraction on Borsa Istanbul.
Sentiment analysis in equities is easy to oversell and hard to do well. This paper presents a practical framework for building a time-aware, Turkish-native financial NLP pipeline using open-weight LLMs (Qwen and Llama) for signal extraction on Borsa Istanbul. We address label design, forward return targets, local inference prototyping, and the critical pitfalls of leakage, non-stationarity, and Turkish morphology in financial text.
Key Takeaways
- Useful sentiment treats the signal as a conditional forecast variable aligned to event timestamps and trading horizons -- not a binary positive/negative label.
- Open-weight models (Qwen3, Llama 4) make local, Turkish-language financial NLP stacks increasingly practical without external API dependencies.
- Label design is the central problem: forward return targets must be time-aligned to avoid leakage contamination in backtests.
- Off-the-shelf English sentiment models misread Turkish financial context, especially KAP-style disclosures -- Turkish-native pipelines are essential.
- The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing, not merely from using an LLM.
Sentiment as a Forecast Variable
Sentiment analysis in equities is easy to oversell and hard to do well. The useful version treats sentiment as a conditional forecast variable: a noisy signal that may explain cross-sectional returns, volatility, or volume once aligned to the correct event timestamp and trading horizon.
Open-weight LLM ecosystems have matured significantly. Alibaba's Qwen team has publicly released Qwen3-family weights, and Meta promotes Llama 4-family models and Llama Stack distributions for self-hosted workflows. That makes a local, Turkish-language financial NLP stack increasingly practical, with no external API dependencies.
The central problem is label design. For a news item arriving at time \(t\), a common target is the forward return over horizon \(h\):

\[ r_{t,\,t+h} = \frac{P_{t+h}}{P_t} - 1, \]

where \(P_t\) is the price at the first tradable moment at or after the news arrives.
A practical sentiment score can be defined from the classifier's three-way softmax output as:

\[ s = p_{\text{bullish}} - p_{\text{bearish}} \in [-1, 1]. \]
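The forward-return target and the sentiment score above can be sketched in a few lines of pandas. This is a minimal illustration with hypothetical prices and probabilities, not production alignment logic; the ticker, timestamps, and values are invented for the example.

```python
import pandas as pd

# Hypothetical intraday prices for a single BIST ticker (illustrative values).
prices = pd.Series(
    [100.0, 101.5, 99.8, 102.0, 103.1],
    index=pd.to_datetime([
        "2024-05-02 10:00", "2024-05-02 11:00", "2024-05-02 12:00",
        "2024-05-02 13:00", "2024-05-02 14:00",
    ]),
)

def forward_return(prices: pd.Series, t: pd.Timestamp, h: pd.Timedelta) -> float:
    """r_{t,t+h} = P_{t+h} / P_t - 1, entering at the first observed price
    at or after t and exiting at the first price at or after t + h."""
    i = prices.index.searchsorted(t)
    j = prices.index.searchsorted(t + h)
    if j >= len(prices):
        return float("nan")  # horizon extends past the available data
    return prices.iloc[j] / prices.iloc[i] - 1.0

# News arriving at 10:30 is labeled against the 11:00 entry and 13:00 exit.
r = forward_return(prices, pd.Timestamp("2024-05-02 10:30"), pd.Timedelta("2h"))

# Hypothetical classifier output in [bearish, neutral, bullish] order.
probs = [0.15, 0.25, 0.60]
score = probs[2] - probs[0]  # s = p_bullish - p_bearish = 0.45
```

Using the first price *at or after* each timestamp (rather than the most recent prior print) is what keeps the label causally downstream of the news.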
Local Inference Prototype
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Qwen/Qwen3-8B"  # replace with a local checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: the 3-way classification head is randomly initialized here; it must
# be fine-tuned on labeled Turkish financial text before the outputs mean anything.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

texts = [
    # "The company reported net profit above expectations and announced a new investment plan."
    "Şirket, beklentilerin üzerinde net kar açıkladı ve yeni yatırım planı duyurdu.",
    # "Selling pressure on bank stocks is rising after the interest rate decision."
    "Faiz kararı sonrası banka hisselerinde satış baskısı artıyor.",
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
print(probs)  # columns: [bearish, neutral, bullish]
```
Critical Pitfalls
- Leakage: if you label a news article with end-of-day return even though it arrived after the close, your backtest is contaminated
- Non-stationarity: macro regime changes, regulation, inflation cycles, and sector narratives can all alter the meaning of the same words over time
- Turkish morphology: off-the-shelf English finance sentiment models often misread Turkish context, especially in KAP-style disclosures
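The leakage pitfall above comes down to timestamp alignment: news that arrives after the close must be labeled against the *next* session's prices. A minimal sketch, assuming a simplified 10:00-18:00 continuous session for Borsa Istanbul and ignoring exchange holidays (both assumptions for illustration):

```python
import pandas as pd

# Simplified session bounds (assumption for this sketch; holidays omitted).
MARKET_OPEN = pd.Timedelta(hours=10)
MARKET_CLOSE = pd.Timedelta(hours=18)

def effective_event_time(ts: pd.Timestamp) -> pd.Timestamp:
    """Map a news timestamp to the first moment it is actually tradable.

    Pre-open news maps to the same day's open; news at or after the close
    maps to the next day's open; weekends roll forward to Monday. Labeling
    against any earlier price contaminates the backtest.
    """
    day = ts.normalize()
    if ts < day + MARKET_OPEN:
        candidate = day + MARKET_OPEN                          # pre-open -> same-day open
    elif ts >= day + MARKET_CLOSE:
        candidate = day + pd.Timedelta(days=1) + MARKET_OPEN   # after close -> next open
    else:
        candidate = ts                                         # arrived mid-session
    while candidate.weekday() >= 5:                            # roll Saturday/Sunday forward
        candidate = candidate + pd.Timedelta(days=1)
    return candidate
```

A Friday-evening KAP disclosure, for example, should only ever be scored against Monday's open, however tempting Friday's close looks in the data.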
The edge comes not from "using an LLM," but from building a time-aware, Turkish-native, market-aligned inference pipeline. The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing.
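Proper out-of-sample testing, for a signal like this, usually means chronological walk-forward splits with an embargo gap so that overlapping forward-return labels cannot leak from train into test. A sketch with illustrative window lengths (all parameters here are assumptions, not recommendations):

```python
from datetime import timedelta

import pandas as pd

def walk_forward_splits(timestamps, train_days=180, test_days=30, embargo_days=5):
    """Yield chronological (train, test) index pairs with an embargo gap.

    The embargo keeps labels whose forward-return horizon overlaps the
    train/test boundary out of both sets. Window sizes are illustrative.
    """
    ts = pd.DatetimeIndex(sorted(timestamps))
    start = ts.min()
    while True:
        train_end = start + timedelta(days=train_days)
        test_start = train_end + timedelta(days=embargo_days)
        test_end = test_start + timedelta(days=test_days)
        if test_start >= ts.max():
            break
        train = ts[(ts >= start) & (ts < train_end)]
        test = ts[(ts >= test_start) & (ts < test_end)]
        if len(train) and len(test):
            yield train, test
        start = start + timedelta(days=test_days)  # roll the window forward
```

Each fold retrains (or recalibrates) the sentiment model on the train window only, which is also where non-stationarity shows up: a model whose fold-by-fold performance decays is telling you the word-to-return mapping has drifted.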
Related Research
- Market Microstructure: Bid-Ask Spread Dynamics — Decomposing the cost of immediacy for execution models
- Sovereign AI: Local LLMs for Quant Research — Why self-hosted models are structurally superior for investment research
- Automating Alpha Discovery with Genetic Algorithms — Evolutionary search for automated signal generation
- All Research Papers — Full paper collection on QuantMedia