
Sentiment Analysis in the Turkish Stock Market (BIST): Generating Signals from Financial News with Qwen and Llama

Building a time-aware, Turkish-native NLP pipeline using Qwen and Llama for financial news signal extraction on Borsa Istanbul.

Abstract

Sentiment analysis in equities is easy to oversell and hard to do well. This paper presents a practical framework for building a time-aware, Turkish-native financial NLP pipeline using open-weight LLMs (Qwen and Llama) for signal extraction on Borsa Istanbul. We address label design, forward return targets, local inference prototyping, and the critical pitfalls of leakage, non-stationarity, and Turkish morphology in financial text.

Key Takeaways

  • Useful sentiment treats the signal as a conditional forecast variable aligned to event timestamps and trading horizons -- not a binary positive/negative label.
  • Open-weight models (Qwen3, Llama 4) make local, Turkish-language financial NLP stacks increasingly practical without external API dependencies.
  • Label design is the central problem: forward return targets must be time-aligned to avoid leakage contamination in backtests.
  • Off-the-shelf English sentiment models misread Turkish financial context, especially KAP-style disclosures -- Turkish-native pipelines are essential.
  • The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing, not merely from using an LLM.

Sentiment as a Forecast Variable

Sentiment analysis in equities is easy to oversell and hard to do well. The useful version treats sentiment as a conditional forecast variable: a noisy signal that may explain cross-sectional returns, volatility, or volume once aligned to the correct event timestamp and trading horizon.

Open-weight LLM ecosystems have matured significantly. Alibaba has publicly released Qwen3-family weights, and Meta promotes Llama 4-family models and Llama Stack distributions for self-hosted workflows. That makes a local, Turkish-language financial NLP stack increasingly practical, without external API dependencies.

The central problem is label design. For a news item arriving at time \(t\), a common target is the forward return over horizon \(h\):

Forward Return Target
$$r_{t,h} = \ln\left(\frac{P_{t+h}}{P_t}\right)$$
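A minimal sketch of this target with pandas (the prices below are made-up toy values, not real BIST quotes). The key discipline is that the last \(h\) rows have no forward price yet and must stay `NaN` rather than be filled:

```python
import numpy as np
import pandas as pd

def forward_log_return(prices: pd.Series, h: int) -> pd.Series:
    """Forward log return r_{t,h} = ln(P_{t+h} / P_t) for each timestamp t.

    The final h rows are NaN because P_{t+h} does not exist yet;
    those rows must be dropped from training, never imputed.
    """
    return np.log(prices.shift(-h) / prices)

# Toy daily close series (hypothetical, for illustration only)
prices = pd.Series([10.0, 10.5, 10.29, 11.0], name="close")
r = forward_log_return(prices, h=1)
```

The `shift(-h)` direction is what makes this a *forward* return: each label at time \(t\) uses only prices strictly after \(t\).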

A practical sentiment score can be defined as:

Sentiment Score
$$s_t = p(\text{bullish} \mid x_t) - p(\text{bearish} \mid x_t)$$
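Assuming a three-class head with probabilities ordered `[bearish, neutral, bullish]` (our convention, not a library default), the score reduces to a one-liner:

```python
def sentiment_score(probs):
    """s_t = p(bullish | x_t) - p(bearish | x_t), given class
    probabilities ordered [bearish, neutral, bullish]."""
    bearish, _neutral, bullish = probs
    return bullish - bearish

# A news item scored 15% bearish / 25% neutral / 60% bullish
# yields a mildly positive signal near 0.45.
s = sentiment_score([0.15, 0.25, 0.60])
```

Because the neutral mass cancels out, \(s_t\) lives in \([-1, 1]\) and can be used directly as a cross-sectional ranking variable.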

Local Inference Prototype

bist_sentiment.py
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Qwen/Qwen3-8B"  # replace with a local (fine-tuned) checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: loading a base LLM this way attaches a randomly initialized 3-class
# head -- it must be fine-tuned on labeled Turkish financial news before
# its outputs mean anything.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.eval()

texts = [
    # "The company reported net profit above expectations and announced a new investment plan."
    "Şirket, beklentilerin üzerinde net kar açıkladı ve yeni yatırım planı duyurdu.",
    # "Selling pressure on bank stocks is rising after the interest rate decision."
    "Faiz kararı sonrası banka hisselerinde satış baskısı artıyor."
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    probs  = torch.softmax(logits, dim=-1)

print(probs)  # columns: [bearish, neutral, bullish] by our label convention

Critical Pitfalls

Three pitfalls dominate in practice: look-ahead leakage from mis-timestamped labels, non-stationarity of the news-return relationship across market regimes, and Turkish morphology, which breaks English-centric tokenizers and sentiment lexicons. The edge comes not from "using an LLM," but from building a time-aware, Turkish-native, market-aligned inference pipeline: the alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing.
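One concrete piece of that testing discipline is walk-forward evaluation with an embargo gap, so that forward-return labels near a split boundary cannot leak from test into train. The helper below is a hypothetical sketch under simplified assumptions (equal-width folds, a fixed embargo in rows), not a full purged cross-validation implementation:

```python
import pandas as pd

def walk_forward_splits(index: pd.DatetimeIndex, n_splits: int, embargo: int = 5):
    """Yield (train_index, test_index) pairs in which every test window
    lies strictly after its training window, separated by an embargo of
    `embargo` rows so labels computed over a forward horizon h <= embargo
    cannot straddle the boundary."""
    n = len(index)
    fold = n // (n_splits + 1)  # equal-width folds, expanding train window
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_start = train_end + embargo
        test_end = min(test_start + fold, n)
        yield index[:train_end], index[test_start:test_end]
```

Each fold trains only on history available before the embargoed boundary, which is the minimum standard for a credible backtest of any news-driven signal.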
