Methodology
How the labels are produced.
The student model
Labeling a streaming news feed directly with a frontier model is too slow and too costly. Instead we distill: a frontier model labels a teaching set, and a smaller multi-task "student" model learns to reproduce those labels, predicting the relevance gate, domain, impact, and region in a single pass. The result is labeling that approximates frontier-model quality at API speed.
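To make the "single pass" idea concrete, here is a minimal sketch in plain Python: one shared feature vector feeds four small linear heads, one per label. The label vocabularies, the toy weights, and the names StudentLabels, argmax_label, and label_article are all hypothetical placeholders for illustration; a real student model would use a learned text encoder and weights trained on the frontier model's labels.

```python
from dataclasses import dataclass

# Hypothetical label vocabularies; the real ones are not specified in this document.
HEADS = {
    "relevant": ["no", "yes"],
    "domain": ["macro", "equities", "energy"],
    "impact": ["low", "medium", "high"],
    "region": ["americas", "emea", "apac"],
}


@dataclass
class StudentLabels:
    relevant: bool
    domain: str
    impact: str
    region: str


def argmax_label(features, weights, classes):
    """Score each class with a dot product and return the highest-scoring class."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return classes[scores.index(max(scores))]


def label_article(features, head_weights):
    """One pass: the same shared features go through every head."""
    out = {
        name: argmax_label(features, head_weights[name], classes)
        for name, classes in HEADS.items()
    }
    return StudentLabels(out["relevant"] == "yes", out["domain"], out["impact"], out["region"])


# Toy weights: one row per class, one column per feature (illustrative only).
toy_weights = {
    "relevant": [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
    "domain": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "impact": [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]],
    "region": [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
}

labels = label_article([0.1, 0.2, 0.9], toy_weights)
print(labels)
```

Sharing one set of features across all heads is what keeps the cost to a single forward pass per article, rather than one model call per label.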
Sentiment
Sentiment is scored separately by FinBERT (ProsusAI/finbert), a finance-tuned language model,
applied to the article headline. We return both the continuous score (sentiment) and the
predicted class (sentimentLabel).
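As a rough sketch of how a headline could be turned into the two returned fields, the snippet below uses the Hugging Face transformers pipeline with the ProsusAI/finbert checkpoint. How the continuous score is actually derived is not stated here; the formula P(positive) minus P(negative) is an assumption, as are the helper names to_sentiment and score_headline.

```python
def to_sentiment(probs):
    """Collapse class probabilities into the two returned fields.

    Assumption: the continuous score is P(positive) - P(negative), in [-1, 1].
    The document does not say how the real score is computed.
    """
    score = probs.get("positive", 0.0) - probs.get("negative", 0.0)
    label = max(probs, key=probs.get)  # class with the highest probability
    return {"sentiment": round(score, 4), "sentimentLabel": label}


def score_headline(headline):
    """Run FinBERT on one headline.

    Requires the `transformers` package and downloads the model on first use.
    A real service would build the pipeline once and reuse it.
    """
    from transformers import pipeline  # lazy import so to_sentiment works without it

    clf = pipeline("text-classification", model="ProsusAI/finbert")
    results = clf([headline], top_k=None)[0]  # scores for all three classes
    probs = {r["label"].lower(): r["score"] for r in results}
    return to_sentiment(probs)


# Pure-Python check of the conversion step, with made-up probabilities.
example = to_sentiment({"positive": 0.80, "negative": 0.05, "neutral": 0.15})
print(example)
```

Calling score_headline("Central bank holds rates steady") would return a dictionary of the same shape, with values that depend on the model.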
Point-in-time
Each article carries its publish day (date). The feed is append-only and labels are fixed at
enrichment time, so a query for a past date returns what was known then; no look-ahead.
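The point-in-time guarantee can be illustrated with a small sketch: because records are only appended and never relabeled, an "as of" query is just a filter on the publish day. The in-memory feed, the record fields other than date, and the as_of helper are hypothetical stand-ins for the real storage and API.

```python
from datetime import date

# Hypothetical in-memory feed; records are only ever appended, never edited.
feed = [
    {"id": "a1", "date": date(2024, 3, 1), "domain": "energy"},
    {"id": "a2", "date": date(2024, 3, 2), "domain": "macro"},
    {"id": "a3", "date": date(2024, 3, 5), "domain": "energy"},
]


def as_of(feed, day):
    """Return the articles published on or before `day`, with labels untouched."""
    return [article for article in feed if article["date"] <= day]


known = as_of(feed, date(2024, 3, 2))
print([article["id"] for article in known])
```

Articles published after the query date never appear in the result, which is what rules out look-ahead in a backtest.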
What you get vs. raw GDELT
Raw GDELT is an unlabeled firehose. We add the relevance gate (signal vs. noise), the facets (domain, impact, and region, so you can slice by what matters), and finance-tuned sentiment. These are the parts that are expensive to build and maintain yourself.