# News-Driven Stock Selector Design
**Date:** 2026-04-02

**Goal:** Upgrade the MOC (Market on Close) strategy from fixed symbol lists to dynamic, news-driven stock selection. The system collects news/sentiment data continuously, then selects 2-3 optimal stocks daily before market close.

---
## Architecture Overview
```
[Continuous Collection]                   [Pre-Close Decision]

Finnhub News ─┐
RSS Feeds    ─┤
SEC EDGAR    ─┤
Truth Social ─┼→ DB (news_items) → Sentiment Aggregator → symbol_scores
Reddit       ─┤  + Redis "news"    (every 15 min)         market_sentiment
Fear & Greed ─┤
FOMC/Fed     ─┘

                    15:00 ET ─→ Candidate Pool (sentiment top + LLM picks)
                    15:15 ET ─→ Technical Filter (RSI, EMA, volume)
                    15:30 ET ─→ LLM Final Selection (2-3 stocks) → Telegram
                    15:50 ET ─→ MOC Buy Execution
                    09:35 ET ─→ Next-day Sell (existing MOC logic)
```
## 1. News Collector Service
New service: `services/news-collector/`
### Structure
```
services/news-collector/
├── Dockerfile
├── pyproject.toml
├── src/news_collector/
│   ├── __init__.py
│   ├── main.py              # Scheduler: runs each collector on its interval
│   ├── config.py
│   └── collectors/
│       ├── __init__.py
│       ├── base.py          # BaseCollector ABC
│       ├── finnhub.py       # Finnhub market news (free, 60 req/min)
│       ├── rss.py           # Yahoo Finance, Google News, MarketWatch RSS
│       ├── sec_edgar.py     # SEC EDGAR 8-K/10-Q filings
│       ├── truth_social.py  # Truth Social scraping (Trump posts)
│       ├── reddit.py        # Reddit (r/wallstreetbets, r/stocks)
│       ├── fear_greed.py    # CNN Fear & Greed Index scraping
│       └── fed.py           # FOMC statements, Fed announcements
└── tests/
```
### BaseCollector Interface
```python
from abc import ABC, abstractmethod

from shared.models import NewsItem  # import path assumed; model defined in section 2


class BaseCollector(ABC):
    name: str
    poll_interval: int  # seconds

    @abstractmethod
    async def collect(self) -> list[NewsItem]:
        """Collect and return list of NewsItem."""

    @abstractmethod
    async def is_available(self) -> bool:
        """Check if this source is accessible (API key present, endpoint reachable)."""
```
### Poll Intervals
| Collector | Interval | Notes |
|-----------|----------|-------|
| Finnhub | 5 min | Free tier: 60 calls/min |
| RSS (Yahoo/Google/MarketWatch) | 10 min | Headlines only |
| SEC EDGAR | 30 min | Focus on 8-K filings |
| Truth Social | 15 min | Scraping |
| Reddit | 15 min | Hot posts from relevant subs |
| Fear & Greed | 1 hour | Updates once daily but check periodically |
| FOMC/Fed | 1 hour | Infrequent events |
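To illustrate how `main.py` might run each collector on its own interval, here is a minimal sketch that starts one `asyncio` task per collector. `DummyCollector`, `run_collector`, the `rounds` limit, and the in-memory `sink` list are hypothetical stand-ins for this example only; the real service would loop indefinitely and write to the DB and Redis.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseCollector(ABC):
    """Minimal stand-in for the interface above (items are plain dicts here)."""

    name: str
    poll_interval: float  # seconds

    @abstractmethod
    async def collect(self) -> list[dict]: ...

    @abstractmethod
    async def is_available(self) -> bool: ...


class DummyCollector(BaseCollector):
    """Hypothetical collector used only to exercise the scheduling loop."""

    def __init__(self, name: str, poll_interval: float) -> None:
        self.name = name
        self.poll_interval = poll_interval

    async def collect(self) -> list[dict]:
        return [{"source": self.name, "headline": "example"}]

    async def is_available(self) -> bool:
        return True


async def run_collector(collector: BaseCollector, sink: list, rounds: int) -> None:
    """Poll one collector `rounds` times, sleeping poll_interval between polls."""
    for _ in range(rounds):
        if await collector.is_available():
            try:
                sink.extend(await collector.collect())
            except Exception as exc:  # one failing source must not stop the others
                print(f"{collector.name} failed: {exc}")
        await asyncio.sleep(collector.poll_interval)


async def main() -> list[dict]:
    sink: list[dict] = []
    # Intervals are shortened from minutes to milliseconds for the demo.
    collectors = [DummyCollector("finnhub", 0.01), DummyCollector("rss", 0.02)]
    await asyncio.gather(*(run_collector(c, sink, rounds=2) for c in collectors))
    return sink


items = asyncio.run(main())
```

Running each collector in its own task keeps a slow or failing source from delaying the others, which matters when intervals range from 5 minutes to 1 hour.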
### Provider Abstraction (for paid upgrade path)
```python
# config.yaml
collectors:
  news:
    provider: "finnhub"        # swap to "benzinga" for paid
    api_key: ${FINNHUB_API_KEY}
  social:
    provider: "reddit"         # swap to "stocktwits_pro" etc.
  policy:
    provider: "truth_social"   # swap to "twitter_api" etc.

# Factory
COLLECTOR_REGISTRY = {
    "finnhub": FinnhubCollector,
    "rss": RSSCollector,
    "benzinga": BenzingaCollector,  # added later
}
```
## 2. Shared Models (additions to shared/)
### NewsItem (shared/models.py)
```python
import uuid
from datetime import datetime, timezone
from enum import Enum

from pydantic import BaseModel, Field


class NewsCategory(str, Enum):
    POLICY = "policy"
    EARNINGS = "earnings"
    MACRO = "macro"
    SOCIAL = "social"
    FILING = "filing"
    FED = "fed"


class NewsItem(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    source: str  # "finnhub", "rss", "sec_edgar", etc.
    headline: str
    summary: str | None = None
    url: str | None = None
    published_at: datetime
    symbols: list[str] = []  # Related tickers (if identifiable)
    sentiment: float  # -1.0 to 1.0 (first-pass analysis at collection)
    category: NewsCategory
    raw_data: dict = {}
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
```
### SymbolScore (shared/sentiment_models.py — new file)
```python
from datetime import datetime

from pydantic import BaseModel

from shared.models import OrderSide  # assumed location of the existing enum


class SymbolScore(BaseModel):
    symbol: str
    news_score: float  # -1.0 to 1.0, weighted avg of news sentiment
    news_count: int  # Number of news items in last 24h
    social_score: float  # Reddit/social sentiment
    policy_score: float  # Policy-related impact
    filing_score: float  # SEC filing impact
    composite: float  # Weighted final score
    updated_at: datetime


class MarketSentiment(BaseModel):
    fear_greed: int  # 0-100
    fear_greed_label: str  # "Extreme Fear", "Fear", "Neutral", "Greed", "Extreme Greed"
    vix: float | None = None
    fed_stance: str  # "hawkish", "neutral", "dovish"
    market_regime: str  # "risk_on", "neutral", "risk_off"
    updated_at: datetime


class SelectedStock(BaseModel):
    symbol: str
    side: OrderSide  # BUY or SELL
    conviction: float  # 0.0 to 1.0
    reason: str  # Selection rationale
    key_news: list[str]  # Key news headlines


class Candidate(BaseModel):
    symbol: str
    source: str  # "sentiment" or "llm"
    direction: OrderSide | None = None  # Suggested direction (if known)
    score: float  # Relevance/priority score
    reason: str  # Why this candidate was selected
```
## 3. Sentiment Analysis Pipeline
### Location
Refactor existing `shared/src/shared/sentiment.py`.
### Two-Stage Analysis
**Stage 1: Per-news sentiment (at collection time)**
- VADER (nltk.sentiment, free) for English headlines
- Keyword rule engine for domain-specific terms (e.g., "tariff" → negative for importers, positive for domestic producers)
- Score stored in `NewsItem.sentiment`
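The keyword rule engine could be layered on top of a base score roughly as follows. This is a simplified sketch: the rule table and its weights are invented for illustration, and it applies one adjustment per keyword regardless of sector, whereas the design calls for direction to depend on the affected companies (importers vs. domestic producers). The `base_score` argument is where a VADER compound score (for example from `nltk.sentiment.vader.SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`) would be passed in.

```python
# Hypothetical keyword adjustments; real rules would be sector-aware.
KEYWORD_RULES: dict[str, float] = {
    "tariff": -0.3,
    "subsidy": 0.3,
    "downgrade": -0.4,
    "beats estimates": 0.4,
}


def clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep the score inside the NewsItem.sentiment range."""
    return max(low, min(high, value))


def score_headline(headline: str, base_score: float = 0.0) -> float:
    """Add keyword adjustments to a base score and clamp to [-1.0, 1.0]."""
    text = headline.lower()
    adjustment = sum(
        delta for keyword, delta in KEYWORD_RULES.items() if keyword in text
    )
    return clamp(base_score + adjustment)


tariff_score = score_headline("New tariff hits steel imports")
capped_score = score_headline("Company beats estimates, wins subsidy", base_score=0.5)
```

Clamping keeps the stored value within the -1.0 to 1.0 range that `NewsItem.sentiment` expects, even when several rules fire on one headline.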
**Stage 2: Per-symbol aggregation (every 15 minutes)**
```
composite = (
    news_score   * 0.3 +
    social_score * 0.2 +
    policy_score * 0.3 +
    filing_score * 0.2
) * freshness_decay
```
Freshness decay:
- < 1 hour: 1.0
- 1-6 hours: 0.7
- 6-24 hours: 0.3
- > 24 hours: excluded

The policy score is weighted highly (0.3, tied with news) because the US stock market is heavily influenced by policy events (tariffs, regulation, subsidies).
### Market-Level Gating
`MarketSentiment.market_regime` determination:
- `risk_off`: Fear & Greed < 20 OR VIX > 30 → **block all trades**
- `risk_on`: Fear & Greed > 60 AND VIX < 20
- `neutral`: everything else
This extends the existing `sentiment.py` `should_block()` logic.
## 4. Stock Selector Engine
### Location
`services/strategy-engine/src/strategy_engine/stock_selector.py`
### Three-Stage Selection Process
**Stage 1: Candidate Pool (15:00 ET)**
Two candidate sources, results merged (deduplicated):
```python
from abc import ABC, abstractmethod

class CandidateSource(ABC):
    @abstractmethod
    async def get_candidates(self) -> list[Candidate]:
        ...

class SentimentCandidateSource(CandidateSource):
    """Top N symbols by composite SymbolScore from DB."""

class LLMCandidateSource(CandidateSource):
    """Send today's top news summary to Claude, get related symbols + direction."""
```
- SentimentCandidateSource: top 20 by composite score
- LLMCandidateSource: Claude analyzes today's major news and recommends affected symbols
- Merged pool: typically 20-30 candidates
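Merging and deduplicating the two pools might look like the sketch below. The design only says the results are deduplicated; keeping the higher-scoring entry per symbol is an assumption, as is the simplified `Candidate` dataclass (a stdlib stand-in for the Pydantic model) and the `merge_candidates` name.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Simplified stand-in for the shared Candidate model."""

    symbol: str
    source: str  # "sentiment" or "llm"
    score: float
    reason: str


def merge_candidates(*pools: list[Candidate]) -> list[Candidate]:
    """Merge pools, keeping the highest-scoring entry per symbol."""
    best: dict[str, Candidate] = {}
    for pool in pools:
        for candidate in pool:
            key = candidate.symbol.upper()
            if key not in best or candidate.score > best[key].score:
                best[key] = candidate
    return sorted(best.values(), key=lambda c: c.score, reverse=True)


sentiment_pool = [
    Candidate("NVDA", "sentiment", 0.62, "top composite score"),
    Candidate("XOM", "sentiment", 0.41, "strong news score"),
]
llm_pool = [
    Candidate("NVDA", "llm", 0.80, "subsidy announcement"),
    Candidate("TSLA", "llm", 0.55, "policy headline"),
]
merged = merge_candidates(sentiment_pool, llm_pool)
```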
**Stage 2: Technical Filter (15:15 ET)**
Apply existing MOC screening criteria to candidates:
- Fetch recent price data from Alpaca for all candidates
- RSI 30-60
- Price > 20-period EMA
- Volume > average
- Bullish candle pattern
- Result: typically 5-10 survivors
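A rough sketch of the numeric part of this filter follows. It makes several assumptions the design leaves open: RSI uses a 14-period simple average (not Wilder smoothing), the EMA is seeded with a simple moving average, "average volume" means the mean of the prior bars supplied, and the bullish candle check is omitted. Function names are hypothetical, and the existing MOC screening code may compute these differently.

```python
def ema(values: list[float], period: int) -> float:
    """Exponential moving average seeded with the SMA of the first `period` values."""
    if len(values) < period:
        raise ValueError("not enough data for EMA")
    alpha = 2 / (period + 1)
    result = sum(values[:period]) / period
    for value in values[period:]:
        result = alpha * value + (1 - alpha) * result
    return result


def rsi(closes: list[float], period: int = 14) -> float:
    """Simple-average RSI over the last `period` price changes."""
    if len(closes) < period + 1:
        raise ValueError("not enough data for RSI")
    window = closes[-(period + 1):]
    changes = [curr - prev for prev, curr in zip(window, window[1:])]
    gains = sum(change for change in changes if change > 0)
    losses = -sum(change for change in changes if change < 0)
    if losses == 0:
        return 100.0
    return 100 - 100 / (1 + gains / losses)


def passes_filter(closes: list[float], volumes: list[float]) -> bool:
    """RSI in 30-60, last close above the 20-period EMA, volume above prior average."""
    if len(volumes) < 2:
        return False
    average_volume = sum(volumes[:-1]) / (len(volumes) - 1)
    return (
        30 <= rsi(closes) <= 60
        and closes[-1] > ema(closes, 20)
        and volumes[-1] > average_volume
    )


# Synthetic bars: flat at 100, then oscillating between 101 and 100.
closes = [100.0] * 10 + [101.0 if i % 2 == 0 else 100.0 for i in range(15)]
volumes = [1000.0] * 24 + [1500.0]
survives = passes_filter(closes, volumes)
```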
**Stage 3: LLM Final Selection (15:30 ET)**
Send to Claude:
- Filtered candidate list with technical indicators
- Per-symbol sentiment scores and top news headlines
- Market sentiment (Fear & Greed, VIX, Fed stance)
- Prompt: "Select 2-3 stocks for MOC trading with rationale"

The response is parsed into `list[SelectedStock]`.
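Parsing the model's reply defensively could look like the following sketch. It assumes the prompt instructs Claude to answer with a JSON array of objects using the `SelectedStock` field names, which this document does not specify; `parse_selection` is a hypothetical helper and the dataclass is a stdlib stand-in for the Pydantic model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SelectedStock:
    """Simplified stand-in for the shared SelectedStock model."""

    symbol: str
    side: str  # "BUY" or "SELL"
    conviction: float  # 0.0 to 1.0
    reason: str
    key_news: list[str] = field(default_factory=list)


def parse_selection(raw: str, max_picks: int = 3) -> list[SelectedStock]:
    """Parse a JSON array of picks, dropping malformed entries instead of raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    picks: list[SelectedStock] = []
    for entry in data:
        try:
            pick = SelectedStock(
                symbol=str(entry["symbol"]).upper(),
                side=str(entry["side"]).upper(),
                conviction=float(entry["conviction"]),
                reason=str(entry["reason"]),
                key_news=list(entry.get("key_news", [])),
            )
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # skip entries with missing or mistyped fields
        if pick.side in {"BUY", "SELL"} and 0.0 <= pick.conviction <= 1.0:
            picks.append(pick)
    picks.sort(key=lambda p: p.conviction, reverse=True)
    return picks[:max_picks]


raw_reply = json.dumps(
    [
        {"symbol": "nvda", "side": "buy", "conviction": 0.85, "reason": "subsidy news"},
        {"symbol": "XOM", "side": "BUY", "conviction": 1.7, "reason": "out of range"},
        "junk",
    ]
)
picks = parse_selection(raw_reply)
```

Returning an empty list on unparseable output means a bad LLM reply results in no trades rather than a crash, consistent with the "no forced trades" principle.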
### Integration with MOC Strategy
Current: MOC strategy receives candles for fixed symbols and decides internally.
New flow:
1. `StockSelector` publishes `SelectedStock` list to Redis stream `selected_stocks` at 15:30 ET
2. MOC strategy reads `selected_stocks` to get today's targets
3. MOC still applies its own technical checks at 15:50-16:00 as a safety net
4. If a selected stock fails the final technical check, it's skipped (no forced trades)
## 5. Database Schema
Four new tables via Alembic migration:
```sql
CREATE TABLE news_items (
    id UUID PRIMARY KEY,
    source VARCHAR(50) NOT NULL,
    headline TEXT NOT NULL,
    summary TEXT,
    url TEXT,
    published_at TIMESTAMPTZ NOT NULL,
    symbols TEXT[],
    sentiment FLOAT NOT NULL,
    category VARCHAR(50) NOT NULL,
    raw_data JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_news_items_published ON news_items(published_at);
CREATE INDEX idx_news_items_symbols ON news_items USING GIN(symbols);

CREATE TABLE symbol_scores (
    id UUID PRIMARY KEY,
    symbol VARCHAR(10) NOT NULL,
    news_score FLOAT NOT NULL DEFAULT 0,
    news_count INT NOT NULL DEFAULT 0,
    social_score FLOAT NOT NULL DEFAULT 0,
    policy_score FLOAT NOT NULL DEFAULT 0,
    filing_score FLOAT NOT NULL DEFAULT 0,
    composite FLOAT NOT NULL DEFAULT 0,
    updated_at TIMESTAMPTZ NOT NULL
);
CREATE UNIQUE INDEX idx_symbol_scores_symbol ON symbol_scores(symbol);

CREATE TABLE market_sentiment (
    id UUID PRIMARY KEY,
    fear_greed INT NOT NULL,
    fear_greed_label VARCHAR(30) NOT NULL,
    vix FLOAT,
    fed_stance VARCHAR(20) NOT NULL DEFAULT 'neutral',
    market_regime VARCHAR(20) NOT NULL DEFAULT 'neutral',
    updated_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE stock_selections (
    id UUID PRIMARY KEY,
    trade_date DATE NOT NULL,
    symbol VARCHAR(10) NOT NULL,
    side VARCHAR(4) NOT NULL,
    conviction FLOAT NOT NULL,
    reason TEXT NOT NULL,
    key_news JSONB DEFAULT '[]',
    sentiment_snapshot JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_stock_selections_date ON stock_selections(trade_date);
```
`stock_selections` stores an audit trail: why each stock was selected, enabling post-hoc analysis of selection quality.
## 6. Redis Streams
| Stream | Producer | Consumer | Payload |
|--------|----------|----------|---------|
| `news` | news-collector | strategy-engine (sentiment aggregator) | NewsItem |
| `selected_stocks` | stock-selector | MOC strategy | SelectedStock |
Existing streams (`candles`, `signals`, `orders`) unchanged.
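Redis stream entries are flat field/value maps, so nested payloads such as `SelectedStock` need to be serialized before publishing. The sketch below shows one possible convention, a single `data` field holding JSON; the field name and the `encode_event`/`decode_event` helpers are assumptions, and the existing streams may already use a different encoding.

```python
import json


def encode_event(payload: dict) -> dict[str, str]:
    """Wrap a payload as a single-field map suitable for a stream entry."""
    return {"data": json.dumps(payload, separators=(",", ":"))}


def decode_event(fields: dict) -> dict:
    """Unwrap a stream entry; accepts str or bytes keys and values."""
    raw = fields.get("data") or fields.get(b"data")
    if raw is None:
        raise ValueError("stream entry has no 'data' field")
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


selection = {
    "symbol": "NVDA",
    "side": "BUY",
    "conviction": 0.85,
    "reason": "subsidy news",
    "key_news": ["Trump signs CHIPS Act expansion..."],
}
entry = encode_event(selection)
# With redis-py, publishing would be roughly: client.xadd("selected_stocks", entry)
restored = decode_event(entry)
```

Accepting bytes on the consumer side covers clients that are not configured to decode responses.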
## 7. Docker Compose Addition
```yaml
  news-collector:
    build:
      context: .
      dockerfile: services/news-collector/Dockerfile
    env_file: .env
    ports:
      - "8084:8084"
    depends_on:
      redis: { condition: service_healthy }
      postgres: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8084/health')"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
```
## 8. Environment Variables
```bash
# News Collector
FINNHUB_API_KEY= # Free key from finnhub.io
NEWS_POLL_INTERVAL=300 # Default 5 min (overrides per-collector defaults)
SENTIMENT_AGGREGATE_INTERVAL=900 # 15 min
# Stock Selector
SELECTOR_CANDIDATES_TIME=15:00 # ET, candidate pool generation
SELECTOR_FILTER_TIME=15:15 # ET, technical filter
SELECTOR_FINAL_TIME=15:30 # ET, LLM final pick
SELECTOR_MAX_PICKS=3
# LLM (for stock selector + screener)
ANTHROPIC_API_KEY=
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```
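The selector settings could be loaded and validated along these lines. This is a sketch under assumptions: `SelectorConfig` and `parse_hhmm` are hypothetical names, the defaults mirror the values listed above, and the times are kept as naive `datetime.time` values that the scheduler would interpret as US Eastern time.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from datetime import time


def parse_hhmm(value: str) -> time:
    """Parse an "HH:MM" string such as "15:30"."""
    hour, minute = value.strip().split(":")
    return time(int(hour), int(minute))


@dataclass(frozen=True)
class SelectorConfig:
    candidates_time: time
    filter_time: time
    final_time: time
    max_picks: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "SelectorConfig":
        config = cls(
            candidates_time=parse_hhmm(env.get("SELECTOR_CANDIDATES_TIME", "15:00")),
            filter_time=parse_hhmm(env.get("SELECTOR_FILTER_TIME", "15:15")),
            final_time=parse_hhmm(env.get("SELECTOR_FINAL_TIME", "15:30")),
            max_picks=int(env.get("SELECTOR_MAX_PICKS", "3")),
        )
        # The three stages depend on each other, so they must run in order.
        if not config.candidates_time < config.filter_time < config.final_time:
            raise ValueError("selector stages must be in chronological order")
        return config


defaults = SelectorConfig.from_env({})
custom = SelectorConfig.from_env({"SELECTOR_MAX_PICKS": "2"})
```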
## 9. Telegram Notifications
Extend existing `shared/notifier.py` with:
```python
async def send_stock_selection(self, selections: list[SelectedStock], market: MarketSentiment):
    """
    📊 Today's stock picks (2/3)

    1. NVDA 🟢 BUY (conviction: 0.85)
       Rationale: Trump announces expanded semiconductor subsidies, RSI 42
       Key news: "Trump signs CHIPS Act expansion..."

    2. XOM 🟢 BUY (conviction: 0.72)
       Rationale: rising oil prices + earnings surprise, volume surge

    Market sentiment: Fear & Greed 55 (Neutral) | VIX 18.2
    """
```
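The message body could be assembled as in the sketch below, which follows the layout in the docstring above with English labels. Plain dicts stand in for `SelectedStock` and `MarketSentiment`; `format_selection`, the reading of "(2/3)" as picks out of the maximum, and the 🔴 icon for SELL are assumptions.

```python
def format_selection(
    selections: list[dict], market: dict, max_picks: int = 3
) -> str:
    """Render picks and market context as a plain-text Telegram message."""
    lines = [f"📊 Today's stock picks ({len(selections)}/{max_picks})", ""]
    for index, pick in enumerate(selections, start=1):
        icon = "🟢" if pick["side"] == "BUY" else "🔴"
        lines.append(
            f"{index}. {pick['symbol']} {icon} {pick['side']} "
            f"(conviction: {pick['conviction']:.2f})"
        )
        lines.append(f"   Rationale: {pick['reason']}")
        if pick.get("key_news"):
            headline = pick["key_news"][0]
            lines.append(f'   Key news: "{headline}"')
        lines.append("")
    vix = market.get("vix")
    vix_text = f"{vix:.1f}" if vix is not None else "n/a"
    lines.append(
        f"Market sentiment: Fear & Greed {market['fear_greed']} "
        f"({market['fear_greed_label']}) | VIX {vix_text}"
    )
    return "\n".join(lines)


message = format_selection(
    [
        {
            "symbol": "NVDA",
            "side": "BUY",
            "conviction": 0.85,
            "reason": "semiconductor subsidy expansion, RSI 42",
            "key_news": ["Trump signs CHIPS Act expansion..."],
        }
    ],
    {"fear_greed": 55, "fear_greed_label": "Neutral", "vix": 18.2},
)
```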
## 10. Testing Strategy
**Unit tests:**
- Each collector: mock HTTP responses → verify NewsItem parsing
- Sentiment analysis: verify VADER + keyword scoring
- Aggregator: mock news data → verify SymbolScore calculation and freshness decay
- Stock selector: mock scores → verify candidate/filter/selection pipeline
- LLM calls: mock Claude response → verify SelectedStock parsing
**Integration tests:**
- Full pipeline: news collection → DB → aggregation → selection
- Market gating: verify `risk_off` blocks all trades
- MOC integration: verify selected stocks flow to MOC strategy
**Post-hoc analysis (future):**
- Use `stock_selections` audit trail to measure selection accuracy
- Historical news data replay for backtesting requires paid data (deferred)
## 11. Out of Scope (Future)
- Paid API integration (designed for, not implemented)
- Historical news backtesting
- WebSocket real-time news streaming
- Multi-language sentiment analysis
- Options/derivatives signals