| author | TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> | 2026-04-02 13:31:40 +0900 |
|---|---|---|
| committer | TheSiahxyz <164138827+TheSiahxyz@users.noreply.github.com> | 2026-04-02 13:31:40 +0900 |
| commit | fea55160794bcd048496b2ee3e282180d8869e2a (patch) | |
| tree | cef1625b30a894a9323782c63bbea4ffc2b757f7 /docs | |
| parent | e2344bc310817db763f697df3fb2a9be987ce6b0 (diff) | |
docs: add news-driven stock selector design spec
Design for upgrading MOC strategy from fixed symbols to dynamic,
news-driven stock selection with sentiment analysis pipeline.
Diffstat (limited to 'docs')
| -rw-r--r-- | docs/superpowers/specs/2026-04-02-news-driven-stock-selector-design.md | 418 |
1 files changed, 418 insertions, 0 deletions
diff --git a/docs/superpowers/specs/2026-04-02-news-driven-stock-selector-design.md b/docs/superpowers/specs/2026-04-02-news-driven-stock-selector-design.md
new file mode 100644
index 0000000..d439154
--- /dev/null
+++ b/docs/superpowers/specs/2026-04-02-news-driven-stock-selector-design.md
@@ -0,0 +1,418 @@
+# News-Driven Stock Selector Design
+
+**Date:** 2026-04-02
+**Goal:** Upgrade the MOC (Market on Close) strategy from fixed symbol lists to dynamic, news-driven stock selection. The system collects news/sentiment data continuously, then selects 2-3 optimal stocks daily before market close.
+
+---
+
+## Architecture Overview
+
+```
+[Continuous Collection]                    [Pre-Close Decision]
+Finnhub News  ─┐
+RSS Feeds     ─┤
+SEC EDGAR     ─┤
+Truth Social  ─┼→ DB (news_items) → Sentiment Aggregator → symbol_scores
+Reddit        ─┤   + Redis "news"     (every 15 min)         market_sentiment
+Fear & Greed  ─┤
+FOMC/Fed      ─┘
+
+                 15:00 ET ─→ Candidate Pool (sentiment top + LLM picks)
+                 15:15 ET ─→ Technical Filter (RSI, EMA, volume)
+                 15:30 ET ─→ LLM Final Selection (2-3 stocks) → Telegram
+                 15:50 ET ─→ MOC Buy Execution
+                 09:35 ET ─→ Next-day Sell (existing MOC logic)
+```
+
+## 1. News Collector Service
+
+New service: `services/news-collector/`
+
+### Structure
+
+```
+services/news-collector/
+├── Dockerfile
+├── pyproject.toml
+├── src/news_collector/
+│   ├── __init__.py
+│   ├── main.py              # Scheduler: runs each collector on its interval
+│   ├── config.py
+│   └── collectors/
+│       ├── __init__.py
+│       ├── base.py          # BaseCollector ABC
+│       ├── finnhub.py       # Finnhub market news (free, 60 req/min)
+│       ├── rss.py           # Yahoo Finance, Google News, MarketWatch RSS
+│       ├── sec_edgar.py     # SEC EDGAR 8-K/10-Q filings
+│       ├── truth_social.py  # Truth Social scraping (Trump posts)
+│       ├── reddit.py        # Reddit (r/wallstreetbets, r/stocks)
+│       ├── fear_greed.py    # CNN Fear & Greed Index scraping
+│       └── fed.py           # FOMC statements, Fed announcements
+└── tests/
+```
+
+### BaseCollector Interface
+
+```python
+class BaseCollector(ABC):
+    name: str
+    poll_interval: int  # seconds
+
+    @abstractmethod
+    async def collect(self) -> list[NewsItem]:
+        """Collect and return list of NewsItem."""
+
+    @abstractmethod
+    async def is_available(self) -> bool:
+        """Check if this source is accessible (API key present, endpoint reachable)."""
+```
+
+### Poll Intervals
+
+| Collector | Interval | Notes |
+|-----------|----------|-------|
+| Finnhub | 5 min | Free tier: 60 calls/min |
+| RSS (Yahoo/Google/MarketWatch) | 10 min | Headlines only |
+| SEC EDGAR | 30 min | Focus on 8-K filings |
+| Truth Social | 15 min | Scraping |
+| Reddit | 15 min | Hot posts from relevant subs |
+| Fear & Greed | 1 hour | Updates once daily but check periodically |
+| FOMC/Fed | 1 hour | Infrequent events |
+
+### Provider Abstraction (for paid upgrade path)
+
+```python
+# config.yaml
+collectors:
+  news:
+    provider: "finnhub"       # swap to "benzinga" for paid
+    api_key: ${FINNHUB_API_KEY}
+  social:
+    provider: "reddit"        # swap to "stocktwits_pro" etc.
+  policy:
+    provider: "truth_social"  # swap to "twitter_api" etc.
+
+# Factory
+COLLECTOR_REGISTRY = {
+    "finnhub": FinnhubCollector,
+    "rss": RSSCollector,
+    "benzinga": BenzingaCollector,  # added later
+}
+```
+
+## 2. Shared Models (additions to shared/)
+
+### NewsItem (shared/models.py)
+
+```python
+class NewsCategory(str, Enum):
+    POLICY = "policy"
+    EARNINGS = "earnings"
+    MACRO = "macro"
+    SOCIAL = "social"
+    FILING = "filing"
+    FED = "fed"
+
+class NewsItem(BaseModel):
+    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
+    source: str                    # "finnhub", "rss", "sec_edgar", etc.
+    headline: str
+    summary: str | None = None
+    url: str | None = None
+    published_at: datetime
+    symbols: list[str] = []        # Related tickers (if identifiable)
+    sentiment: float               # -1.0 to 1.0 (first-pass analysis at collection)
+    category: NewsCategory
+    raw_data: dict = {}
+    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
+```
+
+### SymbolScore (shared/sentiment_models.py, new file)
+
+```python
+class SymbolScore(BaseModel):
+    symbol: str
+    news_score: float       # -1.0 to 1.0, weighted avg of news sentiment
+    news_count: int         # Number of news items in last 24h
+    social_score: float     # Reddit/social sentiment
+    policy_score: float     # Policy-related impact
+    filing_score: float     # SEC filing impact
+    composite: float        # Weighted final score
+    updated_at: datetime
+
+class MarketSentiment(BaseModel):
+    fear_greed: int         # 0-100
+    fear_greed_label: str   # "Extreme Fear", "Fear", "Neutral", "Greed", "Extreme Greed"
+    vix: float | None = None
+    fed_stance: str         # "hawkish", "neutral", "dovish"
+    market_regime: str      # "risk_on", "neutral", "risk_off"
+    updated_at: datetime
+
+class SelectedStock(BaseModel):
+    symbol: str
+    side: OrderSide         # BUY or SELL
+    conviction: float       # 0.0 to 1.0
+    reason: str             # Selection rationale
+    key_news: list[str]     # Key news headlines
+
+class Candidate(BaseModel):
+    symbol: str
+    source: str             # "sentiment" or "llm"
+    direction: OrderSide | None = None  # Suggested direction (if known)
+    score: float            # Relevance/priority score
+    reason: str             # Why this candidate was selected
+```
+
+## 3. Sentiment Analysis Pipeline
+
+### Location
+
+Refactor existing `shared/src/shared/sentiment.py`.
+
+### Two-Stage Analysis
+
+**Stage 1: Per-news sentiment (at collection time)**
+- VADER (nltk.sentiment, free) for English headlines
+- Keyword rule engine for domain-specific terms (e.g., "tariff" → negative for importers, positive for domestic producers)
+- Score stored in `NewsItem.sentiment`
+
+**Stage 2: Per-symbol aggregation (every 15 minutes)**
+
+```
+composite = (
+    news_score * 0.3 +
+    social_score * 0.2 +
+    policy_score * 0.3 +
+    filing_score * 0.2
+) * freshness_decay
+```
+
+Freshness decay:
+- < 1 hour: 1.0
+- 1-6 hours: 0.7
+- 6-24 hours: 0.3
+- > 24 hours: excluded
+
+Policy score weighted high because US stock market is heavily influenced by policy events (tariffs, regulation, subsidies).
+
+### Market-Level Gating
+
+`MarketSentiment.market_regime` determination:
+- `risk_off`: Fear & Greed < 20 OR VIX > 30 → **block all trades**
+- `risk_on`: Fear & Greed > 60 AND VIX < 20
+- `neutral`: everything else
+
+This extends the existing `sentiment.py` `should_block()` logic.
+
+## 4. Stock Selector Engine
+
+### Location
+
+`services/strategy-engine/src/strategy_engine/stock_selector.py`
+
+### Three-Stage Selection Process
+
+**Stage 1: Candidate Pool (15:00 ET)**
+
+Two candidate sources, results merged (deduplicated):
+
+```python
+class CandidateSource(ABC):
+    @abstractmethod
+    async def get_candidates(self) -> list[Candidate]: ...
+
+class SentimentCandidateSource(CandidateSource):
+    """Top N symbols by composite SymbolScore from DB."""
+
+class LLMCandidateSource(CandidateSource):
+    """Send today's top news summary to Claude, get related symbols + direction."""
+```
+
+- SentimentCandidateSource: top 20 by composite score
+- LLMCandidateSource: Claude analyzes today's major news and recommends affected symbols
+- Merged pool: typically 20-30 candidates
+
+**Stage 2: Technical Filter (15:15 ET)**
+
+Apply existing MOC screening criteria to candidates:
+- Fetch recent price data from Alpaca for all candidates
+- RSI 30-60
+- Price > 20-period EMA
+- Volume > average
+- Bullish candle pattern
+- Result: typically 5-10 survivors
+
+**Stage 3: LLM Final Selection (15:30 ET)**
+
+Send to Claude:
+- Filtered candidate list with technical indicators
+- Per-symbol sentiment scores and top news headlines
+- Market sentiment (Fear & Greed, VIX, Fed stance)
+- Prompt: "Select 2-3 stocks for MOC trading with rationale"
+
+Response parsed into `list[SelectedStock]`.
+
+### Integration with MOC Strategy
+
+Current: MOC strategy receives candles for fixed symbols and decides internally.
+
+New flow:
+1. `StockSelector` publishes `SelectedStock` list to Redis stream `selected_stocks` at 15:30 ET
+2. MOC strategy reads `selected_stocks` to get today's targets
+3. MOC still applies its own technical checks at 15:50-16:00 as a safety net
+4. If a selected stock fails the final technical check, it's skipped (no forced trades)
+
+## 5. Database Schema
+
+Four new tables via Alembic migration:
+
+```sql
+CREATE TABLE news_items (
+    id UUID PRIMARY KEY,
+    source VARCHAR(50) NOT NULL,
+    headline TEXT NOT NULL,
+    summary TEXT,
+    url TEXT,
+    published_at TIMESTAMPTZ NOT NULL,
+    symbols TEXT[],
+    sentiment FLOAT NOT NULL,
+    category VARCHAR(50) NOT NULL,
+    raw_data JSONB DEFAULT '{}',
+    created_at TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX idx_news_items_published ON news_items(published_at);
+CREATE INDEX idx_news_items_symbols ON news_items USING GIN(symbols);
+
+CREATE TABLE symbol_scores (
+    id UUID PRIMARY KEY,
+    symbol VARCHAR(10) NOT NULL,
+    news_score FLOAT NOT NULL DEFAULT 0,
+    news_count INT NOT NULL DEFAULT 0,
+    social_score FLOAT NOT NULL DEFAULT 0,
+    policy_score FLOAT NOT NULL DEFAULT 0,
+    filing_score FLOAT NOT NULL DEFAULT 0,
+    composite FLOAT NOT NULL DEFAULT 0,
+    updated_at TIMESTAMPTZ NOT NULL
+);
+CREATE UNIQUE INDEX idx_symbol_scores_symbol ON symbol_scores(symbol);
+
+CREATE TABLE market_sentiment (
+    id UUID PRIMARY KEY,
+    fear_greed INT NOT NULL,
+    fear_greed_label VARCHAR(30) NOT NULL,
+    vix FLOAT,
+    fed_stance VARCHAR(20) NOT NULL DEFAULT 'neutral',
+    market_regime VARCHAR(20) NOT NULL DEFAULT 'neutral',
+    updated_at TIMESTAMPTZ NOT NULL
+);
+
+CREATE TABLE stock_selections (
+    id UUID PRIMARY KEY,
+    trade_date DATE NOT NULL,
+    symbol VARCHAR(10) NOT NULL,
+    side VARCHAR(4) NOT NULL,
+    conviction FLOAT NOT NULL,
+    reason TEXT NOT NULL,
+    key_news JSONB DEFAULT '[]',
+    sentiment_snapshot JSONB DEFAULT '{}',
+    created_at TIMESTAMPTZ DEFAULT NOW()
+);
+CREATE INDEX idx_stock_selections_date ON stock_selections(trade_date);
+```
+
+`stock_selections` stores an audit trail: why each stock was selected, enabling post-hoc analysis of selection quality.
+
+## 6. Redis Streams
+
+| Stream | Producer | Consumer | Payload |
+|--------|----------|----------|---------|
+| `news` | news-collector | strategy-engine (sentiment aggregator) | NewsItem |
+| `selected_stocks` | stock-selector | MOC strategy | SelectedStock |
+
+Existing streams (`candles`, `signals`, `orders`) unchanged.
+
+## 7. Docker Compose Addition
+
+```yaml
+news-collector:
+  build:
+    context: .
+    dockerfile: services/news-collector/Dockerfile
+  env_file: .env
+  ports:
+    - "8084:8084"
+  depends_on:
+    redis: { condition: service_healthy }
+    postgres: { condition: service_healthy }
+  healthcheck:
+    test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8084/health')"]
+    interval: 10s
+    timeout: 5s
+    retries: 3
+  restart: unless-stopped
+```
+
+## 8. Environment Variables
+
+```bash
+# News Collector
+FINNHUB_API_KEY=                    # Free key from finnhub.io
+NEWS_POLL_INTERVAL=300              # Default 5 min (overrides per-collector defaults)
+SENTIMENT_AGGREGATE_INTERVAL=900    # 15 min
+
+# Stock Selector
+SELECTOR_CANDIDATES_TIME=15:00      # ET, candidate pool generation
+SELECTOR_FILTER_TIME=15:15          # ET, technical filter
+SELECTOR_FINAL_TIME=15:30           # ET, LLM final pick
+SELECTOR_MAX_PICKS=3
+
+# LLM (for stock selector + screener)
+ANTHROPIC_API_KEY=
+ANTHROPIC_MODEL=claude-sonnet-4-20250514
+```
+
+## 9. Telegram Notifications
+
+Extend existing `shared/notifier.py` with:
+
+```python
+async def send_stock_selection(self, selections: list[SelectedStock], market: MarketSentiment):
+    """
+    📊 Today's Stock Selection (2/3)
+
+    1. NVDA 🟢 BUY (conviction: 0.85)
+       Rationale: Trump announces expanded semiconductor subsidies, RSI 42
+       Key news: "Trump signs CHIPS Act expansion..."
+
+    2. XOM 🟢 BUY (conviction: 0.72)
+       Rationale: Rising oil prices + earnings surprise, volume surge
+
+    Market sentiment: Fear & Greed 55 (Neutral) | VIX 18.2
+    """
+```
+
+## 10. Testing Strategy
+
+**Unit tests:**
+- Each collector: mock HTTP responses → verify NewsItem parsing
+- Sentiment analysis: verify VADER + keyword scoring
+- Aggregator: mock news data → verify SymbolScore calculation and freshness decay
+- Stock selector: mock scores → verify candidate/filter/selection pipeline
+- LLM calls: mock Claude response → verify SelectedStock parsing
+
+**Integration tests:**
+- Full pipeline: news collection → DB → aggregation → selection
+- Market gating: verify `risk_off` blocks all trades
+- MOC integration: verify selected stocks flow to MOC strategy
+
+**Post-hoc analysis (future):**
+- Use `stock_selections` audit trail to measure selection accuracy
+- Historical news data replay for backtesting requires paid data (deferred)
+
+## 11. Out of Scope (Future)
+
+- Paid API integration (designed for, not implemented)
+- Historical news backtesting
+- WebSocket real-time news streaming
+- Multi-language sentiment analysis
+- Options/derivatives signals
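
Reviewer sketch (not part of the commit): the Stage 2 composite formula and freshness-decay tiers from section 3 of the spec can be illustrated as below. The weights and tier values are taken from the spec. The function names, the treatment of exact tier boundaries (1 h, 6 h, 24 h), and the use of a single age per symbol are assumptions, since the spec does not say how a symbol's age is derived from its news items.

```python
from datetime import timedelta
from typing import Optional

# Weights from the spec's composite formula (section 3, Stage 2).
NEWS_W, SOCIAL_W, POLICY_W, FILING_W = 0.3, 0.2, 0.3, 0.2


def freshness_decay(age: timedelta) -> Optional[float]:
    """Return the decay multiplier for the given age, or None if excluded.

    Boundary handling is an assumption: exactly 1 h falls in the 0.7 tier,
    exactly 6 h in the 0.3 tier, and exactly 24 h is still included.
    """
    hours = age.total_seconds() / 3600
    if hours < 1:
        return 1.0
    if hours < 6:
        return 0.7
    if hours <= 24:
        return 0.3
    return None  # older than 24 hours: excluded from aggregation


def composite_score(
    news: float, social: float, policy: float, filing: float, age: timedelta
) -> Optional[float]:
    """Weighted sum of the four sub-scores, scaled by freshness decay."""
    decay = freshness_decay(age)
    if decay is None:
        return None
    weighted = news * NEWS_W + social * SOCIAL_W + policy * POLICY_W + filing * FILING_W
    return weighted * decay


if __name__ == "__main__":
    # A symbol with fresh, policy-heavy positive news.
    print(composite_score(0.5, 0.2, 0.8, 0.0, timedelta(minutes=30)))
    # The same sub-scores three hours later are scaled by 0.7.
    print(composite_score(0.5, 0.2, 0.8, 0.0, timedelta(hours=3)))
```

Returning `None` for stale data, rather than `0.0`, keeps "no usable news" distinguishable from "neutral sentiment"; whether the aggregator should make that distinction is a design choice the spec leaves open.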

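
Reviewer sketch (not part of the commit): the market-level gating rules from section 3 of the spec reduce to a small pure function. The thresholds are the spec's. The names `market_regime` and `blocks_trading` are hypothetical and are not the existing `should_block()` in `sentiment.py`. Treating a missing VIX as "condition not met" is an assumption, made because `MarketSentiment.vix` is optional.

```python
from typing import Optional


def market_regime(fear_greed: int, vix: Optional[float]) -> str:
    """Classify the market as "risk_off", "risk_on", or "neutral".

    Thresholds follow the spec:
      risk_off: Fear & Greed < 20 OR VIX > 30
      risk_on:  Fear & Greed > 60 AND VIX < 20
      neutral:  everything else
    Assumption: when VIX is unavailable, VIX-based conditions count as unmet,
    so risk_on cannot be reached without a VIX reading.
    """
    if fear_greed < 20 or (vix is not None and vix > 30):
        return "risk_off"
    if fear_greed > 60 and vix is not None and vix < 20:
        return "risk_on"
    return "neutral"


def blocks_trading(regime: str) -> bool:
    """Per the spec, risk_off blocks all trades; other regimes do not."""
    return regime == "risk_off"


if __name__ == "__main__":
    # Values from the Telegram example in section 9: F&G 55, VIX 18.2.
    regime = market_regime(55, 18.2)
    print(regime, blocks_trading(regime))
```

Checking `risk_off` first means a low Fear & Greed reading or a high VIX always wins, which matches the spec's intent that this gate is a hard block.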