NewsAgent/COST_TRACKING_SUMMARY.md
2026-01-27 09:25:11 +01:00


# Cost Tracking Implementation Summary
## ✅ What Was Added
### 1. Database Schema
- New `runs` table tracks each execution
- Fields: `articles_fetched`, `articles_processed`, `articles_included`, plus cost and timestamp columns (such as `total_cost` and `run_date`)
- Indexes for efficient queries
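
A minimal sketch of what such a table could look like, using Python's built-in `sqlite3`. Only the column names `articles_fetched`, `articles_processed`, `articles_included`, `total_cost`, and `run_date` come from this document; the column types, the `id` column, the defaults, and the index name `idx_runs_run_date` are assumptions for illustration, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: types, defaults, the id column, and the index name
# are assumptions; only the column names appear elsewhere in this document.
RUNS_SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    run_date           TEXT    NOT NULL DEFAULT (datetime('now')),
    articles_fetched   INTEGER NOT NULL DEFAULT 0,
    articles_processed INTEGER NOT NULL DEFAULT 0,
    articles_included  INTEGER NOT NULL DEFAULT 0,
    total_cost         REAL    NOT NULL DEFAULT 0.0
);
CREATE INDEX IF NOT EXISTS idx_runs_run_date ON runs (run_date);
"""


def create_runs_table(conn: sqlite3.Connection) -> None:
    """Create the runs table and its index if they do not exist yet."""
    conn.executescript(RUNS_SCHEMA)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
create_runs_table(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(runs)")]
```

An index on `run_date` is one plausible choice here because the queries in this document sort and filter on that column.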
### 2. Cost Extraction
- OpenRouter API responses include cost in `usage.cost` field
- AI client accumulates costs during session
- Automatic tracking without manual intervention
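
The accumulation step can be sketched as follows. This assumes the API response has already been parsed into a dictionary with a `usage.cost` entry; the class name `CostTracker` and the method `record` are hypothetical, while `total_cost` and `get_session_cost()` mirror names used in this document.

```python
from typing import Any


class CostTracker:
    """Accumulates the cost reported by each API response in a session."""

    def __init__(self) -> None:
        self.total_cost = 0.0

    def record(self, response: dict[str, Any]) -> float:
        """Add the cost of one parsed response to the running total."""
        usage = response.get("usage") or {}
        # A missing or null cost (e.g. from a free model) is counted as 0.
        cost = float(usage.get("cost") or 0.0)
        self.total_cost += cost
        return cost

    def get_session_cost(self) -> float:
        """Return the total cost accumulated so far in this session."""
        return self.total_cost


tracker = CostTracker()
tracker.record({"usage": {"cost": 0.0012}})  # response that reports a cost
tracker.record({"usage": {}})                # response without a cost field
```

Treating a missing cost as zero keeps the run from failing, at the price of silently under-reporting when the field is absent.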
### 3. Database Methods
- `save_run()` - Store run statistics and costs
- `get_total_cost()` - Calculate cumulative spending
- `get_run_stats()` - Retrieve recent run history
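
A rough sketch of how these three methods might behave. The project's versions are awaited (`await db.save_run(...)`), so they are asynchronous; this illustration is synchronous and uses the standard `sqlite3` module to stay self-contained. The class name `RunStore`, the `limit` parameter, and the table definition are assumptions.

```python
import sqlite3


class RunStore:
    """Synchronous illustration of run and cost storage (hypothetical class)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Assumed table layout; the real schema may differ.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "run_date TEXT DEFAULT (datetime('now')), "
            "articles_fetched INTEGER, articles_processed INTEGER, "
            "articles_included INTEGER, total_cost REAL)"
        )

    def save_run(self, articles_fetched: int, articles_processed: int,
                 articles_included: int, total_cost: float) -> None:
        """Store the statistics and cost of one run."""
        self.conn.execute(
            "INSERT INTO runs (articles_fetched, articles_processed, "
            "articles_included, total_cost) VALUES (?, ?, ?, ?)",
            (articles_fetched, articles_processed, articles_included, total_cost),
        )
        self.conn.commit()

    def get_total_cost(self) -> float:
        """Return cumulative spending; 0.0 when no runs are stored."""
        row = self.conn.execute(
            "SELECT COALESCE(SUM(total_cost), 0.0) FROM runs"
        ).fetchone()
        return row[0]

    def get_run_stats(self, limit: int = 20) -> list:
        """Return the most recent runs, newest first."""
        cursor = self.conn.execute(
            "SELECT run_date, articles_fetched, articles_included, total_cost "
            "FROM runs ORDER BY run_date DESC, rowid DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()


store = RunStore(sqlite3.connect(":memory:"))
store.save_run(152, 25, 15, 0.0234)
store.save_run(143, 22, 12, 0.0198)
```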
### 4. Email Display
- **HTML Email**: Formatted cost box at the bottom
- **Plain Text**: Cost information in footer
- Shows both session and cumulative costs
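
For the plain-text footer, the formatting could be as simple as the sketch below. The function name `format_cost_footer` is hypothetical; the wording and the four-decimal format follow the example shown in this document.

```python
def format_cost_footer(session_cost: float, cumulative_cost: float) -> str:
    """Format session and cumulative costs with four decimal places."""
    return (
        f"This digest: ${session_cost:.4f} | "
        f"Total spent: ${cumulative_cost:.4f}"
    )


footer = format_cost_footer(0.0234, 1.2456)
```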
### 5. Cost Viewer Script
- `python -m src.view_costs` - View statistics
- Shows recent runs, averages, totals
- Export capability for analysis
## 📊 What You'll See
### In Your Email
Bottom of every digest:
```
💰 Cost Information
This digest: $0.0234 | Total spent: $1.2456
```
### In Cost Viewer
```bash
python -m src.view_costs
```
Output:
```
============================================================
News Agent Cost Statistics
============================================================
💰 Total Cumulative Cost: $1.2456
Recent Runs (last 20):
------------------------------------------------------------
Date          Articles  Included  Cost
------------------------------------------------------------
2026-01-26    152       15        $0.0234
2026-01-25    143       12        $0.0198
...
Averages (last 20 runs):
Cost per run: $0.0226
Articles per digest: 14.2
Cost per article: $0.0016
```
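
The averages in this output can be derived from the stored runs roughly as follows. This is a sketch under assumptions: the function name `summarize_runs` is hypothetical, and "cost per article" is taken to mean total cost divided by total included articles, which is consistent with the sample figures but not confirmed by the source.

```python
def summarize_runs(runs):
    """Compute averages from a list of (articles_included, total_cost) pairs.

    Returns None when there are no runs to average.
    """
    if not runs:
        return None
    total_cost = sum(cost for _, cost in runs)
    total_included = sum(included for included, _ in runs)
    return {
        "cost_per_run": total_cost / len(runs),
        "articles_per_digest": total_included / len(runs),
        # Guard against division by zero when no articles were included.
        "cost_per_article": total_cost / total_included if total_included else 0.0,
    }


stats = summarize_runs([(15, 0.0234), (12, 0.0198)])
```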
## 🔍 How It Works
1. **During Filtering**: AI scores each article
   - OpenRouter returns cost per API call
   - Client tracks: `self.total_cost += cost`
2. **During Summarization**: AI summarizes selected articles
   - More API calls, more cost
   - Accumulated in same session
3. **After Processing**: Save to database
   ```python
   await db.save_run(
       articles_fetched=152,
       articles_processed=25,
       articles_included=15,
       total_cost=0.0234,
   )
   ```
4. **Before Email**: Calculate totals
   ```python
   session_cost = ai_client.get_session_cost()  # This run
   cumulative_cost = await db.get_total_cost()  # All time
   ```
5. **In Email**: Display both values
   - Session cost: Just this digest
   - Cumulative cost: Total since start
## 💰 Expected Costs
### With `openai/gpt-4o-mini`
| Scenario | Cost/Run | Monthly | Yearly |
|----------|----------|---------|--------|
| 150 articles, 15 selected | $0.02-0.03 | $0.60-0.90 | $7-11 |
| 200 articles, 15 selected | $0.03-0.04 | $0.90-1.20 | $11-15 |
| 100 articles, 10 selected | $0.01-0.02 | $0.30-0.60 | $4-7 |

**Note:** Costs vary with article length and the model used. The monthly and yearly figures correspond to one run per day.
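
A small sketch of the arithmetic behind these projections, assuming one run per day, a 30-day month, and a 365-day year. These assumptions are inferred from the table values rather than stated in the source, and the function name `project_costs` is hypothetical.

```python
def project_costs(cost_per_run: float, runs_per_day: int = 1) -> dict:
    """Project monthly and yearly spending from the cost of a single run."""
    daily = cost_per_run * runs_per_day
    return {
        "monthly": daily * 30,   # assumes a 30-day month
        "yearly": daily * 365,   # assumes a 365-day year
    }


low = project_costs(0.02)   # lower bound of the first table row
high = project_costs(0.03)  # upper bound of the first table row
```

For the first row this gives roughly $0.60 to $0.90 per month and $7.30 to $10.95 per year, matching the rounded table entries.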
## 📁 Files Modified
### Core Tracking
- `src/storage/database.py` - Added runs table and cost methods
- `src/ai/client.py` - Track costs from API responses
- `src/main.py` - Save costs and pass to email generator
### Email Display
- `src/email/generator.py` - Accept cost parameters
- `src/email/templates/daily_digest.html` - Display costs nicely
### Utilities
- `src/view_costs.py` - NEW: Cost statistics viewer
- `COST_TRACKING.md` - NEW: Complete documentation
## 🎯 Benefits
1. **Transparency**: Know exactly what you're spending
2. **Budgeting**: Track costs over time
3. **Optimization**: Identify expensive runs
4. **Accountability**: See whether changes save or cost money
5. **Planning**: Estimate future costs accurately
## 🔧 Usage
### View Costs Anytime
```bash
python -m src.view_costs
```
### Query Database
```bash
# Total spending
sqlite3 data/articles.db "SELECT SUM(total_cost) FROM runs;"
# Last 10 runs
sqlite3 data/articles.db "SELECT run_date, articles_included, total_cost
FROM runs ORDER BY run_date DESC LIMIT 10;"
# This month
sqlite3 data/articles.db "SELECT SUM(total_cost) FROM runs
WHERE run_date >= date('now', 'start of month');"
```
### Export to CSV
```bash
sqlite3 -header -csv data/articles.db \
"SELECT * FROM runs ORDER BY run_date" > costs.csv
```
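
If the `sqlite3` command-line tool is not available, the same export can be sketched in Python with the standard `sqlite3` and `csv` modules. The function name `export_runs_to_csv` is hypothetical; the query matches the one in the shell command.

```python
import csv
import sqlite3


def export_runs_to_csv(db_path: str, csv_path: str) -> int:
    """Write every row of the runs table to csv_path; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        cursor = conn.execute("SELECT * FROM runs ORDER BY run_date")
        # cursor.description holds one tuple per column; item 0 is the name.
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Calling `export_runs_to_csv("data/articles.db", "costs.csv")` would then produce a file comparable to the one written by the shell command.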
## ⚙️ Cost Optimization
### Reduce Costs by:
1. **Higher filter threshold** (fewer articles):
   ```yaml
   ai:
     filtering:
       min_score: 7.0 # Was 5.5
   ```
2. **Fewer max articles**:
   ```yaml
   ai:
     filtering:
       max_articles: 10 # Was 15
   ```
3. **Cheaper model**:
   ```yaml
   ai:
     model: "google/gemini-2.0-flash-exp:free" # FREE
   ```
4. **Fewer RSS sources** (less to process)
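
To see why the first two settings reduce cost, consider the sketch below. The agent's actual selection logic is not shown in this document, so this assumes `min_score` is an inclusive lower bound and `max_articles` caps the list after sorting by score; the function name `select_articles` is hypothetical. Each selected article needs a summarization call, so a shorter selection means fewer paid API calls.

```python
def select_articles(scored, min_score=5.5, max_articles=15):
    """Return titles to summarize from a list of (title, score) pairs.

    Keeps articles scoring at least min_score, highest first,
    and truncates the result to max_articles entries.
    """
    eligible = [item for item in scored if item[1] >= min_score]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [title for title, _ in eligible[:max_articles]]


scored = [("a", 8.0), ("b", 6.0), ("c", 5.0), ("d", 7.5)]
default_selection = select_articles(scored)
strict_selection = select_articles(scored, min_score=7.0)
```

With the default threshold three of the four sample articles qualify; raising `min_score` to 7.0 leaves two, which would mean one fewer summarization call in this example.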
## 🐛 Troubleshooting
### Cost Shows $0.0000
**Likely causes:**
- Using a free model (no cost is reported)
- The OpenRouter API response doesn't include the `usage.cost` field
- First run (no history yet)
**Check logs:**
```bash
grep -i "cost" data/logs/news-agent.log
```
### Cost Seems High
**Review:**
1. Number of articles being processed
2. Model being used (check pricing at openrouter.ai)
3. Article content length
4. Recent runs: `python -m src.view_costs`
## 📝 Notes
- Costs are stored as decimal US dollar amounts (OpenRouter credits = USD)
- The database is never deleted automatically; it persists until you delete it manually
- Cumulative cost includes ALL runs since database creation
- Costs are shown with 4 decimal places ($0.0234)
- Free models may show $0.00 (they are rate limited rather than billed)
## 🚀 Future Enhancements
Possible additions:
1. **Budget alerts** - Email if cost exceeds threshold
2. **Cost breakdown** - Separate filtering vs summarization
3. **Cost predictions** - Estimate before running
4. **Monthly reports** - Summary email at month end
5. **Cost per category** - Track which topics cost most
6. **Web dashboard** - Visualize trends
## 📚 Documentation
See full details in:
- **COST_TRACKING.md** - Complete documentation
- **README.md** - Updated with cost tracking feature
- **src/view_costs.py** - Source code for viewer
## ✨ Example Output
The cost information appears at the bottom of every email:
```
┌─────────────────────────────────────┐
│ 💰 Cost Information │
│ This digest: $0.0234 │
│ Total spent: $1.2456 │
└─────────────────────────────────────┘
```
Clean, simple, and informative!