Cost control

2026-01-27 09:25:11 +01:00
parent 37eb03583c
commit 0ed89a7045
9 changed files with 733 additions and 6 deletions

COST_TRACKING.md Normal file

@@ -0,0 +1,258 @@
# Cost Tracking
News Agent automatically tracks and displays the cost of each run in your daily email digest.
## Features
- **Per-Run Cost**: Shows exactly how much each digest cost to generate
- **Cumulative Total**: Tracks total spending across all runs
- **Database Storage**: All cost data saved in SQLite database
- **Email Display**: Costs shown at the bottom of every digest email
## How It Works
1. **OpenRouter API Returns Costs**: Each API call includes cost information
2. **Client Tracks Session Cost**: Accumulates cost during the run
3. **Database Stores Run Data**: Saves cost along with article counts
4. **Email Shows Costs**: Displays both session and cumulative costs
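The steps above can be sketched in a few lines. This is a minimal illustration rather than the project's actual client: `SessionCostTracker` and the dict-shaped `usage` payload are assumptions made for the example. The only detail taken from the implementation is that the cost arrives in a `cost` field on the usage data and may be absent.

```python
from typing import Any, Mapping, Optional


class SessionCostTracker:
    """Hypothetical helper that accumulates per-request costs for one run."""

    def __init__(self) -> None:
        self.total_cost = 0.0

    def record(self, usage: Optional[Mapping[str, Any]]) -> float:
        """Add the cost reported in `usage` (if any) and return it."""
        if not usage or usage.get("cost") is None:
            # Free models or some responses may not report a cost
            return 0.0
        cost = float(usage["cost"])
        self.total_cost += cost
        return cost


tracker = SessionCostTracker()
tracker.record({"prompt_tokens": 900, "completion_tokens": 40, "cost": 0.0004})
tracker.record({"prompt_tokens": 1200, "completion_tokens": 90})  # no cost field
tracker.record(None)  # no usage data at all
print(f"This digest: ${tracker.total_cost:.4f}")
```

Treating a missing field as zero keeps the run going, which is also why a digest can legitimately show `$0.0000`.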
## Email Display
At the bottom of each digest email, you'll see:
```
💰 Cost Information
This digest: $0.0234 | Total spent: $1.2456
```
- **This digest**: Cost for generating this specific email (filtering + summarization)
- **Total spent**: Cumulative cost across all runs since you started using the system
## View Cost Statistics
Run the cost viewer script:
```bash
cd ~/news-agent
source .venv/bin/activate
python -m src.view_costs
```
**Example output:**
```
============================================================
News Agent Cost Statistics
============================================================
💰 Total Cumulative Cost: $1.2456
Recent Runs (last 20):
------------------------------------------------------------
Date Articles Included Cost
------------------------------------------------------------
2026-01-26 152 15 $0.0234
2026-01-25 143 12 $0.0198
2026-01-24 167 15 $0.0245
...
------------------------------------------------------------
Averages (last 20 runs):
Cost per run: $0.0226
Articles per digest: 14.2
Cost per article: $0.0016
============================================================
```
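The averages at the bottom of that output are plain arithmetic over the recent runs. A minimal sketch, assuming each run is a dict with the `total_cost` and `articles_included` columns from the `runs` table (`summarize_runs` is a hypothetical name, and the sample rows are the three shown above):

```python
from typing import Dict, List


def summarize_runs(runs: List[dict]) -> Dict[str, float]:
    """Compute the per-run averages shown by the cost viewer."""
    if not runs:
        return {"avg_cost": 0.0, "avg_included": 0.0, "cost_per_article": 0.0}
    avg_cost = sum(r["total_cost"] for r in runs) / len(runs)
    avg_included = sum(r["articles_included"] for r in runs) / len(runs)
    # Guard against division by zero when no run included any article
    cost_per_article = avg_cost / avg_included if avg_included > 0 else 0.0
    return {
        "avg_cost": avg_cost,
        "avg_included": avg_included,
        "cost_per_article": cost_per_article,
    }


sample_runs = [
    {"run_date": "2026-01-26", "articles_included": 15, "total_cost": 0.0234},
    {"run_date": "2026-01-25", "articles_included": 12, "total_cost": 0.0198},
    {"run_date": "2026-01-24", "articles_included": 15, "total_cost": 0.0245},
]
stats = summarize_runs(sample_runs)
print(f"Cost per run: ${stats['avg_cost']:.4f}")
print(f"Articles per digest: {stats['avg_included']:.1f}")
print(f"Cost per article: ${stats['cost_per_article']:.4f}")
```

Note that "cost per article" here divides by included articles only, so filtering cost for rejected articles is spread across the ones that made it into the digest.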
## Cost Breakdown
### Typical Costs (with `openai/gpt-4o-mini`)
| Operation | Articles | Cost |
|-----------|----------|------|
| Filtering | 150 articles | ~$0.015-0.020 |
| Summarization | 15 articles | ~$0.005-0.008 |
| **Total per run** | - | **~$0.020-0.028** |
### Monthly Estimates
| Frequency | Cost/Run | Monthly Cost |
|-----------|----------|--------------|
| Daily | $0.025 | ~$0.75/month |
| Daily | $0.030 | ~$0.90/month |
**Note:** Actual costs vary based on:
- Number of articles fetched
- Content length
- Model used
- Number of articles passing filter
## Cost Optimization Tips
### 1. Adjust Filtering Threshold
Higher threshold = fewer articles = lower cost:
```yaml
ai:
  filtering:
    min_score: 7.0     # Stricter (was 5.5)
    max_articles: 10   # Fewer articles (was 15)
```
### 2. Use Cheaper Models
```yaml
ai:
  model: "google/gemini-2.0-flash-exp:free"  # FREE (has rate limits)
  # OR
  # model: "openai/gpt-4o-mini"  # Cheap and reliable
```
### 3. Reduce RSS Sources
Fewer sources = fewer articles = lower cost:
```yaml
sources:
  rss:
    # Comment out sources you don't need
```
### 4. Track and Set Budget Alerts
Monitor costs and adjust:
```bash
# View costs regularly
python -m src.view_costs
# If costs are too high, adjust config.yaml settings
```
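News Agent does not ship a budget alert, so any check has to live outside it for now. A minimal sketch of what such a check might look like, assuming you already have a month-to-date figure; `budget_status`, its thresholds, and the status strings are all hypothetical:

```python
def budget_status(
    month_to_date: float, monthly_budget: float, warn_ratio: float = 0.8
) -> str:
    """Classify spending as 'ok', 'warning', or 'over' against a budget in USD."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = month_to_date / monthly_budget
    if ratio >= 1.0:
        return "over"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


for spent in (0.50, 0.85, 1.20):
    print(f"${spent:.2f} of $1.00 budget: {budget_status(spent, 1.00)}")
```

A check like this could run from cron after each digest and trigger whatever notification you prefer.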
## Database Schema
The `runs` table stores:
```sql
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_date TEXT NOT NULL,
    articles_fetched INTEGER NOT NULL,
    articles_processed INTEGER NOT NULL,
    articles_included INTEGER NOT NULL,
    total_cost REAL NOT NULL,
    filtering_cost REAL DEFAULT 0,
    summarization_cost REAL DEFAULT 0,
    created_at TEXT NOT NULL
);
```
## Query Costs Manually
Using SQLite:
```bash
sqlite3 data/articles.db <<'SQL'
-- Total cost
SELECT SUM(total_cost) FROM runs;
-- Cost by date
SELECT run_date, total_cost FROM runs ORDER BY run_date DESC;
-- Average cost
SELECT AVG(total_cost) FROM runs;
-- Cost this month (run_date is stored in local time)
SELECT SUM(total_cost) FROM runs
WHERE run_date >= date('now', 'localtime', 'start of month');
SQL
```
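The same month-to-date query can be run from Python with the standard `sqlite3` module. This is a sketch under assumptions: `month_to_date_cost` is a hypothetical helper, and the demo uses a throwaway in-memory table with only the two columns it needs rather than the real `data/articles.db`.

```python
import sqlite3
from datetime import date
from typing import Optional


def month_to_date_cost(conn: sqlite3.Connection, today: Optional[date] = None) -> float:
    """Sum total_cost for runs dated on or after the first of the current month."""
    today = today or date.today()
    month_start = today.replace(day=1).isoformat()
    row = conn.execute(
        "SELECT COALESCE(SUM(total_cost), 0.0) FROM runs WHERE run_date >= ?",
        (month_start,),
    ).fetchone()
    return float(row[0])


# Demo data only; point sqlite3.connect() at your own database instead
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_date TEXT NOT NULL, total_cost REAL NOT NULL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [
        ("2025-12-31", 0.0300),
        ("2026-01-24", 0.0245),
        ("2026-01-25", 0.0198),
        ("2026-01-26", 0.0234),
    ],
)
january_cost = month_to_date_cost(conn, today=date(2026, 1, 27))
print(f"Cost this month: ${january_cost:.4f}")
```

Because `run_date` is stored as an ISO `YYYY-MM-DD` string, plain string comparison orders dates correctly.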
## Troubleshooting
### Cost Shows $0.0000
**Possible causes:**
1. **Using free model** - Free models may not report costs
2. **API response format** - OpenRouter might not include cost field
3. **First run** - Database just initialized
**Check:**
```bash
# View logs for cost tracking
grep -i "cost" data/logs/news-agent.log
# Check if runs are being saved
sqlite3 data/articles.db "SELECT * FROM runs ORDER BY created_at DESC LIMIT 5;"
```
### Cost Seems Wrong
**Verify model pricing:**
- Check https://openrouter.ai/models for current pricing
- Different models have different costs
- Costs change over time
**Check API responses:**
```bash
# Enable debug logging by editing config.yaml:
#   logging:
#     level: "DEBUG"
# Then run and check the logs
python -m src.main
grep "cost" data/logs/news-agent.log
```
## Export Cost Data
Export to CSV:
```bash
sqlite3 -header -csv data/articles.db \
"SELECT run_date, articles_processed, articles_included, total_cost
FROM runs ORDER BY run_date" > costs.csv
```
Import into Excel/Sheets for analysis.
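If you would rather export from Python, the sketch below does the same thing with the standard `sqlite3` and `csv` modules. `export_runs_csv` is a hypothetical helper, and the demo writes to an in-memory buffer from a throwaway table instead of touching `data/articles.db`.

```python
import csv
import io
import sqlite3
from typing import TextIO


def export_runs_csv(conn: sqlite3.Connection, out: TextIO) -> int:
    """Write run costs as CSV (with a header row) and return the number of rows."""
    cursor = conn.execute(
        "SELECT run_date, articles_processed, articles_included, total_cost "
        "FROM runs ORDER BY run_date"
    )
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Demo data only
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_date TEXT, articles_processed INTEGER, "
    "articles_included INTEGER, total_cost REAL)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?)",
    [("2026-01-26", 152, 15, 0.0234), ("2026-01-25", 143, 12, 0.0198)],
)
buffer = io.StringIO()
exported = export_runs_csv(conn, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("costs.csv", "w", newline="")` instead of the buffer.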
## Cost Projections
Based on current usage, project future costs:
```bash
# Get average cost
python -m src.view_costs
# Multiply by days
# If avg = $0.025/day:
# - Weekly: $0.025 × 7 = $0.175
# - Monthly: $0.025 × 30 = $0.75
# - Yearly: $0.025 × 365 = $9.13
```
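The multiplication above is easy to script. A minimal sketch, assuming one run per day and a 30-day month as in the figures above; `project_costs` is a hypothetical helper, not part of News Agent:

```python
from typing import Dict


def project_costs(avg_cost_per_run: float, runs_per_day: float = 1.0) -> Dict[str, float]:
    """Project weekly, monthly (30-day), and yearly cost from an average run cost."""
    daily = avg_cost_per_run * runs_per_day
    return {"weekly": daily * 7, "monthly": daily * 30, "yearly": daily * 365}


projection = project_costs(0.025)
for period, cost in projection.items():
    print(f"{period.capitalize()}: ${cost:.2f}")
```

Feed it the "Cost per run" average from `python -m src.view_costs` to project from your own history.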
## Privacy Note
All cost data is stored **locally** in your SQLite database (`data/articles.db`). Cost information appears in only two places:
- Your digest email (which you control)
- Your local database
## Future Enhancements
Potential improvements:
1. **Split costs** - Separate filtering vs summarization costs
2. **Budget alerts** - Email warning if cost exceeds threshold
3. **Cost predictions** - Estimate next run cost
4. **Web dashboard** - Visualize cost trends over time
5. **Cost per category** - Track which categories cost most
## Need Help?
- View current costs: `python -m src.view_costs`
- Check database: `sqlite3 data/articles.db "SELECT * FROM runs;"`
- Review logs: `tail -f data/logs/news-agent.log`
- OpenRouter pricing: https://openrouter.ai/models

COST_TRACKING_SUMMARY.md Normal file

@@ -0,0 +1,247 @@
# Cost Tracking Implementation Summary
## ✅ What Was Added
### 1. Database Schema
- New `runs` table tracks each execution
- Fields: articles_fetched, articles_processed, articles_included, costs, timestamps
- Index on `run_date` (`idx_run_date`) for efficient date queries
### 2. Cost Extraction
- OpenRouter API responses include cost in `usage.cost` field
- AI client accumulates costs during session
- Automatic tracking without manual intervention
### 3. Database Methods
- `save_run()` - Store run statistics and costs
- `get_total_cost()` - Calculate cumulative spending
- `get_run_stats()` - Retrieve recent run history
### 4. Email Display
- **HTML Email**: Formatted cost box at the bottom of the footer
- **Plain Text**: Cost information in footer
- Shows both session and cumulative costs
### 5. Cost Viewer Script
- `python -m src.view_costs` - View statistics
- Shows recent runs, averages, totals
- Data can be exported for analysis with `sqlite3` (see "Export to CSV" under Usage)
## 📊 What You'll See
### In Your Email
Bottom of every digest:
```
💰 Cost Information
This digest: $0.0234 | Total spent: $1.2456
```
### In Cost Viewer
```bash
python -m src.view_costs
```
Output:
```
============================================================
News Agent Cost Statistics
============================================================
💰 Total Cumulative Cost: $1.2456
Recent Runs (last 20):
------------------------------------------------------------
Date Articles Included Cost
------------------------------------------------------------
2026-01-26 152 15 $0.0234
2026-01-25 143 12 $0.0198
...
Averages (last 20 runs):
Cost per run: $0.0226
Articles per digest: 14.2
Cost per article: $0.0016
```
## 🔍 How It Works
1. **During Filtering**: AI scores each article
   - OpenRouter returns cost per API call
   - Client tracks: `self.total_cost += cost`
2. **During Summarization**: AI summarizes selected articles
   - More API calls, more cost
   - Accumulated in same session
3. **After Processing**: Save to database
   ```python
   await db.save_run(
       articles_fetched=152,
       articles_processed=25,
       articles_included=15,
       total_cost=0.0234,
   )
   ```
4. **Before Email**: Calculate totals
   ```python
   session_cost = ai_client.get_session_cost()  # This run
   cumulative_cost = await db.get_total_cost()  # All time
   ```
5. **In Email**: Display both values
   - Session cost: Just this digest
   - Cumulative cost: Total since start
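The run is currently saved with `filtering_cost` and `summarization_cost` left at zero. One way to fill them in is to snapshot the session cost between phases. The sketch below is an illustration under assumptions, not the project's code: `FakeClient` stands in for the real AI client (only `get_session_cost()` mirrors it), and the per-call costs are made up.

```python
import asyncio
from typing import Dict


class FakeClient:
    """Stand-in for the AI client; each call just adds a fixed, made-up cost."""

    def __init__(self) -> None:
        self.total_cost = 0.0

    def get_session_cost(self) -> float:
        return self.total_cost

    async def call(self, cost: float) -> None:
        self.total_cost += cost


async def run_phases(client: FakeClient) -> Dict[str, float]:
    """Record the session cost before and after each phase to split the total."""
    start = client.get_session_cost()
    await client.call(0.010)  # filtering request
    await client.call(0.008)  # filtering request
    after_filtering = client.get_session_cost()
    await client.call(0.006)  # summarization request
    end = client.get_session_cost()
    return {
        "filtering_cost": after_filtering - start,
        "summarization_cost": end - after_filtering,
        "total_cost": end - start,
    }


costs = asyncio.run(run_phases(FakeClient()))
print({name: round(value, 4) for name, value in costs.items()})
```

The resulting values map directly onto the `filtering_cost` and `summarization_cost` parameters that `save_run()` already accepts.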
## 💰 Expected Costs
### With `openai/gpt-4o-mini`
| Scenario | Cost/Run | Monthly | Yearly |
|----------|----------|---------|--------|
| 150 articles, 15 selected | $0.02-0.03 | $0.60-0.90 | $7-11 |
| 200 articles, 15 selected | $0.03-0.04 | $0.90-1.20 | $11-15 |
| 100 articles, 10 selected | $0.01-0.02 | $0.30-0.60 | $4-7 |
**Note:** Costs vary based on article length and model used.
## 📁 Files Modified
### Core Tracking
- `src/storage/database.py` - Added runs table and cost methods
- `src/ai/client.py` - Track costs from API responses
- `src/main.py` - Save costs and pass to email generator
### Email Display
- `src/email/generator.py` - Accept cost parameters
- `src/email/templates/daily_digest.html` - Display costs nicely
### Utilities
- `src/view_costs.py` - NEW: Cost statistics viewer
- `COST_TRACKING.md` - NEW: Complete documentation
## 🎯 Benefits
1. **Transparency**: Know exactly what you're spending
2. **Budgeting**: Track costs over time
3. **Optimization**: Identify expensive runs
4. **Accountability**: See if changes save/cost money
5. **Planning**: Estimate future costs accurately
## 🔧 Usage
### View Costs Anytime
```bash
python -m src.view_costs
```
### Query Database
```bash
sqlite3 data/articles.db <<'SQL'
-- Total spending
SELECT SUM(total_cost) FROM runs;
-- Last 10 runs
SELECT run_date, articles_included, total_cost
FROM runs ORDER BY run_date DESC LIMIT 10;
-- This month (run_date is stored in local time)
SELECT SUM(total_cost) FROM runs
WHERE run_date >= date('now', 'localtime', 'start of month');
SQL
```
### Export to CSV
```bash
sqlite3 -header -csv data/articles.db \
"SELECT * FROM runs ORDER BY run_date" > costs.csv
```
## ⚙️ Cost Optimization
### Reduce Costs by:
1. **Higher filter threshold** (fewer articles):
   ```yaml
   ai:
     filtering:
       min_score: 7.0  # Was 5.5
   ```
2. **Fewer max articles**:
   ```yaml
   ai:
     filtering:
       max_articles: 10  # Was 15
   ```
3. **Cheaper model**:
   ```yaml
   ai:
     model: "google/gemini-2.0-flash-exp:free"  # FREE
   ```
4. **Fewer RSS sources** (less to process)
## 🐛 Troubleshooting
### Cost Shows $0.0000
**Likely causes:**
- Using free model (doesn't report costs)
- OpenRouter API response doesn't include cost field
- First run (no history yet)
**Check logs:**
```bash
grep -i "cost" data/logs/news-agent.log
```
### Cost Seems High
**Review:**
1. Number of articles being processed
2. Model being used (check pricing at openrouter.ai)
3. Article content length
4. Recent runs: `python -m src.view_costs`
## 📝 Notes
- Costs are stored in US dollars as floating-point values (the client treats 1 OpenRouter credit as $1)
- Run history is never deleted automatically; it persists until you delete the database yourself
- Cumulative cost includes ALL runs since database creation
- Costs shown with 4 decimal places ($0.0234)
- Free models may show $0.00 (rate limited instead)
## 🚀 Future Enhancements
Possible additions:
1. **Budget alerts** - Email if cost exceeds threshold
2. **Cost breakdown** - Separate filtering vs summarization
3. **Cost predictions** - Estimate before running
4. **Monthly reports** - Summary email at month end
5. **Cost per category** - Track which topics cost most
6. **Web dashboard** - Visualize trends
## 📚 Documentation
See full details in:
- **COST_TRACKING.md** - Complete documentation
- **README.md** - Updated with cost tracking feature
- **src/view_costs.py** - Source code for viewer
## ✨ Example Output
The cost information appears at the bottom of every email:
```
┌─────────────────────────────────────┐
│ 💰 Cost Information │
│ This digest: $0.0234 │
│ Total spent: $1.2456 │
└─────────────────────────────────────┘
```
Clean, simple, and informative!


@@ -8,6 +8,7 @@ An AI-powered daily tech news aggregator that fetches articles from RSS feeds, f
- **AI Filtering**: Uses OpenRouter AI to score articles based on your interests (0-10 scale)
- **Smart Summarization**: Generates concise 2-3 sentence summaries of each relevant article
- **Beautiful Emails**: HTML email with responsive design, categorized sections, and relevance scores
- **Cost Tracking**: Automatic tracking and display of per-run and cumulative costs in emails
- **Deduplication**: SQLite database prevents duplicate articles
- **Automated Scheduling**: Runs daily at 07:00 Europe/Oslo time via systemd timer
- **Production Ready**: Error handling, logging, resource limits, and monitoring


@@ -28,6 +28,7 @@ class OpenRouterClient:
)
self.model = config.ai.model
self.total_cost = 0.0 # Track cumulative cost for this session
logger.debug(f"Initialized OpenRouter client with model: {self.model}")
async def chat_completion(
@@ -73,7 +74,7 @@ class OpenRouterClient:
if not content:
raise ValueError("Empty response from API")
# Log token usage
# Track cost from OpenRouter response
if response.usage:
logger.debug(
f"Tokens used - Prompt: {response.usage.prompt_tokens}, "
@@ -81,6 +82,15 @@ class OpenRouterClient:
f"Total: {response.usage.total_tokens}"
)
# OpenRouter returns cost in credits (1 credit = $1)
# The usage object has a 'cost' field in newer responses
if hasattr(response.usage, "cost") and response.usage.cost is not None:
cost = float(response.usage.cost)
self.total_cost += cost
logger.debug(
f"Request cost: ${cost:.6f}, Session total: ${self.total_cost:.6f}"
)
return content
except Exception as e:
@@ -115,3 +125,11 @@ class OpenRouterClient:
except json.JSONDecodeError as e:
logger.error(f"Failed to parse JSON response: {content}")
raise ValueError(f"Invalid JSON response: {e}")
    def get_session_cost(self) -> float:
        """Get total cost accumulated during this session"""
        return self.total_cost

    def reset_session_cost(self):
        """Reset session cost counter"""
        self.total_cost = 0.0


@@ -22,7 +22,12 @@ class EmailGenerator:
self.env = Environment(loader=FileSystemLoader(template_dir))
def generate_digest_email(
self, entries: list[DigestEntry], date_str: str, subject: str
self,
entries: list[DigestEntry],
date_str: str,
subject: str,
session_cost: float = 0.0,
cumulative_cost: float = 0.0,
) -> tuple[str, str]:
"""
Generate HTML email for daily digest
@@ -31,6 +36,8 @@ class EmailGenerator:
entries: List of digest entries (articles with summaries)
date_str: Date string for the digest
subject: Email subject line
session_cost: Cost for this digest generation
cumulative_cost: Total cost across all runs
Returns:
Tuple of (html_content, text_content)
@@ -54,6 +61,8 @@ class EmailGenerator:
"total_sources": unique_sources,
"total_categories": len(sorted_categories),
"articles_by_category": {cat: articles_by_category[cat] for cat in sorted_categories},
"session_cost": session_cost,
"cumulative_cost": cumulative_cost,
}
# Render HTML template
@@ -64,14 +73,21 @@ class EmailGenerator:
html_inlined = transform(html)
# Generate plain text version
text = self._generate_text_version(entries, date_str, subject)
text = self._generate_text_version(
entries, date_str, subject, session_cost, cumulative_cost
)
logger.debug(f"Generated email with {len(entries)} articles")
return html_inlined, text
def _generate_text_version(
self, entries: list[DigestEntry], date_str: str, subject: str
self,
entries: list[DigestEntry],
date_str: str,
subject: str,
session_cost: float = 0.0,
cumulative_cost: float = 0.0,
) -> str:
"""Generate plain text version of email"""
lines = [
@@ -110,5 +126,9 @@ class EmailGenerator:
lines.append("")
lines.append("---")
lines.append("Generated by News Agent | Powered by OpenRouter AI")
lines.append("")
lines.append("COST INFORMATION")
lines.append(f"This digest: ${session_cost:.4f}")
lines.append(f"Total spent: ${cumulative_cost:.4f}")
return "\n".join(lines)


@@ -188,6 +188,15 @@
<div class="footer">
<p>Generated by News Agent | Powered by OpenRouter AI</p>
<p>You received this because you subscribed to daily tech news digests</p>
<hr style="margin: 20px 0; border: none; border-top: 1px solid #e5e7eb;">
<div style="background-color: #f9fafb; padding: 15px; border-radius: 6px; margin-top: 20px;">
<p style="margin: 0; font-weight: 600; color: #374151;">💰 Cost Information</p>
<p style="margin: 5px 0 0 0; font-size: 14px;">
<span style="color: #059669;">This digest: ${{ "%.4f"|format(session_cost) }}</span>
&nbsp;|&nbsp;
<span style="color: #2563eb;">Total spent: ${{ "%.4f"|format(cumulative_cost) }}</span>
</p>
</div>
</div>
</div>
</body>


@@ -75,6 +75,15 @@ async def main():
await db.update_article_processing(
article.id, relevance_score=0.0, ai_summary="", included=False
)
# Still save the run with costs (for filtering only)
session_cost = ai_client.get_session_cost()
await db.save_run(
articles_fetched=len(articles),
articles_processed=len(unprocessed),
articles_included=0,
total_cost=session_cost,
)
return
# Summarize filtered articles (using batch processing for speed, silently)
@@ -109,14 +118,29 @@ async def main():
article.id, relevance_score=0.0, ai_summary="", included=False
)
# Generate email (silently)
# Get cost information
session_cost = ai_client.get_session_cost()
total_cost = await db.get_total_cost()
cumulative_cost = total_cost + session_cost
# Save run statistics with costs
await db.save_run(
articles_fetched=len(articles),
articles_processed=len(unprocessed),
articles_included=len(digest_entries),
total_cost=session_cost,
filtering_cost=0.0, # Could split this if tracking separately
summarization_cost=0.0,
)
# Generate email (silently) with cost info
generator = EmailGenerator()
date_str = datetime.now().strftime("%A, %B %d, %Y")
subject = config.email.subject_template.format(date=date_str)
html_content, text_content = generator.generate_digest_email(
digest_entries, date_str, subject
digest_entries, date_str, subject, session_cost, cumulative_cost
)
# Send email (silently)


@@ -56,6 +56,30 @@ class Database:
"""
)
# Create runs table for tracking costs
await db.execute(
"""
CREATE TABLE IF NOT EXISTS runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_date TEXT NOT NULL,
articles_fetched INTEGER NOT NULL,
articles_processed INTEGER NOT NULL,
articles_included INTEGER NOT NULL,
total_cost REAL NOT NULL,
filtering_cost REAL DEFAULT 0,
summarization_cost REAL DEFAULT 0,
created_at TEXT NOT NULL
)
"""
)
await db.execute(
"""
CREATE INDEX IF NOT EXISTS idx_run_date
ON runs(run_date)
"""
)
await db.commit()
logger.debug(f"Database initialized at {self.db_path}")
@@ -175,6 +199,72 @@ class Database:
if deleted > 0:
logger.debug(f"Cleaned up {deleted} old articles")
    async def save_run(
        self,
        articles_fetched: int,
        articles_processed: int,
        articles_included: int,
        total_cost: float,
        filtering_cost: float = 0.0,
        summarization_cost: float = 0.0,
    ) -> int:
        """
        Save run statistics including costs

        Returns:
            Run ID
        """
        run_date = datetime.now().date().isoformat()
        created_at = datetime.now().isoformat()

        async with aiosqlite.connect(self.db_path) as db:
            cursor = await db.execute(
                """
                INSERT INTO runs (
                    run_date, articles_fetched, articles_processed,
                    articles_included, total_cost, filtering_cost,
                    summarization_cost, created_at
                ) VALUES (?, ?, ?, ?, ?, ?, ?, ?)
                """,
                (
                    run_date,
                    articles_fetched,
                    articles_processed,
                    articles_included,
                    total_cost,
                    filtering_cost,
                    summarization_cost,
                    created_at,
                ),
            )
            run_id = cursor.lastrowid
            await db.commit()

        logger.debug(f"Saved run {run_id}: ${total_cost:.4f}")
        return run_id

    async def get_total_cost(self) -> float:
        """Get cumulative total cost across all runs"""
        async with aiosqlite.connect(self.db_path) as db:
            async with db.execute("SELECT SUM(total_cost) FROM runs") as cursor:
                result = await cursor.fetchone()
                return result[0] if result[0] is not None else 0.0

    async def get_run_stats(self, limit: int = 10) -> list[dict]:
        """Get recent run statistics"""
        async with aiosqlite.connect(self.db_path) as db:
            db.row_factory = aiosqlite.Row
            async with db.execute(
                """
                SELECT * FROM runs
                ORDER BY created_at DESC
                LIMIT ?
                """,
                (limit,),
            ) as cursor:
                rows = await cursor.fetchall()
                return [dict(row) for row in rows]

def _row_to_article(self, row: aiosqlite.Row) -> Article:
"""Convert database row to Article model"""
return Article(

src/view_costs.py Normal file

@@ -0,0 +1,60 @@
#!/usr/bin/env python3
"""View cost statistics from the database"""

import asyncio

from .config import get_config
from .storage.database import Database


async def main():
    """Display cost statistics"""
    config = get_config()
    db = Database(config.database.path)

    print("\n" + "=" * 60)
    print("News Agent Cost Statistics")
    print("=" * 60 + "\n")

    # Get total cost
    total_cost = await db.get_total_cost()
    print(f"💰 Total Cumulative Cost: ${total_cost:.4f}")
    print()

    # Get recent runs
    runs = await db.get_run_stats(limit=20)

    if not runs:
        print("No runs recorded yet.")
        return

    print(f"Recent Runs (last {len(runs)}):")
    print("-" * 60)
    print(f"{'Date':<12} {'Articles':<10} {'Included':<10} {'Cost':<10}")
    print("-" * 60)

    for run in runs:
        date = run["run_date"]
        articles = run["articles_processed"]
        included = run["articles_included"]
        cost = run["total_cost"]
        print(f"{date:<12} {articles:<10} {included:<10} ${cost:<9.4f}")

    print("-" * 60)

    # Calculate averages
    if runs:
        avg_cost = sum(r["total_cost"] for r in runs) / len(runs)
        avg_articles = sum(r["articles_included"] for r in runs) / len(runs)
        print(f"\nAverages (last {len(runs)} runs):")
        print(f"  Cost per run: ${avg_cost:.4f}")
        print(f"  Articles per digest: {avg_articles:.1f}")
        if avg_articles > 0:
            print(f"  Cost per article: ${avg_cost / avg_articles:.4f}")

    print("\n" + "=" * 60 + "\n")


if __name__ == "__main__":
    asyncio.run(main())