Files
NewsAgent/README.md
2026-01-26 12:34:00 +01:00

361 lines
7.9 KiB
Markdown

# News Agent
An AI-powered daily tech news aggregator that fetches articles from RSS feeds, filters them by relevance to your interests, generates AI summaries, and emails you a beautifully formatted digest every morning.
## Features
- **RSS Aggregation**: Fetches from 15+ tech news sources covering Development, Self-hosting, Enterprise Architecture, and Gadgets
- **AI Filtering**: Uses OpenRouter AI to score articles based on your interests (0-10 scale)
- **Smart Summarization**: Generates concise 2-3 sentence summaries of each relevant article
- **Beautiful Emails**: HTML email with responsive design, categorized sections, and relevance scores
- **Deduplication**: SQLite database prevents duplicate articles
- **Automated Scheduling**: Runs daily at 07:00 Europe/Oslo time via systemd timer
- **Production Ready**: Error handling, logging, resource limits, and monitoring
## Architecture
```
news-agent/
├── src/
│ ├── aggregator/ # RSS feed fetching
│ ├── ai/ # OpenRouter client, filtering, summarization
│ ├── storage/ # SQLite database operations
│ ├── email/ # Email generation and sending
│ └── main.py # Main orchestrator
├── config.yaml # Configuration
├── .env # Secrets (API keys)
└── systemd/ # Service and timer files
```
## Prerequisites
- **Fedora Linux** (or other systemd-based distribution)
- **Python 3.11+**
- **SMTP Server** (your own mail server or service like Gmail, Outlook, etc.)
- **OpenRouter API Key** (get from https://openrouter.ai)
## Installation
### 1. Clone/Copy Project
```bash
# Copy this project to your home directory
mkdir -p ~/news-agent
cd ~/news-agent
```
### 2. Install Python and Dependencies
```bash
# Install Python 3.11+ if not already installed
sudo dnf install python3.11 python3-pip
# Create virtual environment
python3.11 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
```
### 3. Configure News Agent
```bash
# Copy environment template
cp .env.example .env
# Edit .env and add your credentials
nano .env
```
**Required in `.env`:**
```bash
# OpenRouter API Key
OPENROUTER_API_KEY=sk-or-v1-...your-key-here...
# SMTP Credentials for your mail server
SMTP_USERNAME=your-email@yourdomain.com
SMTP_PASSWORD=your-smtp-password
```
**Edit `config.yaml`:**
```bash
nano config.yaml
```
Update the email section:
```yaml
email:
to: "your-email@example.com" # Where to receive the digest
from: "news-agent@yourdomain.com" # Sender address
smtp:
host: "mail.yourdomain.com" # Your mail server hostname
port: 587 # 587 for TLS, 465 for SSL
use_tls: true # true for port 587
use_ssl: false # true for port 465
```
**Common SMTP Settings:**
- **Your own server**: Use your mail server hostname and credentials
- **Gmail**: `smtp.gmail.com:587`, use App Password
- **Outlook/Office365**: `smtp.office365.com:587`
- **SendGrid**: `smtp.sendgrid.net:587`, use API key as password
Optionally adjust:
- AI model (default: `google/gemini-flash-1.5` - fast and cheap)
- Filtering threshold (default: 6.5/10)
- Max articles per digest (default: 15)
- RSS sources (add/remove feeds)
- Your interests for AI filtering
### 4. Test Run
```bash
# Activate virtual environment
source .venv/bin/activate
# Run manually to test
python -m src.main
```
Check:
- Console output for progress
- Logs in `data/logs/news-agent.log`
- Your email inbox for the digest
### 5. Set Up Systemd Timer
```bash
# Copy systemd files to user systemd directory
mkdir -p ~/.config/systemd/user
cp systemd/news-agent.service ~/.config/systemd/user/
cp systemd/news-agent.timer ~/.config/systemd/user/
# Edit service file to update paths if needed
nano ~/.config/systemd/user/news-agent.service
# Reload systemd
systemctl --user daemon-reload
# Enable and start timer
systemctl --user enable news-agent.timer
systemctl --user start news-agent.timer
# Check timer status
systemctl --user list-timers
systemctl --user status news-agent.timer
```
**Enable lingering** (allows user services to run when not logged in):
```bash
sudo loginctl enable-linger $USER
```
## Usage
### Manual Run
```bash
cd ~/news-agent
source .venv/bin/activate
python -m src.main
```
### Check Status
```bash
# Check timer status
systemctl --user status news-agent.timer
# View logs
journalctl --user -u news-agent.service -f
# Or check log file
tail -f data/logs/news-agent.log
```
### Trigger Manually
```bash
# Run service immediately (without waiting for timer)
systemctl --user start news-agent.service
```
### View Last Run
```bash
systemctl --user status news-agent.service
```
## Configuration
### RSS Sources
Add or remove sources in `config.yaml`:
```yaml
sources:
rss:
- name: "Your Source"
url: "https://example.com/feed.xml"
category: "tech" # tech, development, selfhosting, architecture, gadgets
```
### AI Configuration
**Models** (from cheap to expensive):
- `google/gemini-flash-1.5` - Fast, cheap, good quality (recommended)
- `meta-llama/llama-3.1-8b-instruct` - Very cheap
- `anthropic/claude-3.5-haiku` - Better quality, slightly more expensive
- `openai/gpt-4o-mini` - Good quality, moderate price
**Filtering:**
```yaml
ai:
filtering:
enabled: true
min_score: 6.5 # Articles below this score are filtered out
max_articles: 15 # Maximum articles in daily digest
```
**Interests:**
```yaml
ai:
interests:
- "Your interest here"
- "Another topic"
```
### Schedule
Change time in `~/.config/systemd/user/news-agent.timer`:
```ini
[Timer]
OnCalendar=07:00 # 24-hour format
```
Then reload:
```bash
systemctl --user daemon-reload
systemctl --user restart news-agent.timer
```
## Troubleshooting
### No Email Received
1. **Check logs:**
```bash
journalctl --user -u news-agent.service -n 50
```
2. **Check Postfix:**
```bash
sudo systemctl status postfix
sudo tail -f /var/log/maillog
```
3. **Test email manually:**
```bash
echo "Test email" | mail -s "Test" your-email@example.com
```
### API Errors
1. **Verify API key in `.env`**
2. **Check OpenRouter credit balance:** https://openrouter.ai/credits
3. **Check rate limits in logs**
### Service Not Running
```bash
# Check service status
systemctl --user status news-agent.service
# Check timer status
systemctl --user status news-agent.timer
# View detailed logs
journalctl --user -xe -u news-agent.service
```
### Database Issues
```bash
# Reset database (WARNING: deletes all history)
rm data/articles.db
python -m src.main
```
## Cost Estimation
Using `google/gemini-flash-1.5` (recommended):
- **Daily:** ~$0.05-0.15 (varies by article count)
- **Monthly:** ~$1.50-4.50
- **Yearly:** ~$18-54
Factors affecting cost:
- Number of new articles
- Content length
- Filtering threshold (lower = more articles = higher cost)
## Maintenance
### Update Dependencies
```bash
cd ~/news-agent
source .venv/bin/activate
pip install --upgrade -e .
```
### View Statistics
```bash
# Check database
sqlite3 data/articles.db "SELECT COUNT(*) FROM articles;"
sqlite3 data/articles.db "SELECT category, COUNT(*) FROM articles GROUP BY category;"
```
### Logs Rotation
Logs automatically rotate at 10MB with 5 backups (configured in `config.yaml`).
## Advanced Features
### Add API News Sources
Extend `src/aggregator/api_fetcher.py` to support NewsAPI, Google News API, etc.
### Customize Email Template
Edit `src/email/templates/daily_digest.html` for different styling.
### Web Dashboard
Add Flask/FastAPI to create a web interface for viewing past digests.
## Contributing
This is a personal project template. Feel free to fork and customize to your needs.
## License
MIT License - Free to use and modify
## Support
For issues with:
- **OpenRouter:** https://openrouter.ai/docs
- **Postfix:** Fedora documentation
- **This code:** Check logs and configuration
## Credits
Built with:
- Python 3.11+
- OpenRouter AI (https://openrouter.ai)
- Feedparser, Jinja2, Pydantic, and other open-source libraries