NewsAgent/MULTILINGUAL.md
2026-01-27 09:46:02 +01:00


Multilingual Support

News Agent supports articles in multiple languages, including Norwegian, English, and others.

Configuration

1. Add Norwegian RSS Sources

In config.yaml:

```yaml
sources:
  rss:
    - name: "NRK Nyheter"
      url: "https://www.nrk.no/toppsaker.rss"
      category: "tech"  # NRK top stories are general news, not tech-specific

    - name: "Digi.no"
      url: "https://www.digi.no/rss"
      category: "tech"

    - name: "Kode24"
      url: "https://www.kode24.no/rss"
      category: "development"
```
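Before the first run, a quick sanity check of the sources.rss entries can catch a missing category, a URL without a scheme, or a duplicated feed name. The sketch below is hypothetical: check_rss_sources is not part of News Agent, and it assumes config.yaml has already been loaded into a plain dict (for example with PyYAML's yaml.safe_load).

```python
from urllib.parse import urlparse

# Keys each sources.rss entry uses in the examples in this guide.
REQUIRED_KEYS = ("name", "url", "category")


def check_rss_sources(config):
    """Return human-readable problems found in config["sources"]["rss"]."""
    problems = []
    sources = (config.get("sources") or {}).get("rss") or []
    seen_names = set()
    for index, source in enumerate(sources):
        label = source.get("name") or f"entry #{index + 1}"
        for key in REQUIRED_KEYS:
            if not source.get(key):
                problems.append(f"{label}: missing '{key}'")
        url = source.get("url") or ""
        if url and urlparse(url).scheme not in ("http", "https"):
            problems.append(f"{label}: URL should start with http:// or https://")
        if label in seen_names:
            problems.append(f"{label}: duplicate source name")
        seen_names.add(label)
    return problems


if __name__ == "__main__":
    # In practice, load this dict from config.yaml (e.g. with PyYAML's yaml.safe_load).
    example = {
        "sources": {
            "rss": [
                {"name": "Digi.no", "url": "https://www.digi.no/rss", "category": "tech"},
                {"name": "Kode24", "url": "www.kode24.no/rss", "category": "development"},
            ]
        }
    }
    for problem in check_rss_sources(example):
        print(problem)
```

Running it prints a single problem, "Kode24: URL should start with http:// or https://", because the example Kode24 URL lacks its scheme.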

2. Add Multilingual Interests

In config.yaml under ai.interests:

```yaml
ai:
  interests:
    - "Technology news from Norway (Norwegian articles welcome)"
    - "Norwegian tech industry and startups"
    - "General news from Norway in Norwegian"
    - "AI and machine learning developments"
    - "Self-hosting solutions"
    # ... other interests
```

Key points:

  • Be explicit: "Norwegian articles welcome" or "in Norwegian"
  • Mention the country/region for local news
  • Use natural language

3. Language Handling

The AI prompts explicitly support multiple languages:

Filtering:

  • Articles are evaluated regardless of language
  • Norwegian content is given equal consideration
  • Interest matching works across languages

Summarization:

  • Summaries written in the SAME language as the source
  • Norwegian articles → Norwegian summaries
  • English articles → English summaries
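To illustrate these two rules, here is a minimal sketch of how a filtering prompt and a summarization prompt could be phrased. Both functions and all of their wording are hypothetical; News Agent's actual prompts may differ. The 0 to 10 scale is an assumption based on the min_score values used in this guide, and the 2-3 sentence summary length is likewise illustrative.

```python
# Hypothetical prompt builders that illustrate the two rules above.
# News Agent's real prompt wording lives in its own source and may differ.


def build_filter_prompt(title, text, interests):
    """Ask for a 0-10 relevance score without penalizing non-English articles."""
    interest_lines = "\n".join(f"- {interest}" for interest in interests)
    return (
        "Rate how relevant this article is to the reader's interests on a scale from 0 to 10.\n"
        "The article may be written in any language, for example Norwegian or English. "
        "Judge it on its content and treat all languages equally.\n\n"
        f"Interests:\n{interest_lines}\n\n"
        f"Title: {title}\n"
        f"Text: {text}\n"
    )


def build_summary_prompt(title, text):
    """Ask for a summary in the same language as the source article."""
    return (
        "Summarize the article below in 2-3 sentences.\n"
        "Write the summary in the SAME language as the article: "
        "Norwegian articles get Norwegian summaries, English articles get English summaries.\n\n"
        f"Title: {title}\n"
        f"Text: {text}\n"
    )


if __name__ == "__main__":
    interests = ["Norwegian tech industry and startups", "AI and machine learning developments"]
    print(build_filter_prompt("Norsk AI-selskap lanserer ny tjeneste", "...", interests))
    print(build_summary_prompt("Norsk AI-selskap lanserer ny tjeneste", "..."))
```

Putting the language rule directly in the prompt, rather than translating articles first, keeps each summary in its original language.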

Tips for Better Norwegian Content

1. Be Specific with Interests

Too vague:

- "News from Norway"

Better:

- "Norwegian technology news (accept Norwegian language)"
- "Politikk og samfunn fra Norge"  # Politics and society from Norway
- "Norsk tech-industri og oppstartsselskaper"  # Norwegian tech industry and startups

2. Lower Filtering Threshold

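Norwegian articles may score slightly lower than comparable English ones, especially before your interests mention Norwegian explicitly. Try lowering the threshold:

```yaml
ai:
  filtering:
    min_score: 5.0  # Lower threshold (for example, down from 5.5)
```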
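A minimal sketch of what the threshold does, assuming an article passes when its relevance score is at least min_score and that max_articles caps the digest after sorting by score. ScoredArticle and select_articles are hypothetical names, not News Agent's own code.

```python
from dataclasses import dataclass


@dataclass
class ScoredArticle:
    title: str
    source: str
    relevance_score: float


def select_articles(articles, min_score=5.0, max_articles=20):
    """Keep articles scoring at least min_score, highest first, capped at max_articles."""
    kept = [a for a in articles if a.relevance_score >= min_score]
    kept.sort(key=lambda a: a.relevance_score, reverse=True)
    return kept[:max_articles]


if __name__ == "__main__":
    articles = [
        ScoredArticle("New Python framework released", "Hacker News", 7.5),
        ScoredArticle("Norsk AI-selskap lanserer ny tjeneste", "Digi.no", 5.2),
    ]
    for threshold in (5.5, 5.0):
        titles = [a.title for a in select_articles(articles, min_score=threshold)]
        print(threshold, titles)
```

With min_score 5.5 only the Hacker News item survives; at 5.0 the Digi.no article scoring 5.2 is included as well.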

3. Use Norwegian News Sources

Tech/Development:

  • Digi.no: https://www.digi.no/rss
  • Kode24: https://www.kode24.no/rss
  • Tek.no: https://www.tek.no/rss

General News:

  • NRK: https://www.nrk.no/toppsaker.rss
  • VG: https://www.vg.no/rss/feed/
  • Aftenposten: https://www.aftenposten.no/rss

Business/Tech:

  • DN (Dagens Næringsliv): Check their RSS feeds
  • E24: https://e24.no/rss

4. Mixed Language Email

Your email will contain both English and Norwegian articles:

```text
TECH
----
• Norwegian startup raises $10M (English summary)
• Norsk AI-selskap lanserer ny tjeneste (Norwegian summary)

DEVELOPMENT
-----------
• New Python framework released (English summary)
• Kode24: Sånn bruker du GitHub Copilot (Norwegian summary)
```
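The grouping above can be sketched as follows: articles are bucketed by their source category, and each summary is kept in whatever language it was written in, with no translation step. render_digest and its (category, title, summary) input are hypothetical; the real email template may differ (for example, it may be HTML).

```python
from collections import defaultdict


def render_digest(articles):
    """Group (category, title, summary) tuples into a plain-text digest."""
    by_category = defaultdict(list)  # preserves first-seen category order
    for category, title, summary in articles:
        by_category[category].append((title, summary))

    lines = []
    for category, items in by_category.items():
        heading = category.upper()
        lines.append(heading)
        lines.append("-" * len(heading))  # underline matches the heading length
        for title, summary in items:
            lines.append(f"• {title}")
            lines.append(f"  {summary}")  # summary stays in the article's language
        lines.append("")  # blank line between categories
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_digest([
        ("tech", "Norwegian startup raises $10M", "A Norwegian startup has raised funding."),
        ("tech", "Norsk AI-selskap lanserer ny tjeneste", "Et norsk AI-selskap har lansert en ny tjeneste."),
        ("development", "New Python framework released", "A new Python framework is out."),
    ]))
```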

Troubleshooting

Not Getting Norwegian Articles?

1. Check whether articles are being fetched. Enable debug logging in config.yaml:

```yaml
logging:
  level: "DEBUG"
```

Then run the agent and search the log for your Norwegian sources:

```bash
python -m src.main
grep -iE "digi|kode24|nrk" data/logs/news-agent.log
```

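2. Check the filtering scores of recent articles from your Norwegian feeds (adjust the names to match the name values in your config.yaml):

```bash
sqlite3 data/articles.db "SELECT title, relevance_score, source FROM articles WHERE source IN ('Digi.no', 'Kode24', 'NRK Nyheter') ORDER BY fetched_at DESC LIMIT 10;"
```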

3. Verify that the RSS feed responds:

```bash
curl -s "https://www.digi.no/rss" | head -50
```

4. Check that the feed has recent content by opening the RSS URL in your browser.
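If the curl output is hard to read, a short script can report how many items a feed has, how many include a description, and the newest publication date. Items without a description point to a headline-only feed, which makes relevance scoring harder. This is a sketch for RSS 2.0 feeds (channel/item elements) only; Atom feeds use a different structure, and inspect_rss is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime


def inspect_rss(xml_text):
    """Summarize an RSS 2.0 feed given as str or bytes.

    Returns the number of items, how many have a non-empty description,
    and the newest pubDate (or None if no item has a parseable date).
    """
    root = ET.fromstring(xml_text)
    items = root.findall("./channel/item")
    with_description = sum(
        1 for item in items if (item.findtext("description") or "").strip()
    )
    dates = []
    for item in items:
        pub_date = (item.findtext("pubDate") or "").strip()
        if not pub_date:
            continue
        try:
            parsed = parsedate_to_datetime(pub_date)
        except (TypeError, ValueError):
            continue  # skip malformed dates
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
        dates.append(parsed)
    return {
        "items": len(items),
        "with_description": with_description,
        "newest": max(dates) if dates else None,
    }


if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>Example</title>
    <item><title>Story 1</title><description>Full text</description>
          <pubDate>Tue, 27 Jan 2026 08:00:00 +0100</pubDate></item>
    <item><title>Story 2</title><pubDate>Mon, 26 Jan 2026 08:00:00 +0100</pubDate></item>
    </channel></rss>"""
    print(inspect_rss(sample))
```

To check a live feed, pass the bytes returned by urllib.request.urlopen(url).read() instead of the sample string.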

Norwegian Articles Scoring Low?

Possible reasons:

  1. Interest not specific enough

    • Add: "Norwegian technology and business news in Norwegian language"
  2. Threshold too high

    • Lower to 4.5 or 5.0
  3. Content too general

    • Norwegian general news might not match "tech" interests
    • Add specific Norwegian interests
  4. Article content is short

    • Some RSS feeds only include headlines
    • The AI can't reliably judge relevance from a title alone

Mixed Results?

If you're getting English but not Norwegian:

  1. Check the interest phrasing. Add these to the top of the interests list:

    ```yaml
    ai:
      interests:
        - "Norwegian news and technology (Norwegian language accepted)"
        - "Norge: teknologi, samfunn og næringsliv"  # Norway: technology, society and business
    ```

  2. Try a model with stronger multilingual support:

    ```yaml
    ai:
      model: "anthropic/claude-3.5-haiku"  # Strong multilingual support
    ```

  3. Run with debug logging enabled (logging.level: "DEBUG" in config.yaml) and search the output:

    ```bash
    python -m src.main 2>&1 | grep -iE -C 3 "norge|norsk|digi|kode24|nrk"
    ```

Example Configuration

Complete example supporting both English and Norwegian:

```yaml
sources:
  rss:
    # English sources
    - name: "Hacker News"
      url: "https://news.ycombinator.com/rss"
      category: "tech"

    - name: "TechCrunch"
      url: "https://techcrunch.com/feed/"
      category: "tech"

    # Norwegian sources
    - name: "Digi.no"
      url: "https://www.digi.no/rss"
      category: "tech"

    - name: "Kode24"
      url: "https://www.kode24.no/rss"
      category: "development"

    - name: "NRK Nyheter"
      url: "https://www.nrk.no/toppsaker.rss"
      category: "tech"  # NRK top stories are general news, not tech-specific

ai:
  model: "openai/gpt-4o-mini"

  filtering:
    enabled: true
    min_score: 5.0  # Slightly lower for Norwegian content
    max_articles: 20  # More room, so Norwegian articles are more likely to make the cut

  interests:
    # Norwegian-specific
    - "Norwegian technology news and developments (Norwegian language)"
    - "Norsk tech-industri, oppstartsselskaper og innovasjon"  # Norwegian tech industry, startups and innovation
    - "General news from Norway or about Norway"

    # General (works for both languages)
    - "AI and machine learning developments"
    - "Open source projects and tools"
    - "Self-hosting solutions"
    - "Python and software development"
```

Language Statistics

After a run, check the language distribution. The query classifies articles by source name, so extend the LIKE patterns if you add other Norwegian feeds:

```bash
sqlite3 data/articles.db "
SELECT
    CASE
        WHEN source LIKE '%norsk%' OR source LIKE '%digi%' OR source LIKE '%kode%' OR source LIKE '%nrk%' THEN 'Norwegian'
        ELSE 'English'
    END AS language,
    COUNT(*) AS count,
    ROUND(AVG(relevance_score), 2) AS avg_score
FROM articles
WHERE processed = 1
GROUP BY language;
"
```
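The LIKE patterns above are a heuristic. An alternative is an explicit mapping from feed name to language, assuming the source column stores the name configured under sources.rss; any source not in the mapping counts as English, as in the SQL version. SOURCE_LANGUAGE and language_stats are hypothetical names for this sketch.

```python
import os
import sqlite3
from collections import defaultdict

# Hypothetical feed-name -> language mapping; assumes articles.source holds
# the name configured under sources.rss. Unlisted sources count as English.
SOURCE_LANGUAGE = {
    "Digi.no": "Norwegian",
    "Kode24": "Norwegian",
    "NRK Nyheter": "Norwegian",
}


def language_stats(conn):
    """Return {language: (article count, average relevance_score or None)}."""
    counts = defaultdict(int)
    scores = defaultdict(list)
    rows = conn.execute("SELECT source, relevance_score FROM articles WHERE processed = 1")
    for source, score in rows:
        language = SOURCE_LANGUAGE.get(source, "English")
        counts[language] += 1
        if score is not None:  # AVG() in SQL also ignores NULL scores
            scores[language].append(score)
    return {
        language: (count, sum(scores[language]) / len(scores[language]) if scores[language] else None)
        for language, count in counts.items()
    }


if __name__ == "__main__":
    db_path = "data/articles.db"
    if os.path.exists(db_path):  # avoid creating an empty database by accident
        conn = sqlite3.connect(db_path)
        try:
            for language, (count, avg) in language_stats(conn).items():
                avg_text = f"{avg:.2f}" if avg is not None else "n/a"
                print(f"{language}: {count} articles, average score {avg_text}")
        finally:
            conn.close()
```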

Best Practices

  1. Be explicit - Tell the AI that Norwegian is welcome
  2. Lower threshold - 5.0 instead of 5.5 or 6.5
  3. More articles - Increase max_articles to 20
  4. Specific interests - Mention Norwegian topics explicitly
  5. Good sources - Use active Norwegian tech/news RSS feeds
  6. Test first - Run manually with debug logging

Models and Multilingual Support

The models below all handle multiple languages, including Norwegian:

| Model | Norwegian support | Recommendation |
|---|---|---|
| openai/gpt-4o-mini | Excellent | Recommended |
| anthropic/claude-3.5-haiku | Excellent | Best for multilingual |
| google/gemini-2.0-flash-exp:free | Good | ⚠️ Has rate limits |

Claude models are particularly good with Scandinavian languages.

Summary

To get Norwegian articles in your digest:

  1. Add Norwegian RSS sources
  2. Add explicit Norwegian interests ("Norwegian language accepted")
  3. Lower filtering threshold to 5.0
  4. Multilingual prompts (already built in, nothing to change)
  5. Test with: python -m src.main

The summaries will automatically be in the same language as the source article!