Add Anthropic prompt caching (direct + via OpenRouter)

Caches the system prompt/tools and growing conversation history via
cache_control breakpoints, cutting cost and latency on repeated turns.
Covers both the regular chat path and the tool-calling loop
(chatWithToolMessages), which has its own request-building code and was
initially missed. Cost calculation now accounts for cache write/read
pricing instead of charging all input tokens at full price. Verified
live: cached-read token counts in oAI.log grow from turn to turn.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-18 12:43:32 +02:00
parent a793fdacc4
commit 5b99a6f81c
5 changed files with 131 additions and 21 deletions
+20 -2
@@ -48,7 +48,12 @@ struct OpenRouterChatRequest: Codable {
    let toolChoice: String?
    let modalities: [String]?
    let reasoning: ReasoningAPIConfig?
    let cacheControl: CacheControl?
    struct CacheControl: Codable {
        let type: String
    }
    struct APIMessage: Codable {
        let role: String
        let content: MessageContent
@@ -138,6 +143,7 @@ struct OpenRouterChatRequest: Codable {
        case toolChoice = "tool_choice"
        case modalities
        case reasoning
        case cacheControl = "cache_control"
    }
}
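For illustration, the request-side change above can be sketched as a small standalone program. This is a minimal sketch, not the project's code: CachedChatRequest, its Message type, encodeRequest, and the model name are hypothetical stand-ins for the much larger OpenRouterChatRequest. Only the cache_control key and the CacheControl shape come from the diff. The "ephemeral" value is the cache type Anthropic documents for cache_control.

```swift
import Foundation

// Hypothetical, trimmed-down request; the real OpenRouterChatRequest has more fields.
struct CachedChatRequest: Codable {
    let model: String
    let messages: [Message]
    let cacheControl: CacheControl?

    struct CacheControl: Codable {
        let type: String
    }

    // Hypothetical simplified message; the real APIMessage uses MessageContent.
    struct Message: Codable {
        let role: String
        let content: String
    }

    enum CodingKeys: String, CodingKey {
        case model
        case messages
        case cacheControl = "cache_control"
    }
}

// Encodes a request to a compact JSON string with keys in sorted order.
func encodeRequest(_ request: CachedChatRequest) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(request), as: UTF8.self)
}

let messages = [CachedChatRequest.Message(role: "user", content: "Hello")]

// "example-model" is a placeholder, not a real model identifier.
let withCacheJSON = try encodeRequest(CachedChatRequest(
    model: "example-model",
    messages: messages,
    cacheControl: .init(type: "ephemeral")))

let withoutCacheJSON = try encodeRequest(CachedChatRequest(
    model: "example-model",
    messages: messages,
    cacheControl: nil))

print(withCacheJSON)
print(withoutCacheJSON)
```

Because cacheControl is optional, Swift's synthesized Codable conformance omits the key when it is nil, so a request that leaves it unset serializes without any cache_control entry.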
@@ -225,11 +231,23 @@ struct OpenRouterChatResponse: Codable {
        let promptTokens: Int
        let completionTokens: Int
        let totalTokens: Int
        let promptTokensDetails: PromptTokensDetails?
        struct PromptTokensDetails: Codable {
            let cachedTokens: Int?
            let cacheWriteTokens: Int?
            enum CodingKeys: String, CodingKey {
                case cachedTokens = "cached_tokens"
                case cacheWriteTokens = "cache_write_tokens"
            }
        }
        enum CodingKeys: String, CodingKey {
            case promptTokens = "prompt_tokens"
            case completionTokens = "completion_tokens"
            case totalTokens = "total_tokens"
            case promptTokensDetails = "prompt_tokens_details"
        }
    }
}
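The cache-aware cost calculation mentioned in the commit message is not part of this file, so the following is only a sketch of how the new usage fields might feed it. Usage mirrors the fields in the hunk above; Pricing and estimateCost are hypothetical names. Two assumptions are baked in and should be checked against the provider's current documentation: that prompt_tokens already includes cached and cache-write tokens, and that cache writes cost 1.25x and cache reads 0.1x the base input price (the rates Anthropic publishes for its 5-minute cache).

```swift
import Foundation

// Mirrors the usage fields shown in the diff (total_tokens omitted for brevity).
struct Usage: Codable {
    let promptTokens: Int
    let completionTokens: Int
    let promptTokensDetails: PromptTokensDetails?

    struct PromptTokensDetails: Codable {
        let cachedTokens: Int?
        let cacheWriteTokens: Int?

        enum CodingKeys: String, CodingKey {
            case cachedTokens = "cached_tokens"
            case cacheWriteTokens = "cache_write_tokens"
        }
    }

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
        case promptTokensDetails = "prompt_tokens_details"
    }
}

// Hypothetical pricing record; prices are USD per million tokens.
struct Pricing {
    let inputPerMTok: Double
    let outputPerMTok: Double
    var cacheWriteMultiplier = 1.25  // assumption: 5-minute cache write rate
    var cacheReadMultiplier = 0.10   // assumption: cache read rate
}

// Splits input tokens into uncached, cache-write, and cache-read buckets
// and prices each bucket separately.
func estimateCost(_ usage: Usage, pricing: Pricing) -> Double {
    let read = usage.promptTokensDetails?.cachedTokens ?? 0
    let written = usage.promptTokensDetails?.cacheWriteTokens ?? 0
    // Assumption: prompt_tokens includes both cache reads and cache writes.
    let uncached = max(0, usage.promptTokens - read - written)
    let weightedInput = Double(uncached)
        + Double(written) * pricing.cacheWriteMultiplier
        + Double(read) * pricing.cacheReadMultiplier
    let total = weightedInput * pricing.inputPerMTok
        + Double(usage.completionTokens) * pricing.outputPerMTok
    return total / 1_000_000
}

// Made-up sample payload for illustration only.
let sampleJSON = """
{"prompt_tokens": 1000, "completion_tokens": 100, "total_tokens": 1100,
 "prompt_tokens_details": {"cached_tokens": 800, "cache_write_tokens": 100}}
"""
let sampleUsage = try JSONDecoder().decode(Usage.self, from: Data(sampleJSON.utf8))
let sampleCost = estimateCost(
    sampleUsage, pricing: Pricing(inputPerMTok: 3.0, outputPerMTok: 15.0))
print(sampleCost)
```

If a provider reports cache tokens separately from prompt_tokens, the subtraction in estimateCost would undercount uncached input and should be dropped. When promptTokensDetails is absent, both cache counts fall back to zero and the result equals the plain full-price calculation.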