Add Anthropic prompt caching (direct + via OpenRouter)
Caches the system prompt/tools and growing conversation history via cache_control breakpoints, cutting cost and latency on repeated turns. Covers both the regular chat path and the tool-calling loop (chatWithToolMessages), which has its own request-building code and was initially missed. Cost calculation now accounts for cache write/read pricing instead of treating all input tokens as full price. Verified live: cache reads grow turn-over-turn in oAI.log.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
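The cost change described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the type and function names (CacheAwareUsage, ModelPricing, estimateCost) are hypothetical, the 1.25x write and 0.1x read multipliers are assumed from Anthropic's published pricing for 5-minute cache entries, and the sketch further assumes that the reported prompt token count already includes the cached and cache-write tokens.

```swift
import Foundation

// Hypothetical token counts for one response; names are illustrative only.
struct CacheAwareUsage {
    let promptTokens: Int       // assumed to include cached and cache-write tokens
    let completionTokens: Int
    let cachedTokens: Int       // tokens served from the cache (cache reads)
    let cacheWriteTokens: Int   // tokens written to the cache on this request
}

// Hypothetical pricing record, in USD per million tokens.
struct ModelPricing {
    let inputPerMillion: Double
    let outputPerMillion: Double
    // Assumed multipliers relative to the base input price.
    var cacheWriteMultiplier: Double = 1.25
    var cacheReadMultiplier: Double = 0.1
}

// Estimates request cost by pricing uncached, cache-write, and cache-read
// input tokens separately instead of charging all input at full price.
func estimateCost(usage: CacheAwareUsage, pricing: ModelPricing) -> Double {
    // Clamp at zero in case a provider reports cache counts separately.
    let uncached = max(0, usage.promptTokens - usage.cachedTokens - usage.cacheWriteTokens)
    let weightedInput = Double(uncached)
        + Double(usage.cacheWriteTokens) * pricing.cacheWriteMultiplier
        + Double(usage.cachedTokens) * pricing.cacheReadMultiplier
    let inputCost = weightedInput * pricing.inputPerMillion / 1_000_000
    let outputCost = Double(usage.completionTokens) * pricing.outputPerMillion / 1_000_000
    return inputCost + outputCost
}
```

With no cached or cache-write tokens the function reduces to the plain input-plus-output price, so the same path can serve models that do not support caching.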
@@ -48,7 +48,12 @@ struct OpenRouterChatRequest: Codable {
    let toolChoice: String?
    let modalities: [String]?
    let reasoning: ReasoningAPIConfig?

    let cacheControl: CacheControl?

    struct CacheControl: Codable {
        let type: String
    }

    struct APIMessage: Codable {
        let role: String
        let content: MessageContent
@@ -138,6 +143,7 @@ struct OpenRouterChatRequest: Codable {
        case toolChoice = "tool_choice"
        case modalities
        case reasoning
        case cacheControl = "cache_control"
    }
}

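To illustrate how the new cacheControl field and its cache_control coding key end up on the wire, here is a small sketch. SketchChatRequest and encodeSketch are hypothetical, trimmed-down stand-ins for the real request type (which has many more fields), and the "ephemeral" type value is assumed from Anthropic's prompt caching documentation rather than taken from this diff.

```swift
import Foundation

// Hypothetical, trimmed-down request; only the fields needed for the sketch.
struct SketchChatRequest: Codable {
    struct CacheControl: Codable {
        let type: String
    }

    let model: String
    let cacheControl: CacheControl?

    enum CodingKeys: String, CodingKey {
        case model
        case cacheControl = "cache_control"   // snake_case key expected by the API
    }
}

// Encodes the request to a JSON string with sorted keys for stable output.
func encodeSketch(_ request: SketchChatRequest) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(request)
    return String(decoding: data, as: UTF8.self)
}
```

Because the field is optional and the Codable conformance is synthesized, a nil cacheControl is omitted from the JSON entirely, so requests for models without caching support are unchanged.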
@@ -225,11 +231,23 @@ struct OpenRouterChatResponse: Codable {
        let promptTokens: Int
        let completionTokens: Int
        let totalTokens: Int

        let promptTokensDetails: PromptTokensDetails?

        struct PromptTokensDetails: Codable {
            let cachedTokens: Int?
            let cacheWriteTokens: Int?

            enum CodingKeys: String, CodingKey {
                case cachedTokens = "cached_tokens"
                case cacheWriteTokens = "cache_write_tokens"
            }
        }

        enum CodingKeys: String, CodingKey {
            case promptTokens = "prompt_tokens"
            case completionTokens = "completion_tokens"
            case totalTokens = "total_tokens"
            case promptTokensDetails = "prompt_tokens_details"
        }
    }
}
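As a rough check of the decoding side, the sketch below mirrors the usage fields from the hunk above in a standalone type and decodes a usage object. SketchUsage and decodeUsage are hypothetical names, and any JSON passed in is illustrative sample data, not a captured API response.

```swift
import Foundation

// Standalone copy of the usage fields shown in the diff, for illustration.
struct SketchUsage: Codable {
    let promptTokens: Int
    let completionTokens: Int
    let totalTokens: Int
    let promptTokensDetails: PromptTokensDetails?

    struct PromptTokensDetails: Codable {
        let cachedTokens: Int?
        let cacheWriteTokens: Int?

        enum CodingKeys: String, CodingKey {
            case cachedTokens = "cached_tokens"
            case cacheWriteTokens = "cache_write_tokens"
        }
    }

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case completionTokens = "completion_tokens"
        case totalTokens = "total_tokens"
        case promptTokensDetails = "prompt_tokens_details"
    }
}

// Decodes a usage object from a JSON string.
func decodeUsage(_ json: String) throws -> SketchUsage {
    try JSONDecoder().decode(SketchUsage.self, from: Data(json.utf8))
}
```

Keeping promptTokensDetails and both of its counters optional means responses that omit the details object, or report only one of the two counters, still decode without error.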