
  • buildoak 1 hour ago
    TL;DR: a Go proxy + skill that lets Claude profile the current session and surgically optimize its own context by (1) evicting old/irrelevant file reads, (2) deterministically compressing Bash tool results (calibrated on SWE-bench), and (3) using subagents to rewrite File Reads, Subagent Returns (Task/Agents), Glob, etc.

    The result: ~40-60k tokens evicted from 150-200k-token sessions (Claude Opus 4.6). Sessions live longer and can go 300+ tool calls. It's also cheaper, even accounting for the cache misses the rewrites cause. Works with both API and subscription. Fully auditable and traceable (local data structure). Doesn't break telemetry or hook logic (a PreToolUse hook that blocks dangerous commands still works!).

    Now the full story: autocompact sucks. The hundreds of context-optimization tools out there have too many edge cases. Nobody had tried asking Claude to profile and optimize its own context yet. I tried it, and it works like a charm.

    I was once running Gaussian moat Rust solvers on my Mac mini inside Claude Code's native subagents. After a big agent return, the session auto-compacted, Claude restarted the subagent swarm, and they cooked my swap up to 60 GB; then I got cut off from my headless Mac mini because of a firewall race condition, and it sucked. Another observation: the Claude Code main session sometimes slips and runs commands, reads files, gets big agent returns, and the context jumps +10k or +30k tokens immediately. I wanted to fix that. I wanted to be able to evict bad tool results or file reads.

    I tried countless context-optimization tools to fix it, but none suited my case. RTK, for example, is a good one: it optimizes tool result output. I tested it, but because it rewrites the original bash command, a PreToolUse hook that blocks dangerous commands stops working. Not ideal. RTK also can't wrangle file reads or agent returns, which in my case were the major culprits.

    I started thinking: how do I intercept a tool result before Claude sees it and optimize it? After some back and forth and more research I arrived at a fairly elegant idea, slightly different from the initial one: (1) build a shim, a proxy layer between Claude Code and the Anthropic API; (2) compress tool results AFTER they land in the context, e.g. compress a Bash call 50 tool calls back; (3) give Claude a toolbox + manual for cleaning up its own context, with proper profiling and safeguards. Deterministic compression for Bash; a Sonnet 4.6 or Opus 4.6 subagent rewrite for Reads, subagents, and some other tools. Works like a charm.

    This is how I ended up building "wet claude": a Go proxy called "wet" that you launch as `wet claude`. It spins up a proxy layer that sees every API call and maintains a local data structure. Every tool call gets its own id and a token estimate, so Claude can see how many tokens each tool result costs, profile the whole session, and propose what could be optimized. The local data structure gives auditability and traceability, so if anything goes weird there's always a way to debug it.

    Since Anthropic counts tokens based on what its API sees, optimizing at the request level also drives the "native token counter" down, and autocompact never triggers. Nice side bonus: requests become cheaper and limits last longer. Works with both subscription and API.

    It comes with a skill that tells Claude how to operate it: profile first, suggest options, discuss, then optimize. In my testing I've been able to evict 40-60k tokens of context from sessions running at ~150-200k context (1M Opus 4.6), and to do it reliably several times without any quality drop.

    As a side bonus, it can make Claude "forget" files it read by accident by evicting them from its memory. It also seems that by being meta-aware of the game, Claude deals with compressed context much better.

    I haven't yet seen any tool that operates with the same logic (prove me wrong, though).