Transparent context reduction for LLMs

Your AI sends 38,000 tokens.
Sieve delivers the same answer with

One URL change. Zero agent modification.
Works with any LLM — local or cloud.
Open source · Free forever · Apache 2.0

Patent pending · GB2608859.1 · Apache 2.0

The problem with every LLM agent

Most agent frameworks rebuild the entire context on every request — system prompts, tool schemas, conversation history — regardless of what you asked.

Some agents truncate aggressively, losing important information. Others send everything, drowning the model in irrelevant context. Neither approach scales.

Sieve replaces both with intelligent retrieval — sending only what matters, without losing what's important.

The fix is architectural

Without Sieve
Agent tokens Tens of thousands of tokens LLM
With Sieve
Agent Sieve strips · retrieves tokens Hundreds of tokens LLM

Sieve sits transparently between your agent and your LLM. Instead of truncating or bloating, it retrieves only the relevant context from a structured memory store — delivering a lean, precise payload every time.

Measured. Replicated. Honest.

What Sieve actually does

Four numbers. Cross-checked across independent graders.

Token reduction
0%
Every turn. Every model.
Faster followups
3–7× on frontier models
Recall latency
<0ms
At 100,000 facts. Encrypted.
External dependencies
0
Bring your own LLM. Period.
5 LLM architectures  ·  8B–72B scales  ·  8K–64K windows  ·  1–64 concurrency

Gets smarter the more it knows you

Baseline models degrade as conversation grows. Sieve improves.

Context tokens over 60 days
BASELINE — tokens per request 150K 100K 50K 0 147K ↑ 140× fewer tokens ↓ SIEVE — tokens per request 2K 1K 0 1K Day 10 20 30 40 50 60
Hallucination rate over 30 days
25% 15% 5% 0% 24.2% 2.5% Baseline: climbing Sieve: flat Baseline Sieve Days 1-10 Days 11-20 Days 21-30 Up to 9× less hallucination on absence-trap queries — measured across the validation runs

Don't trust our numbers? Run sieve benchmark on your own machine.

How Sieve fits

Excellent projects solving different facets of the same problem. Here's where Sieve fits.

ApproachWhat it does wellIntegrationSieve adds
Raw agent contextSimple, no setup neededN/AReduces bloat without changing the agent
Agent + compactionKeeps context manageableBuilt-inMore precise retrieval vs crude truncation
RAG systemsDocument retrieval at scaleRequires SDK integrationTransparent proxy — no code changes
Virtual context managersSophisticated memory managementRequires SDK changesDrop-in proxy, works alongside
SieveToken reduction + structured memoryTransparent proxy

Sieve is complementary. It works alongside any of these approaches — reducing what gets sent to the model regardless of how the context was assembled.

Watch it learn

Sieve's context reduction improves with every conversation

sieve demo
Message 1cold start
"Hi, I'm Alex. I live in London and work at a tech company."
📝 Learned: name, location, employer
Message 5~90% reduction
"What's the weather like where I live?"
🧠 Retrieved: location from memory — thousands of tokens stripped
Message 20~95% reduction
"Should I refinance given my situation?"
🧠 Retrieved: 5 relevant financial facts from memory
Message 50~99% reduction
"What's my current half-marathon pace?"
🧠 Temporal supersession: serves only the latest value, not stale data
Message 55🛡️ trap detected
"What time is my son Jake's swimming lesson?"
🛡️ Absence signal: "No son named Jake on record" — hallucination prevented
Knowledge graph
nodes: 0edges: 0facts: 0
lives_in works_at spouse daughter son owes trains no_son ✗ Alex London employer Sophie weather Lily Oscar mortgage property rate running 1:52 half-mara Jake ✗

Get started in 60 seconds

terminal
$ pipx install llm-sieve
$ sieve-install
→ Provider detected: Ollama at 127.0.0.1:11434
→ Embedding model downloaded
→ Encrypted store initialised
→ Sieve is ready at http://127.0.0.1:11435
 
# That's it. Change one URL in your agent config.

Works with everything

LLM providers
Ollama
OpenAI
Anthropic
vLLM
LM Studio
Any OpenAI-compat
Agents & frameworks
Any OpenAI-compatible agent
Any Ollama-compatible agent
Custom integrations
Hardware
Consumer GPUs (12 GB+)
Apple Silicon
Cloud GPUs
CPU-only + cloud LLM

For cloud LLM users

Fewer tokens per request = lower API costs. Plus memory and anti-hallucination that cloud APIs don't provide.

$485
estimated monthly savings

Rigorously validated

▓▒░█▓▒ of queries
▓▒ simulated days
▓▒░█▓▒░ test paths
▓▒░ automated tests
▓▒░█ errors
Patent pending · GB2608859.1
Apache 2.0
Python 3.11+
Self-contained

Open source. Free forever.

Sieve is released under the Apache 2.0 licence. No hidden costs, no usage limits, no telemetry, no data collection. Your memory store stays on your machine — encrypted, private, and entirely under your control.

Free to use, modify, and distribute
No account required
No telemetry or analytics
Your data never leaves your machine
Full source code on GitHub