Transparent context reduction for LLMs

Your AI sends 38,000 tokens.
Sieve delivers the same answer with a fraction of them.

One URL change. Zero agent modification.
Works with any LLM β€” local or cloud.
Open source Β· Free forever Β· Apache 2.0

Patent pending Β· GB2608859.1 Β· Apache 2.0

The problem with every LLM agent

Most agent frameworks rebuild the entire context on every request β€” system prompts, tool schemas, conversation history β€” regardless of what you asked.

Some agents truncate aggressively, losing important information. Others send everything, drowning the model in irrelevant context. Neither approach scales.

Sieve replaces both with intelligent retrieval β€” sending only what matters, without losing what's important.

The fix is architectural

Without Sieve: Agent β†’ tens of thousands of tokens β†’ LLM
With Sieve: Agent β†’ Sieve (strips Β· retrieves) β†’ hundreds of tokens β†’ LLM

Sieve sits transparently between your agent and your LLM. Instead of truncating or bloating, it retrieves only the relevant context from a structured memory store β€” delivering a lean, precise payload every time.
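The retrieval idea can be sketched in a few lines. This is an illustrative toy, not Sieve's implementation: it assumes a flat list of facts and naive keyword overlap, where the real system uses a structured memory store.

```python
def reduce_context(user_message: str, memory: list[str], limit: int = 5) -> list[dict]:
    """Toy context reducer: keep only stored facts that share a word with the query."""
    query_words = set(user_message.lower().split())
    relevant = [fact for fact in memory if query_words & set(fact.lower().split())]
    context = "\n".join(relevant[:limit])
    # The lean payload replaces a rebuilt-from-scratch mega-prompt.
    return [
        {"role": "system", "content": f"Known facts:\n{context}"},
        {"role": "user", "content": user_message},
    ]

memory = [
    "Alex lives in London",
    "Alex works at a tech company",
    "Alex's spouse is Sophie",
]
lean = reduce_context("What's the weather like where Alex lives?", memory)
```

Here only the location and employer facts survive the filter; the unrelated spouse fact never reaches the model.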

The numbers

Validated across hundreds of queries over 30+ simulated days with cross-family grading

Up to 140Γ— fewer tokens sent to the model on every request
Up to 9.3Γ— less hallucination than the unmodified baseline
Faster inference: fewer tokens in, faster responses out
Exactly 0 external dependencies beyond your own LLM

Gets smarter the more it knows you

Baseline models degrade as conversation grows. Sieve improves.

Context tokens over 60 days: baseline grows to 147K tokens per request, while Sieve stays near 1K β€” 140Γ— fewer tokens.
Hallucination rate over 30 days: baseline climbs to 24.2% while Sieve stays flat at 2.5%. By Day 30, baseline hallucinates 9.3Γ— more than Sieve.

Don't trust our numbers? Run sieve benchmark on your own machine.

How Sieve fits

Other excellent projects solve different facets of the same problem. Here's where Sieve fits.

Approach | What it does well | Integration | Sieve adds
Raw agent context | Simple, no setup needed | N/A | Reduces bloat without changing the agent
Agent + compaction | Keeps context manageable | Built-in | More precise retrieval vs crude truncation
RAG systems | Document retrieval at scale | Requires SDK integration | Transparent proxy, no code changes
Virtual context managers | Sophisticated memory management | Requires SDK changes | Drop-in proxy, works alongside
Sieve | Token reduction + structured memory | Transparent proxy | β€”

Sieve is complementary. It works alongside any of these approaches β€” reducing what gets sent to the model regardless of how the context was assembled.

Watch it learn

Sieve's context reduction improves with every conversation

sieve demo
Message 1 Β· cold start
"Hi, I'm Alex. I live in London and work at a tech company."
πŸ“ Learned: name, location, employer
Message 5 Β· ~90% reduction
"What's the weather like where I live?"
🧠 Retrieved: location from memory β€” thousands of tokens stripped
Message 20 Β· ~97% reduction
"Should I refinance given my situation?"
🧠 Retrieved: 5 relevant financial facts from memory
Message 50 Β· ~99% reduction
"What's my current half-marathon pace?"
🧠 Temporal supersession: serves only the latest value, not stale data
Message 55 Β· πŸ›‘οΈ trap detected
"What time is my son Jake's swimming lesson?"
πŸ›‘οΈ Absence signal: "No son named Jake on record" β€” hallucination prevented
Knowledge graph
[Graph visualisation: Alex β†’ lives_in β†’ London, works_at β†’ employer, spouse β†’ Sophie, daughter β†’ Lily, son β†’ Oscar, owes β†’ mortgage, trains β†’ 1:52 half-marathon pace; a no_son edge marks "Jake" as absent]
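Two of the behaviours above can be illustrated with a toy store. The class and method names here are hypothetical, not Sieve's API: temporal supersession means the newest value for a fact wins, and an absence signal answers from what the graph does not contain instead of letting the model guess.

```python
class ToyMemory:
    """Illustrative fact store: latest value wins, unknown entities are flagged."""

    def __init__(self):
        self.facts: dict[str, str] = {}

    def learn(self, key: str, value: str) -> None:
        # Temporal supersession: a newer value replaces the stale one.
        self.facts[key] = value

    def recall(self, key: str) -> str:
        # Absence signal: state what is NOT known rather than inventing an answer.
        if key not in self.facts:
            return f"No record of {key!r}"
        return self.facts[key]

mem = ToyMemory()
mem.learn("half_marathon_pace", "1:58")
mem.learn("half_marathon_pace", "1:52")   # supersedes the old pace
print(mem.recall("half_marathon_pace"))   # β†’ 1:52
print(mem.recall("son_jake"))             # β†’ No record of 'son_jake'
```

The second recall is the "trap" case from Message 55: the store returns an explicit absence instead of fabricating a swimming lesson.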

Get started in 60 seconds

terminal
$ pip install llm-sieve
$ sieve init
β†’ Embedding model downloaded
β†’ Store initialised
β†’ Ready! Point your agent to localhost:11435
$ sieve start
 
# That's it. Change one URL in your agent config.
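What "one URL change" means in practice, assuming your agent uses an OpenAI-style client configuration (the port comes from the setup output above; everything else in your config stays the same):

```python
# Before: the agent talks to the provider directly.
config_before = {"base_url": "https://api.openai.com/v1"}

# After: the same agent talks to Sieve on localhost, which forwards
# a reduced payload upstream. No other agent changes.
config_after = {"base_url": "http://localhost:11435/v1"}
```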

Works with everything

LLM providers
Ollama
OpenAI
Anthropic
vLLM
LM Studio
Any OpenAI-compatible API
Agents & frameworks
Any OpenAI-compatible agent
Any Ollama-compatible agent
Custom integrations
Hardware
Consumer GPUs (12 GB+)
Apple Silicon
Cloud GPUs
CPU-only + cloud LLM

For cloud LLM users

Fewer tokens per request means lower API costs, plus persistent memory and anti-hallucination safeguards that cloud APIs don't provide.

$485
estimated monthly savings
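Any savings figure depends entirely on your workload and your provider's pricing, so treat estimates as illustrative. A sketch of the arithmetic, with every input an assumption except the 140Γ— reduction figure from the benchmarks above:

```python
# All inputs below are illustrative assumptions, not measured values.
requests_per_day = 200
tokens_before = 38_000      # tokens per request without Sieve
reduction = 140             # "up to 140x fewer tokens" from the benchmarks
price_per_million = 1.00    # assumed input-token price in USD

tokens_after = tokens_before / reduction
saved_tokens_per_month = (tokens_before - tokens_after) * requests_per_day * 30
monthly_savings = saved_tokens_per_month / 1_000_000 * price_per_million
print(f"${monthly_savings:,.0f}")   # β†’ $226
```

Plug in your own request volume and per-token price to get a number for your workload.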

Rigorously validated

Hundreds of queries
30+ simulated days
Automated test suite across multiple test paths
Python 3.11+
Self-contained

Open source. Free forever.

Sieve is released under the Apache 2.0 licence. No hidden costs, no usage limits, no telemetry, no data collection. Your memory store stays on your machine β€” encrypted, private, and entirely under your control.

βœ“ Free to use, modify, and distribute
βœ“ No account required
βœ“ No telemetry or analytics
βœ“ Your data never leaves your machine
βœ“ Full source code on GitHub