Transparent context reduction for LLMs

Your AI sends 38,000 tokens.
Sieve delivers the same answer with a few hundred.

One URL change. Zero agent modification.
Works with any LLM β€” local or cloud.
Open source Β· Free forever Β· Apache 2.0

Patent pending Β· GB2608859.1 Β· Apache 2.0

The problem with every LLM agent

Most agent frameworks rebuild the entire context on every request β€” system prompts, tool schemas, conversation history β€” regardless of what you asked.

Some agents truncate aggressively, losing important information. Others send everything, drowning the model in irrelevant context. Neither approach scales.

Sieve replaces both with intelligent retrieval β€” sending only what matters, without losing what's important.
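The contrast between the three strategies can be sketched in a few lines. This is a toy illustration using naive word-overlap as the relevance score, not Sieve's actual retrieval algorithm; the example history and query are invented.

```python
def truncate(history: list[str], n: int) -> list[str]:
    """Aggressive truncation: keeps the last n turns, losing older facts."""
    return history[-n:]

def send_everything(history: list[str]) -> list[str]:
    """Send-it-all: the model must wade through irrelevant context."""
    return history

def retrieve(history: list[str], query_words: set[str], k: int) -> list[str]:
    """Retrieval: rank turns by overlap with the query, keep only the top k."""
    scored = sorted(
        history,
        key=lambda turn: -len(query_words & set(turn.lower().split())),
    )
    return scored[:k]

history = [
    "alex lives in london",
    "the weather there is often rainy",
    "alex works at a tech company",
]
# retrieve(history, {"weather"}, 1) keeps only the weather turn;
# truncate(history, 1) would have dropped it.
```

Truncation keeps whatever is recent; retrieval keeps whatever is relevant, however old.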

The fix is architectural

Without Sieve:
Agent β†’ tens of thousands of tokens β†’ LLM
With Sieve:
Agent β†’ Sieve (strips Β· retrieves) β†’ hundreds of tokens β†’ LLM

Sieve sits transparently between your agent and your LLM. Instead of truncating or bloating, it retrieves only the relevant context from a structured memory store β€” delivering a lean, precise payload every time.
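The "one URL change" amounts to pointing an OpenAI-compatible client at Sieve instead of the provider. A minimal sketch: the port 11435 comes from the quick-start output below, while the `/v1` path is an assumption based on the usual OpenAI-compatible convention.

```python
DIRECT_URL = "https://api.openai.com/v1"   # agent talks to the provider
SIEVE_URL = "http://localhost:11435/v1"    # agent talks to Sieve instead

def client_config(base_url: str, api_key: str) -> dict:
    """Kwargs for an OpenAI-compatible client constructor."""
    return {"base_url": base_url, "api_key": api_key}

# Before: client = OpenAI(**client_config(DIRECT_URL, key))
# After:  client = OpenAI(**client_config(SIEVE_URL, key))  # the only change
```

The agent's code, prompts, and tool definitions are untouched; only the endpoint moves.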

The numbers

Validated across hundreds of queries over 30 simulated days of conversation

Fewer tokens sent to the model on every request
Reduced hallucination vs an unmodified baseline
Faster inference: fewer tokens in, faster responses out
Zero external dependencies beyond your own LLM

Watch it learn

Sieve's context reduction improves with every conversation

sieve demo
Message 1 Β· cold start
"Hi, I'm Alex. I live in London and work at a tech company."
πŸ“ Learned: name, location, employer
Message 5 Β· ~90% reduction
"What's the weather like where I live?"
🧠 Retrieved: location from memory β€” thousands of tokens stripped
Message 20 Β· ~97% reduction
"Should I refinance given my situation?"
🧠 Retrieved: 5 relevant financial facts from memory
Message 50 Β· ~99% reduction
"What's my current half-marathon pace?"
🧠 Temporal supersession: serves only the latest value, not stale data
Message 55 Β· πŸ›‘οΈ trap detected
"What time is my son Jake's swimming lesson?"
πŸ›‘οΈ Absence signal: "No son named Jake on record" β€” hallucination prevented
Knowledge graph
[Live graph view: node, edge, and fact counters, plus nodes such as Alex, London, employer, Sophie, Lily, Oscar, mortgage, property, rate, and running 1:52 half-mara; edges such as lives_in, works_at, spouse, daughter, son, owes, and trains; and the negative fact no_son βœ— Jake]
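The absence signal can be sketched against a graph like the one above. The representation is an assumption (explicit negative triples stored alongside positive ones); the names come from the demo.

```python
triples = {
    ("Alex", "lives_in", "London"),
    ("Alex", "spouse", "Sophie"),
    ("Alex", "son", "Oscar"),
    ("Alex", "no_son", "Jake"),   # explicit negative fact
}

def lookup(subject: str, relation: str, obj: str) -> str:
    """Answer from memory, surfacing an absence signal for negated facts."""
    if (subject, relation, obj) in triples:
        return f"{subject} {relation} {obj}"
    if (subject, f"no_{relation}", obj) in triples:
        return f"Absence signal: no {relation} named {obj} on record"
    return "Unknown: not in memory"

# lookup("Alex", "son", "Jake") returns the absence signal, so the model
# cannot invent a swimming lesson for a nonexistent child.
```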

Get started in 60 seconds

terminal
$ pip install llm-sieve
$ sieve init
β†’ Embedding model downloaded
β†’ Store initialised
β†’ Ready! Point your agent to localhost:11435
$ sieve start
 
# That's it. Change one URL in your agent config.

Works with everything

LLM providers
Ollama
OpenAI
Anthropic
vLLM
LM Studio
Any OpenAI-compatible API
Agents & frameworks
Any OpenAI-compatible agent
Any Ollama-compatible agent
Custom integrations
Hardware
Consumer GPUs (12 GB+)
Apple Silicon
Cloud GPUs
CPU-only + cloud LLM

For cloud LLM users

Fewer tokens per request = lower API costs. Plus memory and anti-hallucination that cloud APIs don't provide.

$442
estimated monthly savings
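A saving of this shape can be sanity-checked with back-of-envelope arithmetic. Every number below is an illustrative assumption (provider price, traffic volume, per-request reduction), not Sieve's benchmark data; only the 38,000-token baseline comes from the headline.

```python
BASELINE_TOKENS = 38_000    # input tokens per request without Sieve
SIEVE_TOKENS = 400          # "hundreds of tokens" with Sieve (assumed)
PRICE_PER_MTOK = 3.00       # assumed $ per million input tokens
REQUESTS_PER_MONTH = 4_000  # assumed agent traffic

def monthly_input_cost(tokens_per_request: int) -> float:
    """Monthly input-token spend in dollars for a given payload size."""
    return tokens_per_request * REQUESTS_PER_MONTH * PRICE_PER_MTOK / 1_000_000

saving = monthly_input_cost(BASELINE_TOKENS) - monthly_input_cost(SIEVE_TOKENS)
# $456.00/month direct vs $4.80/month through Sieve under these assumptions
```

At these assumed rates the saving is about $451/month; the exact figure scales linearly with traffic and provider pricing.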

Rigorously validated

Hundreds of queries Β· 30 simulated days Β· multiple test paths Β· automated test suite
Patent pending Β· GB2608859.1
Apache 2.0
Python 3.11+
Self-contained

Open source. Free forever.

Sieve is released under the Apache 2.0 licence. No hidden costs, no usage limits, no telemetry, no data collection. Your memory store stays on your machine β€” encrypted, private, and entirely under your control.

βœ“ Free to use, modify, and distribute
βœ“ No account required
βœ“ No telemetry or analytics
βœ“ Your data never leaves your machine
βœ“ Full source code on GitHub