Transparent context reduction for LLMs

Your AI sends 38,000 tokens.
Sieve delivers the same answer with a fraction of them.

One URL change. Zero agent modification.
Works with any LLM β€” local or cloud.
Open source Β· Free forever Β· Apache 2.0

Patent pending Β· GB2608859.1 Β· Apache 2.0

The problem with every LLM agent

Most agent frameworks rebuild the entire context on every request β€” system prompts, tool schemas, conversation history β€” regardless of what you asked.

Some agents truncate aggressively, losing important information. Others send everything, drowning the model in irrelevant context. Neither approach scales.

Sieve replaces both with intelligent retrieval β€” sending only what matters, without losing what's important.

The fix is architectural

Without Sieve: Agent β†’ tens of thousands of tokens β†’ LLM
With Sieve: Agent β†’ Sieve (strips Β· retrieves) β†’ hundreds of tokens β†’ LLM

Sieve sits transparently between your agent and your LLM. Instead of truncating or bloating, it retrieves only the relevant context from a structured memory store β€” delivering a lean, precise payload every time.
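The retrieval idea can be sketched in a few lines. This is an illustrative toy, not Sieve's implementation: it assumes a flat list of facts and naive keyword overlap, where the real system uses a structured memory store.

```python
def reduce_context(user_message: str, memory: list[str], limit: int = 5) -> list[dict]:
    """Toy context reducer: keep only stored facts that share a word with the query."""
    query_words = set(user_message.lower().split())
    relevant = [fact for fact in memory if query_words & set(fact.lower().split())]
    context = "\n".join(relevant[:limit])
    # The lean payload replaces a rebuilt-from-scratch mega-prompt.
    return [
        {"role": "system", "content": f"Known facts:\n{context}"},
        {"role": "user", "content": user_message},
    ]

memory = [
    "Alex lives in London",
    "Alex works at a tech company",
    "Alex's spouse is Sophie",
]
lean = reduce_context("What's the weather like where Alex lives?", memory)
```

Here only the location and employer facts survive the filter; the unrelated spouse fact never reaches the model.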

The numbers

Validated across hundreds of queries over 30+ simulated days with cross-family grading

Up to 140Γ— fewer tokens sent to the model on every request
Up to 9.3Γ— less hallucination than the unmodified baseline
Faster inference: fewer tokens in, faster responses out
Exactly 0 external dependencies beyond your own LLM

Gets smarter the more it knows you

Baseline models degrade as conversation grows. Sieve improves.

Context tokens over 60 days: baseline grows to 147K tokens per request, while Sieve stays near 1K β€” 140Γ— fewer tokens.
Hallucination rate over 30 days: baseline climbs to 24.2% while Sieve stays flat at 2.5%. By Day 30, baseline hallucinates 9.3Γ— more than Sieve.

Don't trust our numbers? Run sieve benchmark on your own machine.

How Sieve fits

Other excellent projects solve different facets of the same problem. Here's where Sieve fits.

Approach | What it does well | Integration | Sieve adds
Raw agent context | Simple, no setup needed | N/A | Reduces bloat without changing the agent
Agent + compaction | Keeps context manageable | Built-in | More precise retrieval vs crude truncation
RAG systems | Document retrieval at scale | Requires SDK integration | Transparent proxy, no code changes
Virtual context managers | Sophisticated memory management | Requires SDK changes | Drop-in proxy, works alongside
Sieve | Token reduction + structured memory | Transparent proxy | β€”

Sieve is complementary. It works alongside any of these approaches β€” reducing what gets sent to the model regardless of how the context was assembled.

Watch it learn

Sieve's context reduction improves with every conversation

sieve demo
Message 1 Β· cold start
"Hi, I'm Alex. I live in London and work at a tech company."
πŸ“ Learned: name, location, employer
Message 5 Β· ~90% reduction
"What's the weather like where I live?"
🧠 Retrieved: location from memory β€” thousands of tokens stripped
Message 20 Β· ~97% reduction
"Should I refinance given my situation?"
🧠 Retrieved: 5 relevant financial facts from memory
Message 50 Β· ~99% reduction
"What's my current half-marathon pace?"
🧠 Temporal supersession: serves only the latest value, not stale data
Message 55 Β· πŸ›‘οΈ trap detected
"What time is my son Jake's swimming lesson?"
πŸ›‘οΈ Absence signal: "No son named Jake on record" β€” hallucination prevented
Knowledge graph
[Graph visualisation: Alex β†’ lives_in β†’ London, works_at β†’ employer, spouse β†’ Sophie, daughter β†’ Lily, son β†’ Oscar, owes β†’ mortgage, trains β†’ 1:52 half-marathon pace; a no_son edge marks "Jake" as absent]
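Two of the behaviours above can be illustrated with a toy store. The class and method names here are hypothetical, not Sieve's API: temporal supersession means the newest value for a fact wins, and an absence signal answers from what the graph does not contain instead of letting the model guess.

```python
class ToyMemory:
    """Illustrative fact store: latest value wins, unknown entities are flagged."""

    def __init__(self):
        self.facts: dict[str, str] = {}

    def learn(self, key: str, value: str) -> None:
        # Temporal supersession: a newer value replaces the stale one.
        self.facts[key] = value

    def recall(self, key: str) -> str:
        # Absence signal: state what is NOT known rather than inventing an answer.
        if key not in self.facts:
            return f"No record of {key!r}"
        return self.facts[key]

mem = ToyMemory()
mem.learn("half_marathon_pace", "1:58")
mem.learn("half_marathon_pace", "1:52")   # supersedes the old pace
print(mem.recall("half_marathon_pace"))   # β†’ 1:52
print(mem.recall("son_jake"))             # β†’ No record of 'son_jake'
```

The second recall is the "trap" case from Message 55: the store returns an explicit absence instead of fabricating a swimming lesson.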

Get started in 60 seconds

terminal
$ pip install llm-sieve
$ sieve init
β†’ Embedding model downloaded
β†’ Store initialised
β†’ Ready! Point your agent to localhost:11435
$ sieve start
 
# That's it. Change one URL in your agent config.
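What "one URL change" means in practice, assuming your agent uses an OpenAI-style client configuration (the port comes from the setup output above; everything else in your config stays the same):

```python
# Before: the agent talks to the provider directly.
config_before = {"base_url": "https://api.openai.com/v1"}

# After: the same agent talks to Sieve on localhost, which forwards
# a reduced payload upstream. No other agent changes.
config_after = {"base_url": "http://localhost:11435/v1"}
```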

Works with everything

LLM providers
Ollama
OpenAI
Anthropic
vLLM
LM Studio
Any OpenAI-compatible API
Agents & frameworks
Any OpenAI-compatible agent
Any Ollama-compatible agent
Custom integrations
Hardware
Consumer GPUs (12 GB+)
Apple Silicon
Cloud GPUs
CPU-only + cloud LLM

For cloud LLM users

Fewer tokens per request means lower API costs, plus persistent memory and anti-hallucination safeguards that cloud APIs don't provide.

$485
estimated monthly savings
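Any savings figure depends entirely on your workload and your provider's pricing, so treat estimates as illustrative. A sketch of the arithmetic, with every input an assumption except the 140Γ— reduction figure from the benchmarks above:

```python
# All inputs below are illustrative assumptions, not measured values.
requests_per_day = 200
tokens_before = 38_000      # tokens per request without Sieve
reduction = 140             # "up to 140x fewer tokens" from the benchmarks
price_per_million = 1.00    # assumed input-token price in USD

tokens_after = tokens_before / reduction
saved_tokens_per_month = (tokens_before - tokens_after) * requests_per_day * 30
monthly_savings = saved_tokens_per_month / 1_000_000 * price_per_million
print(f"${monthly_savings:,.0f}")   # β†’ $226
```

Plug in your own request volume and per-token price to get a number for your workload.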

Rigorously validated

Hundreds of queries
30+ simulated days
Automated test suite across multiple test paths
Python 3.11+
Self-contained

Open source. Free forever.

Sieve is released under the Apache 2.0 licence. No hidden costs, no usage limits, no telemetry, no data collection. Your memory store stays on your machine β€” encrypted, private, and entirely under your control.

βœ“ Free to use, modify, and distribute
βœ“ No account required
βœ“ No telemetry or analytics
βœ“ Your data never leaves your machine
βœ“ Full source code on GitHub