Static embeddings for agent memory
Static embedding models make always-on memory practical because retrieval can be fast, local, and cheap enough to run constantly.
The homepage should not need to explain embedding architecture; the product value is instant memory. The technical reason it can feel instant is that static embeddings are cheap enough to run on every single query.
Hugging Face's January 2025 article "Train 400x faster Static Embedding Models with Sentence Transformers" describes static models that skip transformer attention entirely at inference time: encoding a text is just a token-embedding lookup followed by mean pooling, which is why they run orders of magnitude faster on CPU. The released English retrieval model, sentence-transformers/static-retrieval-mrl-en-v1, produces 1024-dimensional embeddings scored with cosine similarity.
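As a minimal sketch, this is what loading that model with the sentence-transformers library looks like. The example strings are made up for illustration, and the `similarity` helper assumes a sentence-transformers v3+ install:

```python
from sentence_transformers import SentenceTransformer

# Static model: encoding is a token-embedding lookup plus mean pooling,
# with no transformer attention at inference time.
model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")

embeddings = model.encode([
    "What did we decide about the launch date?",  # a user query
    "Meeting notes: launch moved to March.",      # a stored memory chunk
])
print(embeddings.shape)  # (2, 1024)

# The model is configured for cosine similarity scoring.
print(model.similarity(embeddings[0:1], embeddings[1:2]))
```

Because the model is trained with Matryoshka representation learning (the "mrl" in its name), the 1024-dimensional vectors can also be truncated, for example by passing `truncate_dim=256` to the constructor, to shrink the index at some cost in accuracy.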
Why this fits memory
- Memory search happens constantly, so per-query and per-import overhead matters.
- Markdown chunks are short enough that a fast retrieval model is usually a better product tradeoff than a heavyweight encoder.
- Local embeddings remove external embedding API latency and make self-hosting realistic; the sketch after this list runs entirely locally.
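To make the tradeoff concrete, here is a minimal local retrieval loop over markdown chunks. The blank-line chunking, the notes.md filename, and the top-k of 3 are illustrative assumptions, not the actual product pipeline:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")

def index_markdown(text: str) -> tuple[list[str], np.ndarray]:
    """Split markdown on blank lines and embed each chunk.

    Blank-line splitting is a simplification; a real pipeline would
    chunk by heading or token count.
    """
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    vectors = model.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def search(query: str, chunks: list[str], vectors: np.ndarray, k: int = 3) -> list[str]:
    """Return the top-k chunks; dot product equals cosine on normalized vectors."""
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(vectors @ q)[::-1][:k]
    return [chunks[i] for i in top]

chunks, vectors = index_markdown(open("notes.md").read())
print(search("when is the launch?", chunks, vectors))
```

Because both indexing and querying are cheap on CPU, this loop can run on every agent turn rather than as a batched background job.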
The ModernBERT experiment
The lee101/public-static-modern-bert repo is useful precisely because it documents an abandoned distillation direction. The takeaway is not that every static-model experiment pans out; it is that the practical path is to use the proven static retrieval model and build a solid memory product around it.