Benchmarks and methodology
Transparent documentation of how we measure performance and cost savings. If you see a number anywhere on lexico.no, this page describes the methodology behind it.
STONE compression
Semantic token reduction for LLM requests
The claim: up to 79% reduction
The STONE compression engine can reduce the number of tokens sent to the AI model by up to 79% on optimal workloads — without noticeable loss in response quality. The average on mixed B2B workloads is 45-65%.
How we measure
- Token count before compression (OpenAI tiktoken / Anthropic tokenizer)
- Token count after STONE processing on the same prompt
- Semantic comparison of output vs baseline (cosine similarity on embeddings)
- Blind test in which two AI responses (compressed and uncompressed) are evaluated with GPT-4 as the judge
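The core of the measurement is a before/after token count and an embedding-similarity check. The sketch below is illustrative only: `reduction_pct` and `cosine_similarity` are hypothetical helpers, not LexiCo's production code (which uses the tiktoken / Anthropic tokenizers and real embedding models):

```python
import math

def reduction_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed by compression."""
    return 100.0 * (tokens_before - tokens_after) / tokens_before

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors,
    used to compare compressed vs. baseline output."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Example: a 1,200-token prompt compressed to 252 tokens
print(round(reduction_pct(1200, 252), 1))  # 79.0
```

In practice the two token counts come from running the same tokenizer on the prompt before and after STONE processing, and the similarity is computed on embeddings of the two model responses.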
Important caveats
- Actual compression depends on the prompt pattern. Repetitive workloads compress best; creative text compresses least.
- The 79% figure is observed peak on structured B2B workloads, not guaranteed average.
- The ROI calculator on /produkter uses a conservative 60% as its default estimate.
- Measurements are performed by LexiCo — independent third-party validation is in progress with Simula Research Laboratory.
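The arithmetic behind the calculator's conservative default can be sketched as below. This is an assumption-laden illustration: the function name and the per-token price are hypothetical, not LexiCo's actual rates or calculator code.

```python
def estimated_monthly_savings(tokens_per_month: int,
                              price_per_million: float,
                              compression: float = 0.60) -> float:
    """Estimated token-cost savings per month, using the
    calculator's conservative 60% default compression rate.
    price_per_million is a hypothetical model price in USD."""
    tokens_saved = tokens_per_month * compression
    return tokens_saved / 1_000_000 * price_per_million

# 50M input tokens/month at a hypothetical $3 per million tokens
print(estimated_monthly_savings(50_000_000, 3.0))  # 90.0
```

Swapping in the observed peak (compression=0.79) instead of the 60% default shows why the calculator deliberately understates the savings.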
O(1) response time
Constant-time AI proxy regardless of context length
The claim: constant response time
Our proxy architecture delivers response time that is nearly independent of input size within a typical context window. Where traditional solutions scale linearly or quadratically with token count, our proxy maintains a near-flat curve.
How we measure
- Latency measured from API request to first token received (time to first token, TTFT)
- Test set with context from 100 to 100,000 tokens
- P50/P95/P99 percentiles over 1,000 requests per size
- Comparison against direct calls to underlying models (OpenAI, Anthropic)
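The percentile reporting over the latency samples can be sketched as follows. The nearest-rank method shown here is one common convention and an assumption on our part; LexiCo's harness may interpolate differently.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (e.g. P50/P95/P99) over a list
    of TTFT latency samples."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Simulated TTFT samples in milliseconds for one context size
ttft_ms = [112, 98, 105, 110, 300, 101, 99, 104, 107, 103]
print(percentile(ttft_ms, 50))  # 104
print(percentile(ttft_ms, 95))  # 300
```

Reporting P95/P99 alongside P50 matters here: a single slow outlier (the 300 ms sample above) is invisible in the median but dominates the tail percentiles, which is exactly what a "near-flat curve" claim must survive.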
Important caveats
- Constant time applies to the proxy layer, not the underlying model (which still has its own latency).
- Extremely large contexts beyond the model window return an error, not a slow response.
- Network latency to the client is not counted in the measurement.
Third-party validation
LexiCo has been in dialogue with academic institutions for independent validation of the core technology:
- Simula Research Laboratory — initial dialogue about validating O(1) architecture and STONE compression. In progress 2026.
- NTNU — an earlier review of the architecture description. The full report has not been published.
We are committed to transparency. When third-party reports are ready, they will be published here with links to full text.
Want to test for yourself?
Contact us for test access to LexiSaaS with your own workloads. You will get real numbers based on your actual usage.