Benchmarks and methodology
Transparent documentation of how we measure performance and cost savings. If you see a number anywhere on lexico.no, this page describes the methodology behind it.
STONE compression
Semantic token reduction for LLM requests
The claim: up to 79% reduction
The STONE compression engine can reduce the number of tokens sent to the AI model by up to 79% on optimal workloads — without noticeable loss in response quality. The average on mixed B2B workloads is 45-65%.
How we measure
- Token count before compression (OpenAI tiktoken / Anthropic tokenizer)
- Token count after STONE processing on the same prompt
- Semantic comparison of output vs baseline (cosine similarity on embeddings)
- Blind test in which two AI responses (compressed and uncompressed) are evaluated with GPT-4 as the judge
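The core of the measurement is a before/after token count and an embedding-similarity check. The sketch below is illustrative only: `reduction_pct` and `cosine_similarity` are hypothetical helpers, not LexiCo's production code (which uses the tiktoken / Anthropic tokenizers and real embedding models):

```python
import math

def reduction_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed by compression."""
    return 100.0 * (tokens_before - tokens_after) / tokens_before

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors,
    used to compare compressed vs. baseline output."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Example: a 1,200-token prompt compressed to 252 tokens
print(round(reduction_pct(1200, 252), 1))  # 79.0
```

In practice the two token counts come from running the same tokenizer on the prompt before and after STONE processing, and the similarity is computed on embeddings of the two model responses.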
Important caveats
- Actual compression depends on the prompt pattern. Repetitive workloads compress best; creative text compresses least.
- The 79% figure is observed peak on structured B2B workloads, not guaranteed average.
- The ROI calculator on /produkter uses a conservative 60% as its default estimate.
- Measurements are performed by LexiCo — independent third-party validation is in progress with Simula Research Laboratory.
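The arithmetic behind the calculator's conservative default can be sketched as below. This is an assumption-laden illustration: the function name and the per-token price are hypothetical, not LexiCo's actual rates or calculator code.

```python
def estimated_monthly_savings(tokens_per_month: int,
                              price_per_million: float,
                              compression: float = 0.60) -> float:
    """Estimated token-cost savings per month, using the
    calculator's conservative 60% default compression rate.
    price_per_million is a hypothetical model price in USD."""
    tokens_saved = tokens_per_month * compression
    return tokens_saved / 1_000_000 * price_per_million

# 50M input tokens/month at a hypothetical $3 per million tokens
print(estimated_monthly_savings(50_000_000, 3.0))  # 90.0
```

Swapping in the observed peak (compression=0.79) instead of the 60% default shows why the calculator deliberately understates the savings.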
O(1) response time
Constant-time AI proxy regardless of context length
The claim: constant response time
Our proxy architecture delivers response time that is nearly independent of input size within a typical context window. Where traditional solutions scale linearly or quadratically with token count, our proxy maintains a near-flat curve.
How we measure
- Latency measured from API request to first token received (time to first token, TTFT)
- Test set with context from 100 to 100,000 tokens
- P50/P95/P99 percentiles over 1,000 requests per size
- Comparison against direct calls to underlying models (OpenAI, Anthropic)
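The percentile reporting over the latency samples can be sketched as follows. The nearest-rank method shown here is one common convention and an assumption on our part; LexiCo's harness may interpolate differently.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (e.g. P50/P95/P99) over a list
    of TTFT latency samples."""
    ranked = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[k]

# Simulated TTFT samples in milliseconds for one context size
ttft_ms = [112, 98, 105, 110, 300, 101, 99, 104, 107, 103]
print(percentile(ttft_ms, 50))  # 104
print(percentile(ttft_ms, 95))  # 300
```

Reporting P95/P99 alongside P50 matters here: a single slow outlier (the 300 ms sample above) is invisible in the median but dominates the tail percentiles, which is exactly what a "near-flat curve" claim must survive.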
Important caveats
- Constant time applies to the proxy layer, not the underlying model (which still has its own latency).
- Extremely large contexts beyond the model window return an error, not a slow response.
- Network latency to the client is not counted in the measurement.
Third-party validation
LexiCo has been in dialogue with academic institutions for independent validation of the core technology:
- Simula Research Laboratory — initial dialogue about validating O(1) architecture and STONE compression. In progress 2026.
- NTNU — an earlier review of the architecture description. The full report has not been published.
We are committed to transparency. When third-party reports are ready, they will be published here with links to full text.
Want to test for yourself?
Contact us for test access to LexiSaaS with your own workloads. You will get real numbers based on your actual usage.