Tensormesh Optimises AI Inference

23 October 2025

Tensormesh has emerged from stealth mode with $4.5 million in seed funding to tackle inefficiencies in AI inference. The company's technology leverages an expanded form of key-value (KV) caching to optimise inference, aiming to deliver up to a tenfold reduction in latency and GPU costs while ensuring enterprises retain full control over their data and infrastructure.

At the core of Tensormesh's approach is the preservation and reuse of the KV cache: the key and value tensors an attention model computes while processing its input. Conventional serving systems discard this cache after each query and recompute it from scratch the next time; Tensormesh retains it so that subsequent similar tasks can skip that work. Doing so may mean spreading the cache across several tiers of storage, but it yields a substantial boost in inference capacity without increasing server demands. The company is commercialising the open-source LMCache tool, which was created by one of its co-founders.
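To make the idea concrete, here is a minimal, purely illustrative sketch of prefix-based KV cache reuse. It is not Tensormesh's or LMCache's actual API; every name below (KVCacheStore, run_inference, the string stand-ins for KV tensors) is hypothetical.

```python
# Hypothetical sketch of prefix-based KV cache reuse; not the
# Tensormesh or LMCache API. Strings stand in for real KV tensors.
from dataclasses import dataclass, field


@dataclass
class KVCacheStore:
    """Keeps computed KV entries keyed by token prefix instead of
    discarding them after each query."""
    _store: dict = field(default_factory=dict)

    def longest_cached_prefix(self, tokens: tuple) -> tuple:
        # Walk back from the full prompt to the longest cached prefix.
        for end in range(len(tokens), 0, -1):
            if tokens[:end] in self._store:
                return tokens[:end]
        return ()

    def get(self, prefix: tuple):
        return self._store.get(prefix)

    def put(self, prefix: tuple, kv_entries: list) -> None:
        self._store[prefix] = kv_entries


def run_inference(tokens: tuple, cache: KVCacheStore) -> list:
    """Build KV entries for a prompt, reusing any cached prefix so
    only the new suffix pays the computation cost."""
    prefix = cache.longest_cached_prefix(tokens)
    kv = list(cache.get(prefix)) if prefix else []
    for tok in tokens[len(prefix):]:
        kv.append(f"kv({tok})")  # stand-in for a real attention KV pair
    cache.put(tokens, kv)
    return kv


cache = KVCacheStore()
first = run_inference(("sys", "hello"), cache)            # computes 2 entries
second = run_inference(("sys", "hello", "again"), cache)  # reuses 2, computes 1
print(len(first), len(second))  # 2 3
```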

Tensormesh intends to address the growing demand for AI infrastructure and the pressure on organisations to extract maximum inference throughput from their GPUs. The technique is particularly impactful for chat interfaces and agentic systems that repeatedly re-read a growing log of conversation turns or actions: because the cache is retained, it is ready for reuse whenever the model encounters a similar task in a later query.
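Continuing the hypothetical sketch above, a multi-turn chat maps naturally onto this pattern: each turn extends the token prefix, so earlier turns' KV entries are reused rather than recomputed.

```python
# Each chat turn appends to the prompt; after the first turn, only the
# newest token's KV entry needs to be computed.
history = ("sys",)
for user_turn in ("hi", "summarise", "thanks"):
    history = history + (user_turn,)
    run_inference(history, cache)
```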

Related articles:

  • Clarifai Boosts AI Inference
  • GPU Pricing Signals AI Health
  • AMD Powers OpenAI's AI
  • Challenger Emerges to CUDA