It was virtually impossible to have missed the DeepSeek release and the tidal wave of analysis and coverage it has created over the last week or two. If you somehow did miss it, here’s a great primer to get you up to speed.
Rather than digging into the architecture and how it changes the paradigm for LLMs, in this post I want to home in on what it means for those of us involved in designing, building, and running enterprise IT infrastructure for AI inferencing, and on the technology upon which that infrastructure is based.
Why is DeepSeek so Significant?
There are really two key reasons why everyone is so excited about DeepSeek:
It’s Efficient
Instead of driving every professor in the university to campus and assembling them in a single lecture theater each day, just so you can pose a question that probably only one of them can answer, DeepSeek routes your question directly to the most likely professor while the rest get on with their day. That, in essence, is its mixture-of-experts approach. It may not be perfect, but the tradeoff is more than good enough for many, many use cases.
That efficiency means new models will be dramatically easier to train, requiring significantly less compute. The same efficiency flows through to inferencing, bringing a vast new array of AI applications into the domain of commercial reality.
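To make the professor analogy concrete, here is a minimal, illustrative sketch of top-k mixture-of-experts routing in Python. The expert count, the top-k value, and the tiny linear “experts” are assumptions chosen for readability, not DeepSeek’s actual implementation; the point is simply that only a small fraction of the model’s weights do any work for a given token.

```python
# Minimal mixture-of-experts routing sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # the "professors"
TOP_K = 2         # how many experts each token is routed to
DIM = 16          # hidden dimension of a token

# In this sketch each expert is just a small weight matrix.
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
# The router scores how well each expert matches a token.
router = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token to its top-k experts and blend their outputs."""
    logits = token @ router                      # score every expert
    top = np.argsort(logits)[-TOP_K:]            # pick the k best matches
    weights = np.exp(logits[top])
    weights /= weights.sum()                     # softmax over the chosen experts
    # Only the selected experts run; the rest "get on with their day".
    return sum(w * (token @ experts[i]) for i, w in zip(top, weights))

token = rng.standard_normal(DIM)
print(moe_layer(token).shape)  # (16,)
```

Because only two of the eight experts execute per token here, the compute per token is roughly a quarter of running every expert, which is the intuition behind both the cheaper training and the cheaper inferencing.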
It’s Open Source
Yes, there are legitimate security and privacy concerns because it comes from China, but the architecture, code, and concepts are laid out for every engineer on the planet to inspect, review, edit, and, maybe most importantly for the longer term, be inspired by.
DeepSeek was just the starting line. We are already witnessing it spawn a whole new generation of models as efficient as, or maybe even more efficient than, the original, further propelling the rapid, wide-scale integration of AI into our daily lives.
What Does that Mean for the Industry and for AI Inferencing Infrastructure?
In short, we as an industry need to re-evaluate almost everything we thought we knew about where infrastructure was headed and at what rate.
Current course and speed has been all about creating ever larger, more powerful, and more power-hungry (to the point of being unsustainable for many organizations) infrastructure, from the chip level all the way up to projects like the recently announced $500B Project Stargate. The sheer scale of the investment required would inevitably have placed a large amount of control into the hands of the very small number of corporations and organizations able to fund it.
Suddenly AI is achievable for vast swathes of new applications. It can be everywhere, for everyone.
Openness and Diversity
That lower barrier to entry will spawn a much richer, more diverse technology ecosystem. The race to create the most powerful GPU will now be only one of the races being run. We should expect and plan for building infrastructure and deployment options flexible enough to accommodate silicon diversity, and not just multiple GPUs from multiple vendors. This rapid, wide-scale deployment will almost certainly make both LLMs optimized for specific applications and dedicated hardware accelerators commercially viable. In fact, we may even see models developed that do not require hardware accelerators (GPU or otherwise) at all.
Infrastructure Everywhere
The ability to run inference on low-cost GPUs and hardware will see AI move to mainstream adoption far faster than previously envisaged. We were already predicting that 2025 would be the year on-premises inferencing transitioned from pilot projects and PoCs to production-scale rollouts, but until now the use cases were limited to just a few compute-intensive industries. That transition will now happen much faster and much more broadly than we imagined just a few short weeks ago.
We need to plan for AI infrastructure that’s efficient, cost-effective, and flexible enough to deploy anywhere – in the cloud, on-prem, at the edge, or in a co-lo facility – whatever works for you or your customer. That mainstream adoption and much lower barrier to entry will enable CSPs and MSPs at all tiers of the market, as well as totally new entrants, to bring to market a much broader range of service offerings beyond the “NVIDIA 123 GPU for $xx/hour” type offers we see today.
Training vs. Inference
In this much more dynamic world, we should also plan for quick changes in the mix of infrastructure between LLM training, which we expect to continue to happen at the hyperscaler level, and much lower-cost, lower-power inference applications, tuning, RAG, and likely small language models (SLMs) occurring in the datacenter and at the edge. That shift is likely to happen much faster than server refresh cycles, so splitting accelerators out from within the servers themselves makes even more commercial sense.
What Does All This Mean for Liqid and Software-Defined Composable Infrastructure?
If Software-Defined Composable Infrastructure was a key enabler of AI inferencing infrastructure before, it becomes almost indispensable now and in the future.
In the End, Flexibility Will Win
Today, it’s assumed that AI inferencing requires GPUs. And while it’s likely that some, probably many, models will continue to require the computational power of GPUs for inferencing, other models may not require GPUs at all. And what if a yet-to-be-released model doesn’t need GPUs to inference successfully, but instead depends on a variety of other hardware resources?
Liqid has the unique ability to dynamically support environments where different types of compute resources are needed – GPUs for heavy inferencing tasks, CPUs or FPGAs for lighter tasks or for the edge – and to orchestrate those resources on the fly, allowing enterprises to apply the right resources to the right task more efficiently.
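The composability idea can be illustrated with a deliberately simplified sketch. Everything below is hypothetical: the pick_accelerator policy, the compose_host function, and the resource pool are illustrative assumptions only, not Liqid’s actual software or API. The point is that which accelerator serves a workload becomes a runtime decision rather than a purchasing decision.

```python
# Hypothetical composability sketch: choose an accelerator class per job,
# attach one from a shared pool, and release it when the job is done.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    model_size_b: float      # model parameters, in billions
    latency_sensitive: bool

def pick_accelerator(job: Job) -> str:
    """Toy policy: heavy models get GPUs; light or edge jobs get FPGAs or CPUs."""
    if job.model_size_b >= 30:
        return "gpu"
    if job.latency_sensitive:
        return "fpga"
    return "cpu"

def compose_host(job: Job, pool: dict[str, int]) -> str:
    """Attach one device of the chosen class from the shared pool to a host."""
    kind = pick_accelerator(job)
    if pool.get(kind, 0) == 0:
        raise RuntimeError(f"no free {kind} for {job.name}")
    pool[kind] -= 1
    return f"{job.name}: composed 1 x {kind}"

pool = {"gpu": 4, "cpu": 16, "fpga": 2}
jobs = [
    Job("llm-70b-chat", 70, latency_sensitive=True),
    Job("slm-3b-edge", 3, latency_sensitive=True),
    Job("batch-embeddings", 7, latency_sensitive=False),
]
for job in jobs:
    print(compose_host(job, pool))
```

In a composable fabric, that “pool” maps to physical devices shared across bare-metal hosts, so the same pattern applies whether the device is a GPU, an FPGA, or whatever accelerator class comes next.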
In this new world of uncertainty, where the infrastructure that will be needed, and where and when, is largely unknown, the unique flexibility that software-defined composable infrastructure offers wins hands-down.
Fast Will Continue to Get Faster
Some estimates say that new models are being released every 2.5 days. And while hardly any of them have generated the buzz that DeepSeek has, it’s very likely that DeepSeek won’t be the last ‘viral’ AI disruptor.
With Liqid’s focus on AI inferencing infrastructure, though, the model won’t matter. We can help organizations adapt to new models, especially when it comes to handling their dynamic resource needs. Our approach delivers unique agility in adopting the latest AI models while delivering operational efficiency and cost control. We have data showing we can support inferencing for even the largest models, and we will be even better equipped to deliver robust inferencing capabilities once our composable memory solutions are available.
And, in fact, if DeepSeek and its successors prove to dramatically reduce the barrier to entry for model building, we can certainly support small language model development, on-premises tuning, or sovereign model development specific to an organization, keeping their proprietary data on prem.
Speed and Flexibility Create an Even Deeper Divide for the Status Quo
Traditional server infrastructure simply can’t provide the composable flexibility and silicon diversity needed to satisfy a wide range of requirements that could change on an almost daily basis. The status quo can’t keep up with the pace of change.
We uniquely provide the flexibility, scalability, and cost-effectiveness needed to support everything from small language models to rapidly evolving AI inferencing initiatives. In fact, our flexibility enables organizations to experiment with new models as they are released.
Mushrooming growth in commercial adoption of AI, the diversity and openness of new technologies and services, and the ability and requirement to deploy almost anywhere will all favor the flexible, efficient approach that Liqid provides.
We were excited for what 2025 would bring before. We just can’t wait now.