DeepSeek’s Multi-Head Latent Attention Method

By Jasmin (Bey) Cowin, Ed.D., Associate Professor, Touro University

January 28, 2025

Who Is Behind DeepSeek?

DeepSeek is CEO Liang Wenfeng’s AI company, which made headlines for releasing an open-source AI model capable of outperforming well-known models from organizations such as OpenAI. DeepSeek’s model requires fewer computing resources (often referred to as “chips,” such as GPUs or specialized AI processors) and was developed by a team with substantially fewer years of collective experience than the more established AI labs.

DeepSeek’s Pioneering Approach

DeepSeek’s success primarily stems from its pioneering approach to model architecture. The company introduced a novel multi-head latent attention (MLA) method that lowers memory usage to just 5–13% of what the more common multi-head attention (MHA) architecture consumes. They also devised the DeepSeekMoESparse structure, which effectively minimizes computational requirements and, in turn, further drives down overall costs. The DeepSeek V3 model, containing 671 billion parameters, was reportedly developed with remarkable cost efficiency, at US$5.58 million over approximately two months. Andrej Karpathy (@karpathy), a highly regarded computer scientist, former Director of AI at Tesla, and founder of Eureka Labs, a new AI+Education company, wrote on X:

“DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M).” 

Andrej Karpathy

This achievement stands out particularly when compared to the more resource-intensive development processes of larger tech companies like Meta and OpenAI.
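The scale of saving MLA targets can be sketched with toy arithmetic: standard MHA caches a full key and value vector per head per token, while MLA caches a single shared low-rank latent per token and reconstructs per-head keys and values from it at attention time. The dimensions below (32 heads, head size 128, latent size 512) are illustrative assumptions, not DeepSeek V3’s actual configuration:

```python
# Illustrative sketch (not DeepSeek's actual code): per-token KV-cache
# memory for standard multi-head attention (MHA) versus a latent-compressed
# cache in the spirit of multi-head latent attention (MLA).

def mha_kv_cache_per_token(n_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """MHA caches one key and one value vector per head: 2 * n_heads * head_dim."""
    return 2 * n_heads * head_dim * bytes_per_elem

def mla_kv_cache_per_token(latent_dim: int, bytes_per_elem: int = 2) -> int:
    """MLA caches one shared low-rank latent per token; per-head keys and
    values are reconstructed from it by projection at attention time."""
    return latent_dim * bytes_per_elem

# Hypothetical dimensions chosen for round numbers.
mha = mha_kv_cache_per_token(n_heads=32, head_dim=128)  # 16384 bytes/token
mla = mla_kv_cache_per_token(latent_dim=512)            # 1024 bytes/token
print(f"MHA: {mha} B/token, MLA: {mla} B/token, ratio: {mla / mha:.1%}")
```

With these made-up sizes the latent cache is 6.25% of the MHA cache, which happens to land inside the 5–13% range quoted above.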

Large Language Models (LLMs), which power AI systems like ChatGPT, rely on parameters, the learned numerical weights, to process complex patterns in data. In general, more parameters let a model capture richer patterns and generate more sophisticated responses, though at a higher cost in memory and compute. By making their model open source, DeepSeek allows developers worldwide to access, modify, and build on it, enabling community-driven improvements and adaptations of the technology.
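To make “parameters” concrete: each parameter is one stored number, so weight storage alone scales directly with the count and the numeric precision. A rough back-of-the-envelope sketch for a 671-billion-parameter model (weights only; serving a model also needs memory for activations and the KV cache):

```python
# Back-of-the-envelope: raw weight storage for a 671-billion-parameter
# model at common numeric precisions. Weights only; activations, KV cache,
# and (during training) optimizer state add substantially more.

def weight_storage_gb(n_params: float, bytes_per_param: float) -> float:
    """Storage in decimal gigabytes for n_params weights at a given precision."""
    return n_params * bytes_per_param / 1e9

N = 671e9  # parameter count reported for DeepSeek V3
for name, nbytes in [("FP32", 4), ("FP16/BF16", 2), ("FP8", 1)]:
    print(f"{name}: {weight_storage_gb(N, nbytes):,.0f} GB")
```

At 16-bit precision the weights alone come to roughly 1.3 terabytes, which is why such models are split across many GPUs.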

China’s Talent from Peking University 

AI researchers have become some of the most sought-after professionals in the financial industry, especially in hedge funds. The promise of using machine learning to forecast market movements, optimize trading strategies, and extract valuable insights from massive datasets has driven top-tier firms – often called “quant funds” – to invest in AI talent heavily.

DeepSeek hired graduates from Peking University, ranked 14th overall in the most recent QS World University Rankings. For example, Xiaokang Chen, a researcher at DeepSeek AI, specializes in Computer Vision and Multi-Modal Learning.

What Is a Quant Fund?

A “quant fund,” short for “quantitative fund,” is a type of hedge fund that relies on mathematical models, statistical techniques, and algorithmic processes to guide its investments. Unlike more traditional hedge funds that depend on human-led analysis or discretionary decision-making, quant funds use data-driven strategies. Their teams often comprise mathematicians, data scientists, and computer scientists who develop and maintain models for automated trading and risk management. In this sense, AI plays a prominent role, as machine learning models can enhance prediction accuracy and allow real-time decision-making.
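As a purely illustrative example of the data-driven style described above, and not any fund’s actual strategy, a minimal rule-based signal such as a moving-average crossover might be sketched like this; the prices are made-up numbers:

```python
# Toy illustration of an algorithmic trading rule: a simple moving-average
# crossover. Real quant funds use far more sophisticated statistical and
# machine-learning models; this only shows the rule-driven flavor.

def sma(values, window):
    """Simple moving average over the last `window` values."""
    return sum(values[-window:]) / window

def crossover_signal(prices, fast=3, slow=5):
    """'buy' when the fast average sits above the slow one, else 'sell';
    'hold' when there is not yet enough history."""
    if len(prices) < slow:
        return "hold"
    return "buy" if sma(prices, fast) > sma(prices, slow) else "sell"

prices = [100, 101, 103, 102, 105, 107, 106]  # hypothetical closing prices
print(crossover_signal(prices))  # fast avg 106.0 > slow avg 104.6 -> "buy"
```

The point is the division of labor: humans design and validate the model, while execution is automated and systematic rather than discretionary.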

High-Flyer, the quant fund founded by Liang Wenfeng, exemplifies this model-driven approach by integrating sophisticated algorithms into its trading strategies. Since 2015, High-Flyer has reportedly returned around 13% per year on average.
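That reported figure can be sanity-checked with compound-interest arithmetic: 13% per year over roughly nine annual periods (2015 through 2024) would about triple an initial stake. A minimal sketch:

```python
# Compounding check for the reported ~13% average annual return.
# Nine annual periods is an illustrative assumption for 2015-2024.

def compound(principal: float, rate: float, years: int) -> float:
    """Value of `principal` after `years` periods of growth at `rate`."""
    return principal * (1 + rate) ** years

print(round(compound(1.0, 0.13, 9), 2))  # about 3.0x the initial stake
```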

Liang Wenfeng’s DeepSeek Path

Yu Lili and Liu Jing interviewed Liang Wenfeng to learn about the unconventional path that led to DeepSeek.

Here is an extensive quote from the interview:

“Undercurrent”: Before this, most Chinese companies would directly copy this generation of Llama structure for application. Why did you start from the model structure?

Liang Wenfeng: If the goal is to develop applications, then it is a reasonable choice to continue using the Llama structure and quickly launch products. But our destination is AGI, which means we need to study new model structures to achieve stronger model capabilities under limited resources. This is part of the basic research required to scale up to larger models. In addition to the model structure, we have also done a lot of other research, including how to construct data, how to make the model more human-like, etc., which are all reflected in the models we released. In addition, the structure of Llama is estimated to be two generations behind the advanced level abroad in terms of training efficiency and inference cost.

Project Stargate

American AI development remains robust, as underscored by President Trump’s announcement of Project Stargate at the White House. According to the Stargate website:

“The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately. This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world. This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies.”

In short, Stargate positions AI infrastructure as both an economic engine for the United States and a strategic asset for the national security of America and its allies.

SoftBank, OpenAI, Oracle, and MGX provide the initial equity for Stargate. Among these, SoftBank and OpenAI serve as the primary partners, with SoftBank overseeing financial obligations and OpenAI managing operational responsibilities. Masayoshi Son will act as Stargate’s chairman. The principal technology collaborators for the project are Arm, Microsoft, NVIDIA, Oracle, and OpenAI. Construction is already underway in Texas, and additional sites are being evaluated nationwide as final agreements are put in place.

Project Stargate and Aristotle’s Nicomachean Ethics

Viewed through the lens of Aristotle’s Nicomachean Ethics, Project Stargate’s vast undertaking can be seen as a pursuit of the collective good, promising both economic vitality and strengthened security for society. By mobilizing significant resources toward AI development, it embodies the Aristotelian principle of practical wisdom (phronesis), which demands reasoned action directed toward virtuous ends.

This article was written by Dr. Jasmin (Bey) Cowin, Associate Professor and U.S. Department of State English Language Specialist (2024). As a columnist for Stankevicius and a Horasis Moderator Brazil 2024, she writes on Nicomachean Ethics: Insights at the Intersection of AI and Education. Connect with her on LinkedIn.

Featured photo of DeepSeek’s CEO Liang Wenfeng