Groq Hires AI Pioneer LeCun, Valuation Reaches $2.8B


On August 6, Groq, widely seen as one of Nvidia's strongest challengers, made headlines with two significant announcements:

Firstly, it secured $640 million in Series D financing, led by BlackRock, with Cisco and Samsung's venture capital unit among other investors, pushing its valuation to $2.8 billion (approximately 20 billion RMB).

Secondly, Groq appointed Yann LeCun, winner of the 2018 Turing Award, professor at New York University, and Meta's VP and Chief AI Scientist, as a technical advisor.

Despite his public spats with Elon Musk, LeCun is recognized as a foundational figure in AI whose prominence rests on groundbreaking innovations in deep learning, and he is widely regarded as a leading voice on cutting-edge models worldwide.

Groq, which has long courted AI industry titans, has now truly secured the backing of a heavyweight.

Founded by a team that included core members of Google's TPU project, Groq has carved out a niche in the AI chip landscape, initially shrouding its technology and products in mystery to build intrigue across the industry.

However, when the market failed to meet expectations, Groq pivoted to catch the generative AI wave, branding itself as offering the “world’s fastest inference” and aggressively calling out figures like Musk, OpenAI CEO Sam Altman, and Meta’s Mark Zuckerberg.

Beyond its flair for drawing attention, the company has proven adept at winning admirers, garnering endorsements from notable industry figures.

LeCun mentioned that “Groq chips have significant market viability.” Zuckerberg announced that Groq would be providing inference chips for Meta’s Llama 3.1 large language model.

Moreover, former Alibaba VP and founder of AI Infra company Lepton AI, Jia Yangqing, labeled himself a “superfan” of Groq.

With this new financing round, Silicon Valley’s three major AI chip unicorns — Groq (valued at $2.8 billion), Cerebras (valued at $4 billion), and SambaNova (valued at $5.1 billion) — have all now joined the 20 billion RMB valuation club.

Nvidia, the world’s largest AI computing giant, has seen its market valuation soar to $3 trillion, with revenues hitting approximately $60.9 billion in 2023.

In contrast, Groq's footprint is considerably smaller.

According to financial documents reviewed by Forbes, the startup's expected sales for 2023 stand at just $3.4 million, with a net loss estimated at $88.3 million.

However, sources indicate that Groq is optimistically forecasting sales of up to $10 million for this year.

Reaching this point marks a transformative turnaround in the AI chip startup's fortunes.

Despite presently positioning itself as one of the most aggressive challengers to Nvidia, Groq's path was fraught with challenges before the generative AI boom ignited enthusiasm for AI technologies globally.

Co-founder and CEO Jonathan Ross recalls near-disastrous moments for Groq, particularly the low period of 2019, when the company was just a month away from running out of funds.

So much so that Ross later lamented that Groq might have launched too early.

In late 2016, eight of the ten core members of Google's TPU quietly left, joining forces to establish Groq.

Google’s TPU was instrumental in AlphaGo’s defeat of the world champion Go player, propelling the global market for dedicated AI chips into the spotlight.

The departure of its core designers attracted significant attention, yet Groq initially remained publicly silent as they navigated financial difficulties until late 2019, when they began sporadically releasing blog posts to satisfy industry curiosity.

Reports in 2017 indicated that Groq had secured $10.3 million in seed funding, marking its first appearance in the public eye. Finding new investors subsequently proved challenging: three additional funding rounds cumulatively raised just over $60 million.

It wasn’t until April 2021 that Groq secured a relatively large funding round of $300 million, raising its total funding to over $360 million and achieving a valuation of over $1 billion, entering the ranks of chip unicorns.

Fast forward three years, and Groq has successfully raised $640 million in new funding, lifting its total funding past the billion-dollar mark and its valuation to $2.8 billion, more than double the figure after its previous round.

Ross struck a celebratory tone on social media: Groq initially aimed to raise $300 million to deploy 108,000 LPUs by the end of Q1 2025, but it unexpectedly raised roughly twice that amount and is therefore expanding its cloud computing and core engineering teams.

During the launch of its flagship Llama 3.1 405B model, Zuckerberg noted that “innovators like Groq have built low-latency, low-cost inference services for all new models.”

Ross remarked that when compared to Nvidia's GPUs, the LPU cluster provides higher throughput, lower latency, and more cost efficiency for extensive language inference tasks.

Groq's self-developed LPU (Language Processing Unit) targets the compute-density and memory-bandwidth bottlenecks encountered with large language models. Groq claims its computational capability exceeds that of both GPUs and CPUs, significantly reducing the computation time per word and accelerating text-sequence generation.

In the wake of the generative AI boom ignited by ChatGPT, Groq has ramped up its promotional efforts for its LPU AI inference engine, claiming it achieves “the world’s fastest inference,” while frequently sharing positive test results and reviews from partners and users on social media.

In February of this year, technology demonstration videos shared by Groq and various users showed the LPU running the Mixtral 8x7B-32k large language model and generating responses in merely 11 seconds, compared with as much as a minute for OpenAI's GPT-4.

Matt Shumer, CEO of AI writing startup HyperWriteAI, lauded the LPU as “lightning fast,” noting it could produce hundreds of words in less than a second, with the large language model's runtime amounting to merely a fraction of a second.

According to data published by Artificial Analysis in July, Groq outputs Llama 3 70B at an approximate speed of 340 tokens/s, which is twice as fast as GPT-4o mini.

Global Capital participated in multiple rounds of Groq's funding.


Global Capital co-founder Aemish Shah stated that Groq's inference speed “clearly outperforms any other product on the market.”

With its instantaneous AI inference appeal, a surge of developers have flocked to Groq.

In March, Groq launched GroqCloud, a developer platform powered by the LPU. Developers can rent LPU chips through this platform, eliminating the need for direct purchases.

The platform hosts open-source models including Meta Llama 3.1, OpenAI Whisper Large V3, Google Gemma, and Mistral Mixtral, enabling API access to its chips within cloud instances.
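For illustration, a request to such an OpenAI-compatible chat-completions endpoint can be assembled with nothing but the Python standard library. The endpoint URL and model name below are assumptions to verify against GroqCloud's current documentation, and the API key is a placeholder:

```python
import json
import urllib.request

# Assumed GroqCloud OpenAI-compatible endpoint; check the current docs.
API_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble an HTTP POST for one chat completion (no network I/O here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical model id for illustration; GroqCloud lists current ids.
req = build_chat_request("YOUR_API_KEY", "llama-3.1-70b-versatile",
                         "Why is low-latency inference hard?")
# response = urllib.request.urlopen(req)  # requires a valid API key
```

Because the API follows the OpenAI chat-completions shape, existing client code can typically be pointed at GroqCloud by swapping the base URL and key.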

To attract developers, Groq offered free access; within the first month, 70,000 users registered.

Currently, over 360,000 developers are creating AI applications on GroqCloud, and that number continues to grow.

Recently, Groq appointed Stuart Pann, former head of Intel's foundry business and former HP CIO, as COO. Pann expresses optimism about Groq's progress: over 25% of GroqCloud customers have requested increased computing capacity.

With the new financing, Groq plans to broaden its TaaS (Token as a Service) offerings and add new models and features to GroqCloud.

Groq claims that its LPU achieves an energy efficiency that is at least ten times that of GPUs when running large language models and other generative AI solutions.

The GroqChip1 uses a 14nm process, featuring 230MB of on-chip shared SRAM, with a memory bandwidth of 80TB/s, an FP16 computing power of 188TFLOPS, and int8 computing power of 750TOPS.

In contrast to many large-model chips, Groq's chip incorporates neither HBM nor CoWoS packaging, insulating it from HBM supply shortages.

It employs a single-core temporal instruction set computer architecture that does not require frequent data loading from memory like HBM-equipped GPUs, thus optimizing cost efficiency and enhancing the speed at which large language models can operate.

▲Memory architecture of Groq chip
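The figures quoted above allow a rough, back-of-envelope reading of the design. The calculation below is illustrative only: it uses the quoted peak numbers, assumes a hypothetical 70B-parameter FP16 model, and ignores interconnect and activation traffic.

```python
# Back-of-envelope roofline ratio for GroqChip1, using the figures quoted
# in the text: 188 TFLOPS (FP16) against 80 TB/s of on-chip SRAM bandwidth.
fp16_flops = 188e12        # peak FP16 operations per second
sram_bandwidth = 80e12     # bytes per second from on-chip SRAM

# Arithmetic intensity the chip can "feed": FLOPs available per byte moved.
balance_point = fp16_flops / sram_bandwidth  # ~2.35 FLOP/byte

# Token generation is memory-bound: each token touches roughly every weight
# once. For a hypothetical 70B-parameter model in FP16 (2 bytes/param),
# one chip's memory bandwidth lower-bounds the time per token:
weights_bytes = 70e9 * 2   # 140 GB of weights, far beyond one chip's 230MB SRAM
min_seconds_per_token = weights_bytes / sram_bandwidth

print(f"balance point: {balance_point:.2f} FLOP/byte")
print(f"per-token lower bound (single chip): {min_seconds_per_token * 1e3:.2f} ms")
```

The 140 GB of weights obviously cannot fit in one chip's 230MB SRAM, which is precisely why Groq spreads a model across hundreds of interconnected LPUs, each holding a slice of the weights locally.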

Real-time AI inference is a distinct systems challenge, where both hardware and software play roles in speed and latency. Even the most sophisticated software cannot overcome hardware bottlenecks rooted in chip design and architecture.

Groq's software-defined hardware approach shifts decision-making steps for execution control and data flow control from hardware components to the compiler.

By precisely scheduling each memory load, operation, and data packet transfer, it ensures peak performance and fastest system response, freeing up additional chip space and processing power.

▲Groq's simplified software-defined hardware method unlocks additional chip space and processing power

The compiler segments models into smaller blocks, which are spatially mapped onto multiple LPU chips. Like an assembly line, each LPU cluster is designated to perform specific computing phases, storing all requisite data in its local SRAM. Data flows from LPU to LPU, eliminating the need for external HBM chips and routers.

This efficient pipeline architecture is viable because the LPU inference engine operates with complete determinism.

The system precisely knows what occurs at each stage on each chip, allowing for optimal pipeline operation.
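The assembly-line behavior described above can be sketched as a toy schedule: when every stage's latency is fixed and known at compile time, the cycle at which each token completes is computable in advance, with no dynamic arbitration or memory stalls. The stage names and cycle counts here are invented for illustration and are not Groq's actual pipeline:

```python
# Toy model of a fully deterministic pipeline: each stage has a fixed,
# compile-time-known latency, so a scheduler can compute exactly when every
# token leaves the last stage. Stage names and cycle counts are illustrative.
STAGE_CYCLES = [("embed", 4), ("layers_0_39", 16), ("layers_40_79", 16), ("unembed", 4)]

def completion_cycle(token_index: int) -> int:
    """Cycle at which token `token_index` exits the pipeline.

    Tokens enter back-to-back: latency is the sum of all stage times,
    while steady-state throughput is set by the slowest stage.
    """
    latency = sum(c for _, c in STAGE_CYCLES)       # fill time for one token
    bottleneck = max(c for _, c in STAGE_CYCLES)    # steady-state interval
    return latency + token_index * bottleneck

schedule = [completion_cycle(i) for i in range(4)]
print(schedule)  # deterministic: [40, 56, 72, 88]
```

Because the schedule is a pure function of the compiled program, the compiler can reason about every chip's timing globally, which is the property the text credits for the LPU's pipeline efficiency.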

In contrast, GPUs operate as small clusters of chips, with each cluster executing every sequential computation stage required to generate tokens. During each phase, the GPU retrieves all the data needed for that stage from HBM on another chip and returns it to external HBM upon completion, necessitating data transfers directed by external chips that are both inefficient and costly.

The Groq compiler directly maps operations onto the LPU without requiring manual tuning or experiments, resulting in straightforward LPU design.

Built upon tensor streaming architecture, the LPU is free from the constraints of CUDA or kernels.

▲Single LPU architecture

“Our goal is to ensure that every dollar invested in hardware yields a full return; we don't plan to operate at a loss,” Ross stated.

Since launching its chip sales two years ago, Groq has gradually secured customers, collaborating with firms like Meta and Samsung, as well as sovereign nations like Saudi Arabia for production and deployment.

Argonne National Laboratory has used Groq chips for nuclear fusion research.

Earlier this year, Groq entered into a partnership with Saudi Aramco Digital Company to establish one of the largest AI inference-as-a-service computing infrastructures in the Middle East and North Africa, and it is collaborating with European sustainable energy company Earth Wind & Power to deploy thousands of LPUs in Norwegian data centers.

Currently, Groq is advancing research and production of its next-generation chips, having announced in August of last year a foundry partnership to produce 4nm LPUs.

Previous reports indicate that the energy efficiency of Groq's next-generation chips is expected to improve 15- to 20-fold over the previous generation, though the chips themselves will be larger.

The number of chips needed to perform the same task will also drastically decrease.

In inference benchmarks on the Meta Llama 2 70B model, Groq interconnected 576 chips across nine racks. By 2025, the same task could potentially be accomplished with roughly 100 chips across two racks.

As the enthusiasm for generative AI persists, the outlook for the AI chip market appears promising, although Groq faces intensifying competition.

According to a blog post from Groq published in April this year, the total addressable market (TAM) for AI chips is expected to reach $119.4 billion by 2027, with approximately 40% of AI chips currently utilized for inference.

Once applications mature, they typically allocate 90-95% of computing resources to inference, suggesting the inference market will expand substantially over time.

Currently, Nvidia dominates 70-95% of the AI chip market. Tech giants like Google, Microsoft, Amazon, and Meta are all developing their own AI chips. OpenAI is preparing to initiate an AI chip manufacturing plan this year, and it has been reported that Arm will establish an AI chip department.

Several AI chip companies are also making new moves. Last year, US AI chip startup D-Matrix secured $110 million in Series B funding; in June, US AI chip startup Etched announced completing $120 million in Series A funding; US wafer-level chip unicorn Cerebras has confidentially filed for an IPO; and in July, Japan's SoftBank Group acquired UK AI chip unicorn Graphcore for $600 million.

According to reports, one venture capitalist declined to participate in Groq's new financing, arguing that while Groq's approach is innovative, its intellectual property may not hold up over the long run.