Groq
What Groq Does
Groq is an AI inference company headquartered in Mountain View, California. It was founded in 2016 by Jonathan Ross, the inventor of the Google TPU, to deliver the world's fastest inference for large language models. Groq raised $1.75 billion in total funding at a $6.9 billion valuation before NVIDIA agreed in December 2025 to pay $20 billion to license Groq's Language Processing Unit (LPU) technology, in the largest technology licensing deal in semiconductor history.
Groq continues to operate independently under CEO Simon Edwards, expanding GroqCloud as a public inference API platform. The LPU is a deterministic streaming dataflow architecture purpose-built for transformer model inference. Unlike GPUs, which handle diverse workloads with shared memory and unpredictable scheduling, the LPU executes attention and feed-forward layers as fixed-function hardware pipelines designed to avoid memory bandwidth bottlenecks, delivering 5 to 10 times faster token generation than GPU-based alternatives.
GroqCloud benchmarks show 1,345 tokens per second on Llama-3 8B and 662 tokens per second on Qwen-3 32B, with sub-100 ms first-token latency at scale. Pricing starts at $0.05 per million input tokens, which makes Groq one of the more cost-efficient inference APIs in addition to its speed advantage.
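To put these figures in practical terms, the short Python sketch below estimates how long a response would take to stream and what the input side of a request would cost. It is an illustration only: the function names, the 500-token response size, and the 2,000-token prompt are assumptions made for the example, the 0.1 second first-token latency is taken as an upper bound from the "sub-100 ms" figure, and output-token pricing is left out because this profile does not state it.

```python
# Back-of-the-envelope estimates from the figures quoted in this profile.
# All names and workload sizes here are illustrative assumptions.

LLAMA3_8B_TOKENS_PER_SECOND = 1345   # quoted GroqCloud benchmark
QWEN3_32B_TOKENS_PER_SECOND = 662    # quoted GroqCloud benchmark
FIRST_TOKEN_LATENCY_S = 0.1          # upper bound implied by "sub-100 ms"
INPUT_USD_PER_MILLION_TOKENS = 0.05  # quoted starting price (input tokens only)


def generation_seconds(output_tokens, tokens_per_second, first_token_latency_s=0.0):
    """Estimated wall-clock time to stream a response of the given length."""
    return first_token_latency_s + output_tokens / tokens_per_second


def input_cost_usd(input_tokens, usd_per_million_tokens):
    """Estimated cost of the prompt tokens alone."""
    return input_tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    response_tokens = 500  # assumed response length
    prompt_tokens = 2_000  # assumed prompt length

    for name, tps in [
        ("Llama-3 8B", LLAMA3_8B_TOKENS_PER_SECOND),
        ("Qwen-3 32B", QWEN3_32B_TOKENS_PER_SECOND),
    ]:
        seconds = generation_seconds(response_tokens, tps, FIRST_TOKEN_LATENCY_S)
        print(f"{name}: about {seconds:.2f} s for {response_tokens} tokens")

    cost = input_cost_usd(prompt_tokens, INPUT_USD_PER_MILLION_TOKENS)
    print(f"Prompt cost at the starting price: ${cost:.6f}")
```

Actual timings depend on the model, prompt length, and load, so these numbers are best read as rough planning estimates, not guarantees.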
The platform supports leading open-source models including Meta Llama, Mistral, Gemma, and Qwen. Groq serves AI developers building real-time applications where inference speed is the critical bottleneck: voice AI systems requiring natural conversation cadence, code generation tools needing sub-second completions, agentic AI systems executing multi-step reasoning quickly, and customer-facing applications where latency directly affects user experience.
NVIDIA's integration of the LPU architecture into its Groq 3 LPX inference accelerator validates the streaming dataflow approach for inference workloads and positions GroqCloud as the reference benchmark for AI inference performance.