Meta revealed its progress in developing custom infrastructure for AI workloads at a virtual event today, including generative AI that powers its new ad design and creation tools.
The announcement was a show of force from Meta, which has lagged behind competitors like Google and Microsoft in adopting hardware systems optimized for AI.
“By building our own [hardware] capabilities, we have control over every layer of the stack, from datacenter design to training frameworks,” Alexis Bjorlin, VP of Infrastructure at Meta, told TechCrunch. “This level of vertical integration is needed to push the boundaries of AI research at scale.”
Meta has invested billions of dollars in hiring top data scientists and creating new kinds of AI, such as AI that runs the discovery engines, moderation filters and ad recommenders across its apps and services. But the company has faced challenges in turning many of its more advanced AI research innovations into products, especially in the generative AI domain.
Until 2022, Meta relied on a mix of CPUs — which are less efficient for AI tasks than GPUs — and a custom chip designed to speed up AI algorithms. Meta scrapped a large-scale deployment of the custom chip, planned for 2022, and instead bought billions of dollars’ worth of Nvidia GPUs that required major overhauls of several of its data centers.
To catch up, Meta decided to start working on a more ambitious in-house chip, expected in 2025, that can both train and run AI models. And that was the main focus of today’s presentation.
The new chip is called the Meta Training and Inference Accelerator, or MTIA for short, and it is part of a "family" of chips for accelerating AI training and inferencing workloads. The MTIA is an ASIC, an application-specific integrated circuit: a chip custom-designed for a particular class of workloads, trading the flexibility of a general-purpose processor for greater efficiency on the tasks it targets.
“We needed a tailored solution that’s co-designed with the model, software stack and the system hardware to achieve better levels of efficiency and performance across our important workloads,” Bjorlin continued. “This provides a better experience for our users across a variety of services.”
Custom AI chips are becoming more common among the Big Tech players. Google created a processor, the TPU (short for “tensor processing unit”), to train large generative AI systems like PaLM-2 and Imagen. Amazon offers proprietary chips to AWS customers for both training (Trainium) and inferencing (Inferentia). And Microsoft, reportedly, is collaborating with AMD to develop an in-house AI chip called Athena.
Meta says it made the first version of the MTIA, MTIA v1, in 2020, using a 7-nanometer process. The chip can scale beyond its internal 128 MB of memory to as much as 128 GB, and in a Meta-designed benchmark test (which should be taken with caution) the company claims the MTIA handled "low-complexity" and "medium-complexity" AI models more efficiently than a GPU.
There are still challenges in the memory and networking areas of the chip, Meta says, which become bottlenecks as the size of AI models increases, requiring workloads to be distributed across several chips. (Meta recently acquired an Oslo-based team working on AI networking tech at British chip unicorn Graphcore.) And for now, the MTIA’s focus is only on inference — not training — for “recommendation workloads” across Meta’s app family.
But Meta emphasized that the MTIA, which it continues to improve, “greatly” boosts the company’s efficiency in terms of performance per watt when running recommendation workloads — allowing Meta to run “more enhanced” and “cutting-edge” (presumably) AI workloads.
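Performance per watt, the efficiency metric Meta cites, simply divides useful throughput by power draw. The sketch below illustrates the comparison with entirely hypothetical numbers; none of the figures come from Meta or reflect real MTIA or GPU measurements.

```python
# Hypothetical illustration of a performance-per-watt comparison.
# All numbers are invented for the example; none come from Meta.

def perf_per_watt(inferences_per_sec: float, power_watts: float) -> float:
    """Useful work done per watt of power consumed."""
    return inferences_per_sec / power_watts

# Imaginary figures for a recommendation model: a specialized accelerator
# may deliver lower raw throughput than a GPU yet win on efficiency.
accelerator = perf_per_watt(inferences_per_sec=9_000, power_watts=25)
gpu = perf_per_watt(inferences_per_sec=30_000, power_watts=300)

print(f"accelerator: {accelerator:.0f} inferences/sec per watt")  # 360
print(f"gpu:         {gpu:.0f} inferences/sec per watt")          # 100
```

The point of the metric is that a chip co-designed for one workload can do more useful work per unit of energy even when a general-purpose GPU posts a higher absolute throughput.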