At the recent TSMC North American Technology Symposium, Tesla announced that its wafer-scale Dojo processor, designed specifically for AI training, is in mass production and will be deployed soon.
Tesla's Dojo Training Tile, the official name of its system-on-wafer processor, is a 5x5 array of 25 chips. The chips sit on a carrier wafer and are interconnected using InFO System-on-Wafer (InFO_SoW), a wafer-level version of TSMC's integrated fan-out (InFO) packaging technology.
According to IEEE Spectrum, InFO_SoW provides high-performance connectivity between the chips, allowing Dojo to function as a single processor even though it is built from 25 separate chips. To keep the wafer structurally uniform, TSMC places blank dummy chips in the gaps between the actual chips.
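To make the scale of the on-wafer wiring concrete, the sketch below counts links and worst-case hop distance for a 5x5 tile. It assumes, purely for illustration, that each chip connects only to its four nearest neighbours (a 2D mesh); Tesla's actual link topology is not described here, and the function names are hypothetical.

```python
# Hypothetical model: treat the Dojo Training Tile as a 5x5 grid of chips in
# which each chip links only to its north/south/east/west neighbours.
# The mesh topology is an illustrative assumption, not a disclosed spec.

ROWS, COLS = 5, 5  # 5x5 array from the article -> 25 chips


def chip_positions(rows: int, cols: int) -> list[tuple[int, int]]:
    """Return (row, col) coordinates for every chip on the tile."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def mesh_links(rows: int, cols: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """List each undirected nearest-neighbour link exactly once."""
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # link to the chip on the right
                links.append(((r, c), (r, c + 1)))
            if r + 1 < rows:  # link to the chip below
                links.append(((r, c), (r + 1, c)))
    return links


def max_hops(rows: int, cols: int) -> int:
    """Worst-case hop count in the mesh: corner to opposite corner."""
    return (rows - 1) + (cols - 1)


if __name__ == "__main__":
    print(f"chips: {len(chip_positions(ROWS, COLS))}")   # 25
    print(f"on-wafer links: {len(mesh_links(ROWS, COLS))}")  # 40
    print(f"worst-case hops: {max_hops(ROWS, COLS)}")     # 8
```

Under this assumption, every one of the 40 links runs through the wafer's own wiring rather than through board traces and cables, which is the property that lets 25 chips behave like one processor.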
Because it packs 25 high-performance chips onto a single wafer, Dojo draws a great deal of power and requires a sophisticated cooling system.
To feed the processor, Tesla uses a complex voltage regulation module that delivers 18,000 amps of current to the compute plane, which dissipates up to 15,000 W of heat. For this reason, the system is water-cooled.
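The two figures can be cross-checked with P = V × I. The sketch below makes two simplifying assumptions that are not from Tesla: that the full 15,000 W is drawn at 18,000 A, and that the load is split evenly across the 25 chips.

```python
# Back-of-the-envelope check of the article's figures using P = V * I.
# Assumptions (not from Tesla): all 15,000 W is delivered at 18,000 A,
# and the load is shared evenly across the 25 chips.

CURRENT_A = 18_000  # current delivered to the compute plane (article)
HEAT_W = 15_000     # peak heat dissipated (article)
NUM_CHIPS = 25      # 5x5 Training Tile (article)


def implied_voltage(power_w: float, current_a: float) -> float:
    """Average supply voltage implied by P = V * I."""
    return power_w / current_a


def per_chip(total: float, chips: int) -> float:
    """Share of a total quantity per chip, assuming an even split."""
    return total / chips


if __name__ == "__main__":
    print(f"implied voltage: {implied_voltage(HEAT_W, CURRENT_A):.2f} V")  # 0.83 V
    print(f"per-chip current: {per_chip(CURRENT_A, NUM_CHIPS):.0f} A")     # 720 A
    print(f"per-chip power: {per_chip(HEAT_W, NUM_CHIPS):.0f} W")          # 600 W
```

The implied supply of well under 1 V explains the enormous current: at a fixed power, lowering the voltage raises the current proportionally, which is what makes the voltage regulation so demanding.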
Tesla has not yet disclosed Dojo's specific performance figures. However, given the engineering challenges it took to build, the system looks poised to be a formidable AI-training platform.
Wafer-scale processors such as Tesla's Dojo and Cerebras' Wafer Scale Engine (WSE) can outperform multi-processor machines built from separately packaged chips. Their advantages include high-bandwidth, low-latency communication between cores, lower power delivery network impedance, and better energy efficiency. They can also cope with manufacturing defects: Cerebras builds redundant "extra" cores into its wafer, while Tesla assembles Dojo from known-good chips that are tested before they are placed on the carrier wafer.