Dojo claims 1.8 exaFLOPS of performance
Tesla is building a prototype supercomputer that will be used to develop self-driving car capabilities based on optical cameras, rather than lidar or radar sensors.
Dubbed Dojo, the supercomputer was announced by Andrej Karpathy, the company’s head of AI, at the 2021 Conference on Computer Vision and Pattern Recognition, years after Tesla CEO Elon Musk first teased its existence.
Dojo features 10 petabytes of “hot tier” NVMe storage and moves data at 1.6 terabytes per second, Karpathy told attendees.
It boasts 1.8 exaFLOPS of processing performance, which Karpathy said would make it the fifth most powerful supercomputer in the world; he later admitted to TechCrunch that the system hasn’t been benchmarked yet.
The specs detailed in the presentation show:
720 compute nodes with 8x Nvidia A100 80GB each (5,760 GPUs total)
1.8 exaFLOPS of [theoretical] performance (720 nodes x 8 GPUs/node x 312 TFLOPS of FP16 per A100)
10 PB of “hot tier” NVMe storage @ 1.6 TBps
640 Tbps of total switching capacity
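The headline figure can be checked directly from the listed specs. The short Python sketch below is illustrative only: the function names are hypothetical, the 312 TFLOPS value is the per-GPU peak quoted in the list rather than a measured result, and the storage calculation assumes decimal units (1 PB = 1,000 TB).

```python
# Figures quoted in the spec list; all are vendor/presentation numbers,
# not benchmark results.
NODES = 720
GPUS_PER_NODE = 8
A100_FP16_TFLOPS = 312   # theoretical peak per GPU, as quoted
HOT_TIER_PB = 10
STORAGE_TB_PER_SEC = 1.6


def total_gpus(nodes=NODES, gpus_per_node=GPUS_PER_NODE):
    """Number of GPUs in the cluster."""
    return nodes * gpus_per_node


def peak_exaflops(nodes=NODES, gpus_per_node=GPUS_PER_NODE,
                  tflops_per_gpu=A100_FP16_TFLOPS):
    """Theoretical peak in exaFLOPS (1 exaFLOPS = 1,000,000 TFLOPS)."""
    return total_gpus(nodes, gpus_per_node) * tflops_per_gpu / 1_000_000


def full_read_seconds(capacity_pb=HOT_TIER_PB, tb_per_sec=STORAGE_TB_PER_SEC):
    """Time to stream the whole hot tier once, assuming 1 PB = 1,000 TB."""
    return capacity_pb * 1_000 / tb_per_sec


if __name__ == "__main__":
    print(f"GPUs: {total_gpus()}")
    print(f"Peak: {peak_exaflops():.2f} exaFLOPS (theoretical)")
    print(f"Full hot-tier read: {full_read_seconds() / 60:.0f} minutes")
```

The product comes to just under 1.8 exaFLOPS, which is why the figure is labeled theoretical: it assumes every GPU sustains its peak FP16 rate at once.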
Enter the Dojo
The Dojo is still in development, with Karpathy showing off a prototype version.
The supercomputer was first mentioned by Musk at Tesla’s Autonomy Day in 2019, with the billionaire suggesting last September that Tesla would eventually make its supercomputers available to other companies to train their neural nets.
Musk has been a vocal advocate of a vision-only approach to autonomous vehicles, believing that cameras are faster than lidar or radar. Newly built North American Model Y and Model 3 vehicles that arrived last month have no radar; instead, they use cameras and machine learning as part of their Autopilot and advanced driver assistance systems.
Tesla’s self-driving system collates 36-frames-per-second video from eight cameras that surround the vehicle, providing information on the car’s surroundings.
It was designed to handle traffic warnings, pedal misapplications, and pedestrian detection, and has worked well in what Karpathy described as sparsely populated areas. That is welcome news to some, after several well-publicized crashes involving Tesla cars fueled growing distrust of autonomous vehicles. The company has pushed back against those fears, firmly arguing that drivers were to blame for any accidents.
“We have a neural net architecture network and we have a data set, a 1.5 petabyte data set that requires a huge amount of computing. I wanted to give a plug to this insane supercomputer that we are building and using now,” Karpathy said.
“For us, computer vision is the bread and butter of what we do and what enables Autopilot. And for that to work well, we need to master the data from the fleet, and train massive neural nets and experiment a lot. We invested a lot into the compute. In this case, we have a cluster that we built with 720 nodes of 8x A100 of the 80GB version. This is a massive supercomputer.”
Karpathy urged CVPR'21 attendees interested in working on Tesla’s supercomputing team to get in touch, echoing a call to arms that Musk issued last August.