The infrastructure needed for generative artificial intelligence (AI) requires considerable computing capacity, especially for training advanced models and running inference on them. Even when models are not trained from scratch, AI processing demands exceed those of traditional applications.
Generative AI servers equipped with GPUs have greater requirements than standard servers in terms of size, weight, cabling, networking, power and cooling. Deploying them in data centers therefore requires meticulous planning.
Recommendations for generative AI data centers
Dell Technologies has identified four key areas to prepare data centers for generative AI:
- 1. Data center capacity design. It is essential to have a detailed plan for the size of the data center and for how space is allocated to servers and racks. Airflow optimization and maintenance are critical aspects, and space should also be set aside for future expansion. The rack layout must make it easy to access and maintain servers and infrastructure. To that end, a regular maintenance program is recommended, including periodic inspections and the replacement of air filters, fans and cooling units as necessary.
- 2. Effective airflow management. Airflow is crucial for managing the heat generated by servers and infrastructure systems. AI infrastructure consumes much more energy than traditional servers, which generates more heat and makes airflow management and cooling even more important. Organizations should implement structured airflow management strategies, such as hot and cold aisle containment. Directing cold air straight to the server intakes and carrying hot air away from the equipment increases cooling efficiency and reduces energy costs.
- 3. Power and cooling optimization. For high-density GPU servers, it is essential to optimize power and cooling to ensure stable performance and avoid outages. Investing in efficient power supplies, advanced transformers and backup systems, such as uninterruptible power supplies (UPS) and power distribution units (PDU), improves energy consumption management and reduces environmental impact. Since AI workloads generate a high level of heat, traditional cooling may not be sufficient. Implementing liquid cooling solutions allows more efficient heat dissipation, ensuring greater stability and longevity in high-density configurations.
- 4. Efficient cable management. AI deployments require efficient cable management, with overhead routing and separation of power and data cables to minimize interference and improve safety and reliability. Inside the rack, it is essential to reduce clutter to avoid airflow blockages and ease maintenance, since poor organization can cause heat buildup and affect the switching infrastructure.
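
To make the capacity, airflow and power recommendations more concrete, the following is a minimal planning sketch. It is not from Dell Technologies. It assumes the common sensible-heat approximation for air at sea level (BTU/hr = 1.08 × CFM × ΔT in °F) and the conversion of 1 W to roughly 3.412 BTU/hr. The function names and the server figures in the usage section are hypothetical placeholders, to be replaced with vendor-rated values.

```python
# Illustrative sketch only. Server counts, wattages and temperature rise
# are hypothetical; substitute the rated figures for real equipment.

BTU_PER_HOUR_PER_WATT = 3.412   # 1 W of load dissipates about 3.412 BTU/hr
AIR_SENSIBLE_HEAT_FACTOR = 1.08  # BTU/hr per (CFM * degF), air at sea level


def rack_power_watts(servers: int, watts_per_server: float) -> float:
    """Total electrical load of a rack, assuming identical servers."""
    if servers < 0 or watts_per_server < 0:
        raise ValueError("servers and watts_per_server must be non-negative")
    return servers * watts_per_server


def rack_heat_btu_per_hour(total_watts: float) -> float:
    """Heat output of a rack, assuming all power drawn ends up as heat."""
    return total_watts * BTU_PER_HOUR_PER_WATT


def required_airflow_cfm(total_watts: float, delta_t_f: float) -> float:
    """Airflow (cubic feet per minute) needed to remove the rack's heat
    for a given intake-to-exhaust temperature rise in degrees Fahrenheit."""
    if delta_t_f <= 0:
        raise ValueError("delta_t_f must be positive")
    return rack_heat_btu_per_hour(total_watts) / (AIR_SENSIBLE_HEAT_FACTOR * delta_t_f)


if __name__ == "__main__":
    # Hypothetical comparison: a standard rack versus a dense GPU rack.
    for label, count, watts in [("standard", 20, 500.0), ("GPU", 4, 10000.0)]:
        power = rack_power_watts(count, watts)
        airflow = required_airflow_cfm(power, delta_t_f=20.0)
        print(f"{label} rack: {power / 1000:.1f} kW, about {airflow:.0f} CFM at a 20 degF rise")
```

An estimate like this only indicates the order of magnitude. When the airflow required per rack exceeds what aisle containment can realistically deliver, that is a signal to evaluate the liquid cooling options mentioned in the third recommendation.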