Job Description :
Key Responsibilities :
- Server Installation & Configuration : Install, configure, and deploy servers in data center environments, ensuring they are correctly set up for optimal performance and scalability.
- Hardware Maintenance : Perform regular maintenance and health checks on servers, including monitoring hardware performance, updating firmware, and replacing or upgrading components.
- Troubleshooting & Repairs : Diagnose and resolve hardware and software issues related to the servers, ensuring minimal downtime and maintaining system integrity.
- Performance Optimization : Monitor server performance and implement corrective actions to optimize hardware efficiency, stability, and reliability.
- System Updates & Patches : Apply firmware updates, patches, and drivers to NVIDIA servers, ensuring compatibility with the latest software and hardware environments.
- Integration Support : Help integrate NVIDIA GB200 servers with other systems and software, ensuring compatibility and smooth communication across the network.
- Documentation & Reporting : Maintain accurate records of server configurations, maintenance schedules, and troubleshooting efforts. Generate regular reports on server health, performance, and issues.
- Collaboration : Work closely with IT infrastructure teams, network engineers, and other technical staff to ensure seamless server operations and integration with existing infrastructure.
- Data Center Operations : Support data center operations, ensuring that NVIDIA servers are properly rack-mounted, cabled, and positioned for optimal airflow and cooling.
Required Skills and Qualifications :
- Bachelor's degree / High School Diploma.
- Proven experience working with servers or similar high-performance computing hardware.
- Strong understanding of server hardware, including CPU, memory, storage, networking components, and cooling systems.
- Solid understanding of networking concepts, protocols, and configurations (TCP/IP, DNS, DHCP, etc.).
- Proficiency with server diagnostics tools and hardware monitoring software.
Preferred Qualifications :
- Experience with NVIDIA-specific hardware and software solutions, including GPUs, CUDA, and other NVIDIA technologies.
- Familiarity with GPU server configurations and use cases, particularly in AI, machine learning, and high-performance computing environments.
- Knowledge of server management frameworks like IPMI, iLO, or similar.
- IT certifications (e.g., CompTIA A+, Cisco CCNA, or similar) are a plus.
- Familiarity with cloud platforms (AWS, Google Cloud, Azure) and their interaction with on-premises server infrastructure.
Additional Information :
Ability to lift heavy hardware components and perform physical installations and repairs in a data center environment. Ability to lift up to 30 pounds regularly.
Ability to bend, stoop, crawl, kneel, crouch, reach, stand for long periods, and move about production and warehouse facilities.
The environment is temperature-controlled but is otherwise a typical production environment with loud noise.