Job Responsibilities
- Install and maintain various data center equipment, including CPU/GPU/NPU servers, high-speed interconnect networks such as InfiniBand and RoCE, storage servers, and firewalls.
- Initialize system H/W components and their software stack, including firmware, GPU/NPU device drivers, and communication libraries for clustering.
- Analyze and resolve the causes of various H/W errors.
- Provide overall management of, and technical consulting for, the operating infrastructure of both the company and its customers.
Qualifications
- 1+ years of experience operating and managing Linux-based cluster systems.
- Extensive understanding of various H/W components of computer systems.
- Experience in analyzing various logs and operating monitoring solutions for large-scale IT infrastructure.
- Experience in installing and maintaining Linux systems at an IT system/solution distributor or reseller.
- Fluent spoken English.
- Excellent logical thinking and problem-solving skills.
Preferred Qualifications
- Bachelor's degree in Computer Engineering or related field.
- Experience installing, configuring, and operating InfiniBand networks.
- Experience in building and managing cluster systems, especially GPU clusters.
- Experience operating and monitoring large-scale clusters (up to a hundred nodes).
- Fluent in English.