Job Description
- Build/manage system S/W components such as GPU/NPU device drivers, communication libraries, directory services, distributed file systems, AI acceleration, and object storage for clustering.
- Automate S/W provisioning processes using IaC tools such as Ansible and Terraform, or through custom programming.
- Build/manage container orchestration tools such as Kubernetes (K8s) in clusters.
- Analyze and resolve the causes of various S/W or H/W errors.
- Provide overall management and technical consulting for the operating infrastructure of Moreh's customers.
- Install/operate various equipment in data centers, including CPU/GPU/NPU servers, high-speed interconnection networks such as InfiniBand and RoCE, storage servers, and firewalls.
Your Skills and Experience
- Bachelor's/Graduate degree in Computer Engineering or a related field.
- 3+ years of experience operating and managing Linux-based cluster systems.
- Extensive understanding of various H/W and S/W components of computer systems.
- Knowledge of Docker and Kubernetes, and hands-on experience building a Kubernetes cluster.
- Experience in analyzing various logs and operating monitoring solutions for large-scale IT infrastructure.
- Fluent English communication skills (writing and reading).
- Excellent logical thinking and problem-solving skills.
Preferred:
- Python/C programming skills are a plus.
- Experience in building and managing cluster systems, especially GPU clusters, is a plus.
Why you'll love working here
- Training opportunities with Korean experts
- Local social benefits in Vietnam, including but not limited to local insurance, public holidays, and 12 days of paid annual leave
- Additional company benefits, including but not limited to:
  - 13th-month salary
  - Annual health check-up
  - Equipment upgrades
  - Sports club sponsorship
  - Monthly happy dinners
  - And more