About Zharfa Tech
Zharfa Tech is a technology company specializing in High-Performance Computing (HPC) solutions and infrastructure for Large Language Model (LLM) workloads. We design, build, and operate infrastructure that powers advanced AI and computational services.
Job Description
We are looking for an experienced DevOps Engineer to build, operate, and continuously improve our infrastructure, CI/CD pipelines, and automation systems. You will play a critical role in ensuring the scalability, reliability, and smooth deployment of our services.
Responsibilities
- Design, build, and maintain CI/CD pipelines
- Manage and automate infrastructure using Infrastructure as Code (IaC)
- Deploy, operate, and maintain containerized applications using Docker and Kubernetes
- Monitor system performance, availability, and reliability
- Implement and maintain logging, monitoring, and alerting systems
- Collaborate with development teams to improve deployment workflows
- Administer configuration management and automation tools
- Troubleshoot production issues and perform root cause analysis
- Document infrastructure architecture and operational procedures
Requirements
Must Have
- 5+ years of experience in DevOps or related roles
- Strong experience with Linux system administration
- Hands-on experience with containerization (Docker) and orchestration (Kubernetes)
- Experience with CI/CD tools such as GitLab CI, Jenkins, or GitHub Actions
- Knowledge of Infrastructure as Code tools (Terraform, Ansible, Pulumi)
- Proficiency with version control systems (Git)
- Scripting skills in Bash and/or Python
- Familiarity with monitoring and logging tools (Prometheus, Grafana, ELK Stack)
- Strong troubleshooting and problem-solving skills
- Strong documentation skills
Nice to Have
- Experience with HPC environments and job schedulers (Slurm, PBS)
- Knowledge of GPU-based workloads and infrastructure
- Experience with ML/AI deployment pipelines (MLOps)
- Familiarity with cloud platforms (AWS, GCP, Azure)
- Experience with service mesh technologies (Istio, Linkerd)
- Knowledge of GitOps practices (Argo CD, Flux)
- Experience designing and operating high-availability, distributed systems
- Relevant certifications (e.g., CKA, AWS certifications)
What We Offer
- Daily breakfast and lunch
- Game room for relaxation and recreation
- Overtime compensation
- Insurance coverage
- Opportunity to work with cutting-edge HPC and AI infrastructure
- Collaborative, innovative, and technically driven team environment