Job Description
Are you obsessed with uptime, scalability, and system performance? NexusCloud Systems is looking for an elite Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, building robust, automated systems that power our global SaaS platform. You will work at the cutting edge of cloud-native technologies to ensure our services remain performant and resilient for millions of daily users.
Responsibilities
- Design, build, and maintain highly available, scalable, and secure cloud infrastructure on AWS or GCP.
- Automate manual operational tasks through infrastructure-as-code (IaC) using Terraform and Ansible.
- Lead incident response efforts and conduct blameless post-mortems to improve system reliability.
- Optimize cloud spending by analyzing resource utilization and tuning auto-scaling policies.
- Develop and maintain comprehensive monitoring, alerting, and observability dashboards.
- Collaborate with engineering teams to improve software delivery pipelines and CI/CD workflows.
- Implement security best practices and compliance standards across the production environment.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
- 5+ years of experience in SRE, DevOps, or Systems Engineering roles.
- Deep expertise in Kubernetes, Docker, and container orchestration at scale.
- Proficiency in programming languages such as Go, Python, or Ruby for automation.
- Extensive experience with cloud infrastructure providers (AWS or GCP).
- Strong understanding of CI/CD methodologies and tools like Jenkins, GitLab CI, or GitHub Actions.
- Proven ability to troubleshoot complex, distributed systems in a production environment.