Job Description
Join the Vanguard of Scalability
NexusCloud Systems is seeking a highly skilled Senior Site Reliability Engineer to join our mission-critical infrastructure team. You will be responsible for ensuring our global cloud architecture remains resilient, performant, and secure. This is an opportunity to design systems that handle massive traffic loads while defining the future of our automated deployment pipelines.
Responsibilities
- Design and maintain highly available, scalable infrastructure on AWS and Kubernetes.
- Automate operational tasks using Go, Python, or Terraform to reduce manual toil.
- Lead incident response and perform blameless post-mortems to improve system reliability.
- Optimize cloud infrastructure costs without compromising performance.
- Collaborate with development teams to implement CI/CD best practices and shift-left security.
- Monitor system health through advanced observability and alerting frameworks.
- Participate in an on-call rotation that supports our 99.99% service uptime target.
Qualifications
- Bachelor’s degree in Computer Science or equivalent practical experience.
- 5+ years of experience in SRE, DevOps, or Software Engineering roles.
- Deep expertise in Docker and in container orchestration at scale with Kubernetes.
- Proficiency in at least one infrastructure-as-code tool (Terraform, Pulumi, or CloudFormation).
- Strong programming skills in Go, Python, or Ruby.
- Solid understanding of distributed systems and microservices architecture.
- Excellent communication skills with the ability to articulate complex technical issues.