Job Description

Are you obsessed with uptime, performance, and automation? NexusScale is looking for a Senior Site Reliability Engineer to help us build and scale our high-traffic cloud infrastructure. You will be the bridge between development and operations, ensuring our systems are not just reliable, but resilient. If you thrive in a culture of blameless post-mortems and infrastructure-as-code, we want to hear from you.

Responsibilities

Design, build, and maintain highly available, distributed cloud systems on AWS.
Automate infrastructure provisioning using Terraform and CI/CD best practices.
Monitor system performance, troubleshoot bottlenecks, and implement proactive optimization strategies.
Lead incident response and conduct thorough post-mortem analyses to prevent recurrence.
Collaborate with engineering teams to improve software delivery speed and reliability.
Manage capacity planning and resource allocation to ensure cost-efficiency.
Develop and maintain internal tooling to streamline deployment workflows.

Qualifications

5+ years of experience in SRE, DevOps, or large-scale systems engineering.
Deep expertise in AWS cloud services (EC2, EKS, RDS, S3).
Proficiency in infrastructure-as-code tools such as Terraform or CloudFormation.
Strong programming skills in Python, Go, or Ruby for automation scripting.
In-depth knowledge of Kubernetes and container orchestration at scale.
Experience with observability platforms like Datadog, Prometheus, or Grafana.
Excellent analytical skills and the ability to solve complex production issues under pressure.
