Job Description

Are you obsessed with uptime, scalability, and system performance? NexusScale is looking for a Senior Site Reliability Engineer to join our high-impact infrastructure team in San Francisco. You will be responsible for building, scaling, and maintaining the mission-critical systems that power our global cloud platform. We move fast, automate everything, and value engineering excellence above all else.

Responsibilities

Architect and maintain highly available, distributed systems on AWS/GCP.
Develop and implement robust automation for CI/CD pipelines and infrastructure provisioning using Terraform.
Lead incident response and perform deep-dive post-mortems to ensure long-term system health.
Optimize service performance and cost through proactive capacity planning and resource management.
Collaborate with development teams to integrate observability, monitoring, and alerting frameworks.
Contribute to the evolution of our platform strategy and security posture.

Qualifications

5+ years of experience in SRE, DevOps, or large-scale Systems Engineering roles.
Advanced proficiency in Python, Go, or Ruby for automation and tool development.
Deep expertise in container orchestration using Kubernetes and Docker.
Strong background in managing large-scale cloud infrastructure (AWS/GCP/Azure).
Proven experience with monitoring tools such as Prometheus, Grafana, or Datadog.
Strong understanding of network protocols, security best practices, and database reliability.
BS/MS in Computer Science, Engineering, or equivalent practical experience.
