Job Description

Are you obsessed with uptime, performance, and building resilient systems at scale? NexusScale is looking for a Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, building automation that keeps our global platform fast and highly available. You will work on cutting-edge Kubernetes clusters, multi-cloud architectures, and observability platforms that drive our mission forward.

Responsibilities

Design and maintain highly available, scalable infrastructure on AWS and GCP.
Automate operational tasks through infrastructure-as-code (Terraform, Pulumi).
Conduct blameless post-mortems and lead incident response for high-severity outages.
Develop and refine CI/CD pipelines to increase deployment velocity.
Implement advanced monitoring, logging, and tracing solutions (Prometheus, Grafana, ELK).
Proactively identify performance bottlenecks and capacity constraints.
Collaborate with engineering teams to improve system architecture and reliability patterns.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Expertise in containerization (Docker) and container orchestration (Kubernetes).
Proficiency in at least one programming or scripting language (Python, Go, or Ruby).
Deep understanding of distributed systems and cloud-native architecture.
Experience with IaC tools like Terraform or CloudFormation.
Strong background in Linux internals, networking protocols, and security best practices.
Proven ability to manage and troubleshoot large-scale production environments.