Job Description

Are you obsessed with system reliability and massive scale? NexusCloud Systems is looking for a Senior Site Reliability Engineer to join our high-impact infrastructure team. You will be instrumental in building the backbone of our global SaaS platform, ensuring 99.999% uptime, and automating complex cloud-native environments.
We value engineering excellence, pragmatic decision-making, and a deep commitment to observability and performance optimization.

Responsibilities

Design and maintain highly scalable, fault-tolerant infrastructure on AWS/GCP.
Develop automation tooling to reduce manual operational toil using Python or Go.
Lead incident response efforts and conduct blameless post-mortems to improve system resilience.
Implement and refine SLOs, SLIs, and comprehensive monitoring dashboards.
Collaborate with DevOps teams to integrate CI/CD pipelines with infrastructure-as-code (Terraform).
Manage Kubernetes clusters at scale, ensuring resource optimization and security compliance.

Qualifications

5+ years of experience in SRE, Systems Engineering, or DevOps roles.
Advanced proficiency with Docker and container orchestration on Kubernetes.
Strong expertise in infrastructure automation (Terraform, Ansible, or Pulumi).
Proven experience with cloud providers (AWS preferred) and networking concepts.
Experience with observability stacks like Prometheus, Grafana, or Datadog.
Deep understanding of distributed systems and microservices architectures.
Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

Senior Site Reliability Engineer (SRE)
