Job Description

Are you obsessed with system uptime, performance at scale, and automating the impossible? NexusCloud Systems is seeking a world-class Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, building robust, self-healing systems that power our global platform.
You will work with a modern tech stack, influence architectural decisions, and champion a culture of reliability. We are looking for an engineer who treats infrastructure as code and believes that manual intervention is a bug to be squashed.

Responsibilities

Architect and maintain highly available, scalable, and secure cloud infrastructure on AWS.
Automate operational workflows using Terraform, Ansible, and Python/Go.
Lead incident response efforts and conduct blameless post-mortems to improve system resilience.
Optimize cloud resource utilization to balance performance with cost-efficiency.
Develop and maintain monitoring, alerting, and observability frameworks (Prometheus/Grafana/Datadog).
Collaborate with engineering squads to integrate CI/CD best practices into the development lifecycle.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Expert-level proficiency in AWS cloud services and Kubernetes orchestration.
Strong coding skills in Python, Go, or Ruby for automation and tool development.
Deep understanding of Linux internals, networking, and distributed systems.
Proven experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Ability to participate in an on-call rotation and handle complex system troubleshooting.
Strong communication skills and a passion for mentoring junior team members.
