Job Description

Are you obsessed with system reliability, performance, and automation? CloudScale Dynamics is seeking a Senior Site Reliability Engineer to join our core infrastructure team. You will be instrumental in scaling our global platforms, ensuring 99.999% availability, and building the next generation of our observability stack.
You will work at the intersection of software engineering and systems operations, leveraging your expertise to eliminate manual toil through code. Join a team of world-class engineers where performance meets innovation.

Responsibilities

Design, build, and maintain highly scalable, distributed production systems on AWS.
Automate infrastructure provisioning using Terraform and CI/CD pipelines.
Conduct incident response and blameless post-mortems to improve service stability.
Optimize cloud costs and system performance through proactive capacity planning.
Develop self-healing mechanisms and automated monitoring/alerting strategies.
Collaborate with development teams to ensure software is deployable and observable.
Mentor junior engineers on best practices for infrastructure-as-code and reliability.

Qualifications

5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Deep proficiency with AWS (EC2, EKS, RDS, S3) and modern cloud architectures.
Strong programming skills in Go, Python, or Ruby.
Expert-level knowledge of Kubernetes and container orchestration at scale.
Deep understanding of observability tools such as Prometheus, Grafana, and Datadog.
Strong grasp of Linux system internals, networking, and security best practices.
Proven ability to troubleshoot complex issues in distributed environments.
