Job Description

Are you obsessed with system reliability, performance, and automation? NexusCloud Solutions is seeking a Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will bridge the gap between development and operations, ensuring our high-traffic global platforms remain scalable, resilient, and performant. You will be instrumental in driving our move toward complete infrastructure-as-code and cloud-native observability.

Responsibilities

Design, build, and maintain highly available, scalable, and secure cloud infrastructure.
Automate infrastructure provisioning and configuration management using Terraform and Ansible.
Drive incident response and post-mortem analysis to maintain 99.99% system uptime.
Implement and manage advanced monitoring, logging, and alerting systems (Prometheus, Grafana, ELK).
Collaborate with cross-functional engineering teams to optimize application performance and CI/CD pipelines.
Conduct regular capacity planning and load testing to ensure seamless peak-traffic handling.
Mentor junior engineers and advocate for SRE best practices across the organization.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or Systems Engineering roles.
Deep expertise in AWS/GCP cloud platforms and container orchestration (Kubernetes).
Advanced proficiency in scripting languages such as Python, Go, or Bash.
Proven experience with infrastructure-as-code tools like Terraform or Pulumi.
Strong understanding of distributed systems, networking protocols (TCP/IP, DNS, HTTP), and load balancing.
Excellent analytical, problem-solving, and communication skills in a fast-paced environment.

Senior Site Reliability Engineer (SRE)
