Job Description

Are you obsessed with system uptime, performance at scale, and automating the mundane? NexusCloud Systems is looking for a Senior Site Reliability Engineer to join our core infrastructure team in San Francisco. You will play a pivotal role in designing, building, and maintaining our high-availability cloud platforms, ensuring that millions of users enjoy a seamless experience every day.
We prioritize engineering solutions over manual intervention and seek an SRE who thrives on tackling complex architectural challenges in a fast-paced environment.

Responsibilities

Design and manage highly scalable, distributed systems hosted on AWS/GCP.
Drive capacity planning, performance tuning, and infrastructure optimization.
Automate infrastructure provisioning using Infrastructure as Code (Terraform, Ansible).
Implement advanced monitoring, logging, and alerting strategies to improve observability.
Lead incident response protocols and conduct blameless post-mortems.
Collaborate with development teams to integrate CI/CD best practices.
Mentor junior engineers on reliability engineering standards and best practices.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
5+ years of experience in SRE, DevOps, or Software Engineering roles.
Deep expertise in Linux systems administration and container orchestration (Kubernetes).
Proficiency in scripting or programming languages (Python, Go, or Ruby).
Hands-on experience with cloud infrastructure (AWS or GCP) and networking protocols.
Strong problem-solving skills and the ability to debug complex issues across the stack.
Excellent communication skills with a collaborative, growth-oriented mindset.

Senior Site Reliability Engineer (SRE)
