Job Description
Elevate the Future of Cloud Infrastructure
NexusCloud Systems is seeking a visionary Senior Site Reliability Engineer to join our core platform team. In this high-impact role, you will architect our infrastructure for global scale and help ensure 99.999% availability for our mission-critical services. Working at the intersection of software engineering and systems operations, you will bridge code and infrastructure to deliver seamless digital experiences for millions of users.
Responsibilities
- Design, build, and maintain scalable, high-performance infrastructure on public cloud platforms (AWS/GCP).
- Implement robust CI/CD pipelines that increase deployment velocity and reliability.
- Lead incident response and conduct in-depth post-mortems to drive continuous system improvement.
- Automate manual operational tasks and manage infrastructure as code (Terraform, Pulumi).
- Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to maintain platform health.
- Mentor junior engineers and promote a culture of operational excellence across the organization.
Qualifications
- 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Architecture.
- Proficiency in at least one programming language such as Go, Python, or Rust.
- Expertise in container orchestration, particularly running Kubernetes at scale.
- Deep understanding of distributed systems, networking, and cloud-native security principles.
- Experience with observability tools such as Prometheus, Grafana, Datadog, or Honeycomb.
- Strong background in Linux internals and performance tuning.
- Bachelor's degree in Computer Science, Engineering, or equivalent professional experience.