Job Description
Are you obsessed with uptime, scalability, and performance? NexusScale is seeking an elite Senior Site Reliability Engineer to join our core infrastructure team. In this role, you will be the architect of our reliability strategy, bridging the gap between development and operations to ensure our global platform remains seamless under extreme load.
You will work on cutting-edge distributed systems, tackle complex architectural challenges, and mentor junior engineers in a culture that favors innovation and keeps technical debt in check.
Responsibilities
- Design, build, and maintain highly available and scalable distributed systems on GCP.
- Automate infrastructure provisioning and configuration management using Terraform and Ansible.
- Lead incident response and conduct blameless post-mortems to improve system resilience.
- Optimize system performance and cloud resource utilization to achieve cost-efficiency.
- Develop and maintain CI/CD pipelines to facilitate high-velocity code deployments.
- Implement advanced monitoring, logging, and alerting strategies using Prometheus and Grafana.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- 5+ years of experience in SRE, DevOps, or Software Engineering roles.
- Strong proficiency in Go, Python, or Java with a focus on writing robust, maintainable code.
- Deep understanding of container orchestration platforms, specifically Kubernetes.
- Extensive experience with cloud providers (AWS, GCP, or Azure) and infrastructure-as-code (Terraform).
- Solid grasp of networking protocols, Linux internals, and database scaling strategies (PostgreSQL/Redis).
- Proven track record of managing large-scale, mission-critical production environments.