Job Description
Are you obsessed with system scalability, high availability, and automating the impossible? NexusCloud Systems is looking for a Senior Site Reliability Engineer (SRE) to help us build and maintain the next generation of cloud-native infrastructure. You will work alongside world-class engineers to ensure our platforms are performant, resilient, and ready to scale.
We value engineers who view operations as a software engineering problem. If you enjoy solving complex distributed systems challenges and want to make a significant impact, we want to hear from you.
Responsibilities
- Design, build, and maintain highly scalable distributed systems on AWS/GCP.
- Automate infrastructure provisioning and configuration management using Terraform and Ansible.
- Lead incident response efforts and conduct blameless post-mortems to improve system reliability.
- Optimize production environments for performance, cost, and security.
- Develop internal tools to improve developer productivity and deployment velocity.
- Implement comprehensive observability solutions using Prometheus, Grafana, and the ELK stack.
- Mentor junior team members on best practices for site reliability and infrastructure as code.
Qualifications
- 5+ years of experience in SRE, DevOps, or Systems Engineering roles.
- Deep expertise in Linux systems administration and container orchestration (Kubernetes).
- Strong proficiency in at least one programming language (Go, Python, or Ruby).
- Hands-on experience with cloud infrastructure (AWS preferred) and IaC tools.
- Solid understanding of CI/CD pipelines and deployment strategies (canary, blue/green).
- Proven ability to troubleshoot complex performance bottlenecks in a microservices architecture.
- Excellent communication skills with the ability to explain technical concepts to non-technical stakeholders.