Job Summary
A company is looking for a Lead Site Reliability Engineer (SRE).
Key Responsibilities
- Define strategy, architecture, and roadmap for the site reliability engineering function
- Lead design, deployment, and optimization of production-grade containerized workloads and compliant cloud environments
- Establish observability, monitoring, and alerting frameworks to ensure performance and reliability
Required Qualifications
- BS in Computer Science, Cybersecurity, Software Engineering, or a related field, or equivalent experience with 5+ years in relevant roles
- Proven expertise in container orchestration platforms, ideally Kubernetes
- Extensive experience with infrastructure-as-code, ideally Terraform
- Strong background in cloud platforms, ideally AWS
- Exceptional troubleshooting and incident management skills for distributed systems