Job Summary
A company is looking for a Principal Architect for Site Reliability Engineering.
Key Responsibilities
- Design and architect scalable, resilient infrastructure for cloud-native and hybrid services
- Define and implement SRE principles, SLAs, SLOs, and error budgets across teams and services
- Collaborate with cross-functional teams to ensure reliability, observability, performance, and security
Required Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent experience)
- 15+ years of experience in infrastructure, cloud, or SRE roles, including at least 5 years in an architectural or technical leadership position
- Expertise in cloud platforms (e.g., AWS, Azure, GCP) and container orchestration (Kubernetes)
- Deep understanding of distributed systems, microservices architecture, and CI/CD pipelines
- Proficiency with observability tools (Prometheus, Grafana, ELK/EFK, Datadog) and infrastructure as code (Terraform, Ansible)