My client is seeking an experienced Contract Site Reliability Engineer Team Lead to join their team. This is a hands-on leadership role requiring a strong technical background alongside the ability to lead, mentor, and develop a high-performing SRE function. There will be occasional travel to the client's offices in Central London.
This role pays £475 per day and is Outside IR35.
Responsibilities
- Provide technical leadership for the Site Reliability Engineering function, driving reliability, scalability, and performance improvements across critical production systems.
- Lead, mentor, and coach a team of SREs and engineers, fostering a culture of operational excellence, collaboration, and continuous improvement.
- Remain hands-on with the design, implementation, and support of cloud infrastructure, automation, observability, and platform reliability initiatives.
- Define, implement, and govern SLOs, SLIs, and error budgets, ensuring alignment between engineering priorities and business objectives.
- Architect, maintain, and optimise highly available, distributed systems within an AWS cloud environment.
- Drive change management initiatives across infrastructure, platforms, and operational processes, ensuring smooth adoption of new technologies and ways of working.
- Champion Infrastructure as Code (IaC) and automation practices, reducing manual operational effort through tools such as Terraform and CloudFormation.
- Collaborate closely with development, platform, and operational teams to embed reliability and resilience best practices throughout the software development lifecycle.
- Lead incident management, root cause analysis, and continuous service improvement activities.
- Establish and enhance monitoring, alerting, and observability capabilities across the technology estate.
Required Skills & Experience
- Proven experience in a Site Reliability Engineering, DevOps, Cloud Engineering, or Infrastructure Engineering role, including leading or mentoring technical teams.
- Demonstrable hands-on technical expertise alongside leadership responsibilities.
- Strong experience delivering and managing change within complex technology environments.
- Extensive experience working with AWS cloud services and architectures, as the client's platform is hosted within AWS.
- Strong Linux/Unix systems administration knowledge.
- Proficiency in one or more scripting or programming languages such as Python, Bash, Go, or Java.
- Strong experience with Infrastructure as Code tools, including Terraform and/or CloudFormation.
- Experience with containerisation and orchestration technologies, including Docker and Kubernetes.
- Familiarity with CI/CD tooling such as Jenkins, GitHub Actions, GitLab CI, or Azure DevOps.
- Experience with observability and monitoring platforms, including Datadog and Splunk (essential).
- Strong understanding of distributed systems, networking, security principles, and cloud-native architectures.
- Excellent troubleshooting, problem-solving, and stakeholder management skills.
Desirable
- Experience operating within large-scale, mission-critical production environments.
- Previous experience establishing or maturing SRE practices and operating models.
- Relevant AWS, Kubernetes, or cloud certifications.
Please apply for immediate consideration.