Unix & Ansible Site Reliability Engineer
Alexander Mann Solutions
2024-11-05 10:31:49
London City, Greater London, United Kingdom
Job type: contract
Job industry: I.T. & Communications
Job Duration: 12 months
Job description
We are AMS. We are a global total workforce solutions firm; we enable organisations to thrive in an age of constant change by building, re-shaping, and optimising workforces. Our Contingent Workforce Solutions (CWS) is one of our service offerings; we act as an extension of our clients' recruitment team and provide professional interim and temporary resources.
Our investment banking client has been present in the UK for more than 150 years and is a long-term partner to British business. Today, the Group comprises 10 divisions and employs 9,300 staff based in 21 core locations across the country. Their role is simply stated: help clients achieve their goals by combining local know-how and global reach. In so doing, they seek to make a positive, sustainable contribution to both the UK economy and society.
On behalf of this organisation, AMS are looking for a Unix & Ansible Site Reliability Engineer for a 12-month contract based in London (hybrid).
Purpose of the Role:
This role is part of a critical initiative to transform our client's infrastructure and application resilience within their Risk Systems function. They will be overhauling their environment to ensure robust, scalable, and highly available infrastructure. This is a unique opportunity to make a significant impact by implementing DevOps and Site Reliability Engineering (SRE) best practices to minimise potential downtime and ensure continuity of service.
Responsibilities of the role:
As a Unix & Ansible Site Reliability Engineer, you will:
- Lead efforts to design and implement resilient IT applications using DevOps and SRE principles.
- Work on load balancers to decouple routing, ensuring no single point of failure in application traffic.
- Implement fully automated pipelines to replace manual tasks and enable one-click deployment and recovery processes in the event of system outages.
- Collaborate with SRE teams to implement robust monitoring, alerting, and logging solutions to ensure early detection of issues.
- Design, build, and maintain scalable, highly available infrastructure using tools like Ansible, Kubernetes, and Docker.
- Develop and implement automated disaster recovery processes to minimise system downtime.
- Identify opportunities for improvement in system performance, deployment speed, and scalability.
- Work primarily with Ansible and explore Kubernetes and Docker for high availability. Python or Java expertise is a plus.
What we require from the candidate:
- Strong experience in Ansible, Unix and Apache.
- Strong knowledge of decoupling applications and optimising load balancer routing.
- Hands-on experience in setting up and maintaining pipelines for application deployment, ideally with one-click recovery systems.
- Knowledge of Kubernetes and Docker.
- Experience with monitoring tools, preferably Dynatrace.
- Familiarity with setting up monitoring and alerting systems for proactive issue identification and resolution (desirable).
- Proficiency in Jenkins (or another CI tool).
Next steps
If you are interested in applying for this position and meet the criteria outlined above, please click the link to apply and we will contact you with an update in due course.
This client will only accept workers operating via an Umbrella or PAYE engagement model.
AMS, a Recruitment Process Outsourcing Company, may in the delivery of some of its services be deemed to operate as an Employment Agency or an Employment Business.