Senior Site Reliability Engineer (Python/Golang)
- Full Time
About the job
DAZN are a leading worldwide sports broadcaster, and we care a lot about reliability. The SRE team, based in the UK and India, works every day to achieve this, helping our developers reduce pageable events and recover faster when they do happen. This leads to happier customers, and happier engineers!
We are growing, and in order to grow well we need Senior SREs to join the team and help lead the way. Does this sound like something you could help with?
As a Senior SRE you’ll be focussed on improving the reliability of critical services. That means figuring out how to get the teams to their reliability goals; growing the team by teaching what you know, sharing what you learn and lifting others up to your level; and growing yourself by learning every day and setting strong OKRs for yourself.
What will you be doing?
- You will be enabling DAZN teams to Build, Run and Own reliable services by providing end-to-end observability, sharing good practices and asking smart questions. SRE provides machine learning to pre-scale AWS workloads and get ahead of the crowds, tests every element of performance using k6 Cloud, manages error budgets and SLOs using VictoriaMetrics, Grafana and StatsD, and gathers logs using a centralised solution based on Kinesis.
- All these observability pieces are assembled in New Relic and can page folks when needed via PagerDuty.
- You’ll be writing code constantly in Golang and React, deploying it via GitHub Actions and enriching service data from manifests included in GitHub repos. Your APIs will be available everywhere via front-ends written in Node.js and secured by AAD and HMAC.
- Every Monday, you might teach the teams something that would help them. Every day you’ll learn from those teams, via BAU or in post-incident reviews, so we can pass this on. Every Friday afternoon you can get together with the wider engineering community for gaming sessions, and on alternate Fridays we have a TGIF day, where you can do anything you think is useful for the team and the company.
- Being adaptable and ready for on-call support is essential in this role. It's like being the superhero of our team, ready to swoop in and save the day whenever the need arises.