Incident Commander
- Full Time
About the job
THE ROLE:
We are seeking an experienced Incident Commander to join the Global Live Operations team: a highly skilled, technology-focused leader who will drive the evolution of the DAZN Critical Incidents Technical Response Group, which handles all incident types.
You will be responsible for Incident Management and will be the key decision-maker, with the authority to direct the problem resolution path for the fastest restoration of any service. You are responsible for managing the restoration of a service affected by real or potential interruptions that may impact its quality or availability. When a major or critical incident occurs, the right technical resources will be activated and you will lead major incident calls, determine the client impact, agree on resolution actions with everyone involved, and manage the communication channel to keep the focus on return-to-service. This includes managing technical sub-channels, where tech development leads take point and isolate issues to support return-to-service.
You are responsible for business communication, leaving the engineering teams focused on the return to service. This is a fast-paced, high-tech environment; the role is shift-based and will require extended hours and after-hours follow-up, given that changes occur 24x7x365. You will also play a part in the technical delivery & transformation of complex projects & new features, transitioning them into PROD & operationalising new deliverables, including scoping monitoring requirements, creating runbooks and highlighting delivery risks.
We are live globally and are determined to remain a leader in the streaming community. Are you ready to have your work impact millions daily and change the OTT landscape?
Benefits include access to DAZN, 25 days’ annual leave (increasing by 3 days after 3 years), private medical insurance, life assurance, pension contributions up to 5%, a family-friendly community including enhanced parental leave, an electric vehicle benefit option, free access for you and one other to our workplace mental health platform app (Unmind), learning and development resources, the opportunity for flexible working, and access to our internal speaker series and events.
As our new Incident Commander, you’ll have the opportunity to:
- Technically lead all aspects of critical incidents (S1-S3): determine the SMEs needed, identify the problem, and release or de-escalate after diagnosis, meeting SLAs.
- Focus on the fastest service restoration and recovery: bridge, teams communication channels, and sync points for the sub-tech teams leading investigations (including third-party vendors and DAZN engineering teams).
- Own the quality and integrity of the Major Incident Management process, interfacing with Service Delivery Managers, Support teams, and DAZN Development/Engineering teams.
- Provide recommendations on troubleshooting and other technology improvements to quickly resolve incidents, ensuring infrastructure and application stability
- Partner with other Support, Dev and Engineering teams to resolve difficult or unique system issues that team members are not equipped to handle.
- Provide technical support to team members to facilitate the resolution or escalation of technical issues.
- Identify failure points that affect availability and mean time to repair, including architecture, design, process improvements, software disciplines, testing, etc. Interact frequently with stakeholders across the organization to prioritize the availability backlog as required.
- Transition new features & projects into PROD, operationalising & highlighting risks.
- Build & scope monitoring capabilities
- Create transitional support runbooks & be responsible for pre- & post-delivery management.
You'll have:
- Experience managing Major Incidents, providing leadership through to resolution and clear communications throughout
- Working knowledge of ITIL incident, problem, and change management components
- The ability to co-ordinate technical, incident and supplier side teams to ensure that all incidents are accurately prioritised and effectively managed
- Experience identifying early indications that a major incident is not progressing well and getting things back on track
- Ability to determine the precise customer impact of our incidents through analysis of CS, social, error codes, and playback failures
- Knowledge of KPIs that indicate performance and customer experience (e.g. re-buffering, video playback failures, video-start failures, capacity monitoring)
- Operations experience in a 24x7x365 support model
- Knowledge of a ticketing system (ServiceNow or Jira)
- Working knowledge of cloud-related architecture & systems: ECS, scalability, Lambda & cloud-related DBs
- Observability tooling experience (New Relic, Coralogix, Conviva, etc.)
- Delivery experience in fast-paced environments
- Experience operationalising complex processes & workflows