HashiCorp
Actively Hiring
Workflows to provision, secure, connect, and run any infrastructure for any application
  • B2B
  • Public Stage
    Publicly traded company
  • Top Investors
    This company has received a significant amount of investment from top investors

Sr. Site Reliability Engineer II - Incident Excellence (Hybrid)

Posted: 7 months ago
Visa Sponsorship: Not Available
Relocation: Allowed

About the job

About HashiCorp

HashiCorp solves development, operations, and security challenges in infrastructure so organizations can focus on business-critical tasks. We build products to give organizations a consistent way to manage their move to cloud-based IT infrastructures for running their applications. Our products enable companies large and small to mix and match AWS, Microsoft Azure, Google Cloud, and other clouds as well as on-premises environments, easing their ability to deliver new applications.

We use the Tao of HashiCorp as our guiding principles for product development and operate according to a strong set of company principles for how we interact with each other. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users.

Our Team

The HashiCorp Incident Excellence team is responsible for improving HashiCorp’s incident response while maximizing learning from incidents. Our focus is on helping all engineers feel confident when they are on-call and improving communication to efficiently resolve incidents and build trust in our brand. We partner closely with teams to drive a holistic incident management strategy and share learnings to help our business continuously improve.

About this Role

This engineering role is on a nascent engineering team. The team is responsible for products that touch many areas of engineering organizations at HashiCorp, so applicants will need to excel at collaboration, have product-focused mindsets, and be comfortable iterating in an agile manner towards solutions.

You will provide expert execution of the incident command process, including running and managing high-severity incident bridges and driving transparent communication that promotes maximum levels of internal and external customer satisfaction.

Collaborate with an array of technical stakeholders and executives to drive resolution during incidents and improve overall response for future incidents and technical escalations.

Utilize top-notch troubleshooting techniques to identify, organize, and advocate for novel solutions to remediate customer impact on complex interconnected systems.

Participate in a closed-loop post-incident learning process, driving insights and meaningful action.

Drive iterative improvements in response through consistent drills, tabletops, and game-day exercises.

Push the boundaries of innovation in incident management to deliver best-in-class incident response.

In this role, you can expect to:

  • Be responsible for and drive incident management capabilities and culture.
  • Contribute to the incident command on-call rotation.
  • Build technical skills and relationships within a team of engineers and SREs.
  • Lead and refine our incident response strategy, ensuring rapid and effective response to operational disruptions.
  • Analyze incident trends and root causes to drive continuous improvements in system reliability and response processes.
  • Develop and maintain tools for incident detection, analysis, and resolution, automating responses where possible to minimize human intervention.
  • Create comprehensive incident response documentation and conduct training sessions to prepare all relevant teams for effective incident handling.
  • Work closely with development, operations, and security teams to coordinate incident response efforts and post-incident analyses.

You may be a good fit for our team if:

  • 8+ years of experience in site reliability engineering, systems administration, or software engineering, with a significant focus on incident response and operational reliability.
  • 3+ years managing, coordinating, and ensuring resolution of major incidents.
  • Professional experience with incident management in cloud environments.
  • You enjoy working on a variety of scopes spanning software engineering, cloud infrastructure, and SRE.
  • You have worked with SaaS or another type of managed software offering.
  • Proven track record of managing and resolving incidents in cloud-based environments, with expertise in major public cloud platforms (AWS, GCP, Azure).
  • Understanding of fundamental network technologies such as DNS, load balancing, SSL, TCP/IP, and HTTP.
  • Strong understanding of monitoring and alerting systems, with the ability to develop metrics and alarms that accurately reflect system health and operational risks.
  • Experience with incident management tools and practices, including post-mortem analysis and root cause investigation.
  • Passion for consistently responding to and leading complex incidents in a 24x7x365 environment utilizing a globalized follow-the-sun model.
  • Customer-centric attitude with a focus on providing best-in-class incident response for customers and stakeholders.
  • Familiarity with HashiCorp’s product suite and infrastructure automation tools is a plus.
  • Strong leadership skills during periods of significant business impact, remaining calm and professional in high-pressure situations.
  • A strong desire to drive customer success with partner teams and management on high-profile issues critical to the long-term success of the business.
  • Outstanding verbal and written communication skills, with the ability to convey information in a meaningful way to both engineers and executive-level management, during and outside of incidents.
  • Adaptable to a wide variety of technologies and capable of incident response and troubleshooting activities in complex interconnected environments.

About the company


HashiCorp

501-1000 Employees
  • Highly rated
    HashiCorp is rated 4.1 out of 5 stars on Glassdoor
  • Work / Life Balance
    Employees rate HashiCorp 4.1/5 on Glassdoor for work / life balance

Funding

Amount raised: $359M
Funded over: 6 rounds
Rounds: Series E, $175,000,000 (Mar 2020), plus 5 others

Perks

Medical, dental, and vision
HashiCorp offers your choice of medical plans as well as dental and vision coverage for you and any dependents, including spouses, domestic partners, and children. Coverage begins upon your first day of hire.
401(k)
Our 401(k) plan provides a variety of investment options to help you fund your retirement. The plan allows you to contribute a designated amount of your pre-tax income from each paycheck, thereby lowering your taxable annual income.
Remote friendly
We call San Francisco home, but our team is spreading across the world. Though some roles may be location dependent, we welcome remote work.
Flexible time off
We embrace a culture of personal responsibility and mutual trust, and we want our vacation and time off policy to reflect that. The FTO Policy allows employees to take paid time away from work for not only vacations and illnesses, but a variety of other personal needs. Employees may use FTO in any increments of time and there are no minimum allowances or maximum limits.
Commuter benefits
You may elect up to $255 per month each for transit and parking expenses, for a total of $510, toward the purchase of commuter passes or payment of approved transit vendors. The monthly elections are pretax deductions, which lower your taxable income.
½ paid day off for company community service
Everyone is encouraged to take advantage of our company community service day, which takes place on Veterans Day each year and allows you to take a half day of paid time off to volunteer with a local charity of your choosing.
Life and disability insurance
HashiCorp provides life insurance coverage equal to your annual salary at no cost to you. You will also be covered under our short-term and long-term disability policies in the event that you are unable to work for an extended period of time due to a health condition.
Flexible Spending Account (FSA)
You can set aside pretax money to go towards the purchase or payment of approved health care and dependent care expenses. These can include copays, birth control, day care for children or elder adults, acupuncture, and more.
Generous paid holidays
We offer 8 paid holidays each year to all employees. We respect all major holidays and provide an extended break for Thanksgiving, Christmas, and New Year's.

Founders

Mitchell Hashimoto
Founder • 3 years
