- Top 10% of responders: BlueVine is in the top 10% of companies in terms of response time to applications
- Responds within two weeks: Based on past data, BlueVine usually responds to incoming applications within two weeks
- B2B
SRE II
- Full Time
Not Available
Angelique G Cuevas
About the job
We are seeking a highly skilled and proactive SRE to join our team. As the first line of defense and Tier 1 support, you will play a critical role in ensuring the uninterrupted operation of our services across various environments, with a primary focus on AWS cloud infrastructure.
WHAT YOU'LL DO:
Monitoring and Alert Management: Continuously monitor production, staging, and other environments for alerts or anomalies. Respond promptly to alerts, assess their severity, and take appropriate action to ensure service continuity.
Incident Triage and Resolution: Act as the first point of contact for all incidents related to service continuity. Quickly assess and triage incidents, escalating to appropriate teams if necessary, and drive them to resolution within defined SLAs.
Proactive Issue Identification: Proactively identify potential issues or areas of concern within the AWS cloud environment that could impact service continuity. Work closely with the engineering and operations teams to address these issues before they escalate.
Documentation and Knowledge Sharing: Maintain comprehensive documentation of incidents, resolutions, and best practices. Share knowledge and insights with the broader team to improve incident response and prevention processes.
Collaboration and Communication: Effectively collaborate with cross-functional teams, including engineering, operations, and security, to address service continuity challenges. Ensure clear and timely communication with stakeholders regarding incident status and resolution.
Continuous Improvement: Continuously seek opportunities to improve processes, tools, and monitoring capabilities to enhance service continuity in the AWS cloud environment. Actively participate in post-incident reviews to identify lessons learned and implement preventive measures.
Emergency Response: Be available for on-call rotations and respond to emergency situations outside of regular business hours when necessary to ensure the stability and availability of our services.
WHAT WE LOOK FOR:
- 1+ years of experience on an SRE or service continuity team
- Basic understanding of cloud computing principles and AWS services.
- Excellent problem-solving and troubleshooting skills with the ability to remain calm under pressure.
- Familiarity with AWS troubleshooting, diagnostic tools, and utilities.
- Experience with incident management processes and tools.
- Experience with monitoring and alerting tools such as CloudWatch, Grafana, Prometheus, New Relic, OpenSearch, or similar.
- Effective communication skills with the ability to convey technical information to both technical and non-technical stakeholders.
- Proven experience in a similar role, preferably in a cloud-based environment with a focus on AWS.
Bonus points if you also have:
- AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified SysOps Administrator)
- Bachelor's degree in computer science, engineering, or related field
About the company
BlueVine
- Scale Stage: Rapidly increasing operations
- Top Investors: This company has received a significant amount of investment from top investors
- Highly rated (4.4): BlueVine is highly rated on Glassdoor, with 4.4 out of 5 stars
- Work / Life Balance (4.3): Employees rate BlueVine 4.3/5 on Glassdoor for work / life balance
- Strong Leadership (4.3): Employees rate BlueVine 4.3/5 on Glassdoor for faith in leadership