- B2B
- Scale Stage: Rapidly increasing operations
- Top Investors: This company has received a significant amount of investment from top investors
Sr. System Reliability Engineer
- Full Time
Brendan Lynch
About the job
About the role
Please note, this team is hiring across all levels and candidates are individually assessed and appropriately leveled based upon their skills and experience.
The Product and Performance Engineering (PPE) Team ensures the availability and performance of Netskope’s applications, particularly in the area of end-user experience. This team is a post-incident escalation point for issues where the root cause is not immediately clear, or it has been determined that more than one service component (infrastructure or application) contributed to overall impairment. This team owns the determination of the root cause in such cases. Typically, the individual assigned to a specific issue will build a “tiger team” of individuals from across the company who have deep knowledge in a particular area and coordinate activities between these individuals to form and execute on a unified plan. The PPE team is ultimately responsible for the outcome (resolution) of the issue.
What’s in it for you
PPE is seeking a self-driven, motivated Infrastructure SRE with a production-service mindset to join the team, help build out our existing infrastructure, and troubleshoot problems as they arise, ensuring the highest levels of systems and infrastructure availability for Netskope’s production services. You will also be responsible for integrating service health metrics, identifying and measuring service health indicators, and providing creative tool sets for the frontline operations support teams.
Required skills and experience
- A minimum of 5-7 years of experience working in a production data center environment with 1000+ servers.
- Experience troubleshooting complex issues and correlating data from multiple sources such as service applications, Linux systems, and the network.
- Deep knowledge of metrics platforms such as Prism, Prometheus, Grafana, Graphite, Sumo Logic, etc., and expertise in the collection, analysis, and correlation of metrics.
- The ability to deep dive into network troubleshooting areas such as packet analysis, HTTP/HTTPS, tunneling protocols, load balancer issues, etc.
- A comprehensive understanding of computer internals and architectures, and experience maintaining common Linux/Unix applications and services.
- Experience with modern cloud and virtualization technologies such as Docker, Kubernetes, AWS, GCP, KVM, OpenNebula, OpenStack or other orchestration platforms.
- Strong software development skills using Python, C, C++, Go, etc.
- Deep expertise with operational support systems, automation, and CI/CD tools.
- A demonstrated ability and willingness to act as subject matter expert, tracking technology/industry trends, and providing data-driven reasoning for technology path recommendations.
Education
- Bachelor's degree preferred
About the company
- Valuation $1B+: This company has a valuation of $1B or more
- 4.2 Highly rated: Netskope is highly rated on Glassdoor, with 4.2 out of 5 stars
- 4.1 Work / Life Balance: Employees rate Netskope 4.1/5 on Glassdoor for work / life balance
- 4.1 Strong Leadership: Employees rate Netskope 4.1/5 on Glassdoor for faith in leadership