Site Reliability Engineer
- 0.13% – 0.3%
- Remote
- 3 years of experience
- Full Time
Posted: 2 years ago
Visa Sponsorship: Available
Relocation: Allowed
Hiring contact: Allison DiFilippo
About the job
Roles and Responsibilities
- Ensure the scalability and reliability of the Zenith Cloud
- Co-own Continuous Integration & Continuous Delivery with the developers
- Set up, maintain, and democratize the observability tooling for the collection, storage, and visualization of metrics, traces, and logs
- Steer the team towards an efficient on-call setup that everyone can participate in
- Collaborate with engineering on zero-downtime implementation
- Build a chaos engineering practice and an efficient incident retrospective process
- Suggest architecture improvements and recommend process improvements
- Enable developers and other technical roles to work autonomously and to do ops and reliability engineering
We’re looking for someone who has
- Strong hands-on experience with cloud platforms (AWS) and with managing massive k8s or Nomad deployments
- Proven experience in a complex Linux infrastructure environment
- Previous experience running databases, data lakes, or data management platforms, as well as full-stack applications
- A habit of keeping alerting & incident response actionable and relevant to both business needs and team sustainability
- A mix of pragmatic operational and software engineering skills, and a passion for operational discipline, automation, documentation, and reducing bus-factor risk
- Proficiency with most of the following services, technologies, and concepts: Python, Go, Rust, Postgres, k8s, AWS, Ansible, GitHub, CI, and observability.