Site Reliability Engineering (SRE) Lead
- €70k – €90k
- Remote
- 7 years of experience
- Full Time
Onsite or remote
About the job
About Us:
At Forward Earth, we are dedicated to making sustainability and decarbonization core elements of every business. Our innovative software equips partners with carbon management tools that simplify compliance, reduce emissions, and bolster success in a sustainable economy. We are committed to reducing emissions across products and supply chains, helping combat global climate change. Our goal is to automate carbon management, making it accessible and scalable, empowering users to optimize processes and achieve substantial emission reductions.
We’re looking for:
Problem solvers who thrive on big-picture thinking and crafting elegant solutions. We value diversity, inclusion, and a collaborative culture rooted in respect. At Forward Earth, we set ambitious goals, support one another, and uphold excellence in all our endeavours.
Role Overview:
We are seeking an experienced and driven Site Reliability Engineering (SRE) Lead to build and manage a high-performing SRE team across Europe and the US. You will be instrumental in ensuring the reliability, scalability, and performance of our critical carbon management platform. This role is based in Berlin, in a hybrid working environment, and will involve hands-on work, particularly in the initial phase of team development.
Responsibilities:
Team Leadership: Build, mentor, and manage a geographically distributed team of SREs. Foster a culture of collaboration, knowledge sharing, and continuous improvement.
Reliability Engineering: Implement and maintain monitoring, alerting, and logging systems to proactively identify and address potential issues. Develop and enforce SLOs (Service Level Objectives) and error budgets.
Incident Response: Lead incident response efforts, conduct root cause analysis, and implement preventative measures to minimize future occurrences.
Performance Optimization: Analyze system performance, identify bottlenecks, and implement optimizations to improve efficiency and scalability.
Automation: Champion automation to reduce manual effort and improve operational efficiency. Develop and maintain tools and scripts for automating tasks such as deployment, scaling, and incident response.
Collaboration: Work closely with development teams to ensure that reliability and performance are considered throughout the software development lifecycle.
Cloud Infrastructure: Manage and optimize our AWS cloud infrastructure, ensuring cost-effectiveness and security.
The ideal candidate has…
7+ years of experience in SRE, DevOps, or a related field.
Proven experience in leading and managing SRE teams.
Strong understanding of cloud infrastructure, preferably AWS, including services such as EC2, S3, Lambda, and CloudWatch.
Experience with CloudFormation is ideal, but strong skills with other IaC tools will be considered.
Proficiency with Python and Bash scripting.
Experience with monitoring and observability tools (e.g., Prometheus, Honeycomb, Grafana, Datadog).
Strong understanding of CI/CD pipelines and best practices.
Excellent communication, interpersonal, and problem-solving skills.
Professional-level English and eligibility to work in Germany.
You are already in Berlin and able to start working with us soon!
Bonus Points:
Experience with containerization technologies (e.g., Docker, Kubernetes).
Experience with database administration (e.g., PostgreSQL, MySQL).
Familiarity with security best practices and compliance frameworks.
What We Offer:
A leadership role in a company dedicated to advancing carbon management technology and making a positive impact on the environment.
A hybrid, remote-flexible work environment: we are based in Berlin and work in a remote-first way.
Significant growth potential within a rapidly expanding company.
A dynamic, supportive work environment that champions innovation, initiative, and diversity.