DriveWealth
The Modern Brokerage Platform
- B2B
- Scale Stage: Rapidly increasing operations
- Top Investors: This company has received a significant amount of investment from top investors
Manager, Site Reliability Engineering
- Remote
- Full Time
Posted: 2 months ago
About the job
About the Role
As a Manager of Site Reliability Engineering based in Lithuania, you will oversee our Brokerage-as-a-Service platform's reliability and operational efficiency during critical end-of-day and start-of-day operations. You will lead a team that is an extension of our New York office, ensuring seamless integration and continuous operational coverage across time zones.
What You’ll Do
- Lead the Site Reliability Engineering team to enhance support workflows using ticketing systems and tools
- Manage and mentor a team of SREs, facilitating collaboration between internal product, engineering, and client-facing teams
- Oversee partner escalations and ensure operational stability, including monitoring partner channels, troubleshooting issues, and coordinating with partners on remediation
- Adhere to the DriveWealth Incident Management Policy when resolving and documenting incidents, working with the engineering team to drive ongoing product improvements
- Oversee the entire incident response lifecycle
- Administer the DriveWealth Change Management Policy to ensure minimal disruption to services
- Collaborate closely with the SRE team and other teams operating in the Eastern Standard Time zone to align on strategic initiatives and daily operations, ensuring adherence to global standards and practices. This includes monitoring critical operations outside conventional hours to support our global platform’s scalability and reliability
What You’ll Need
- 5+ years of experience in software engineering or site reliability engineering
- 3+ years of proven leadership experience and the ability to manage teams
- Working knowledge of REST APIs and experience with JIRA Service Desk and Confluence
- Flexibility to cover and manage the on-call responsibility
- Expertise in incident and change management processes
- Knowledge of alerting and automation frameworks
- Ability to develop and maintain KPI reporting metrics
- Availability for flexible work hours and willingness to cover US market trading sessions
Preferred but not required
- Strong background in technical cloud services, particularly AWS, including expertise with IAM, EC2, S3, and DynamoDB
- Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation
- Experience with job orchestration and scheduling tools such as Apache Airflow or Rundeck
- Experience maintaining and supporting containerized systems using Kubernetes and OpenShift
- Knowledge of Confluent Cloud for managing Kafka streams in a production environment
- Scripting capability in Python or similar languages
- Experience with SQL and transactional database querying
About the company
- Valuation $1B+: This company has a valuation of $1B or more