- Growth Stage: Expanding market presence
Data Engineer
- €38k – €56k • No equity
- 3 years of experience
- Full Time
About the job
The Challenge
Software runs the universe and our developers write our company’s future. That’s why we are looking for a Data Engineer to take Sensorfact to the next level. Our customer base is growing, and so are our data volumes. Your challenge is to empower our data scientists, who develop forecasting and pattern recognition algorithms fed by the 20 million energy measurements we receive each day. Can you design and build infrastructure and scalable data pipelines to train machine learning models in a distributed fashion? Are you excited about building streaming applications that alert customers in real time to energy waste and machine failure? Your work will have a massive impact on the company: gaining insights from energy consumption data is the core of our value proposition.
What you will be doing
- You will be responsible for transforming our savings algorithms and machine learning models into production-ready applications to create lasting value.
- You have the opportunity to work with state-of-the-art tools for machine learning operations (MLOps), serverless and event-driven architectures, and AWS cloud services.
- We are moving our extensive savings algorithm toolkit to a scalable microservices architecture. You will play a central role in designing and implementing our plan of attack.
- You will help set up a platform that runs machine learning experiments at scale with the right data and compute available without impacting our ingestion, runs customized models for our customers, and provides our domain experts with insightful tools.
- You will work closely with our Data Scientists to design robust and production-ready machine learning pipelines that continuously generate insights across our customer base. Additionally, you will work with our Backend and DevOps colleagues to ensure stable and scalable systems.
- Being part of a scale-up, you are proactive in prioritizing and solving the needs of our fast-growing group of customers.
The key technologies you will be working with
As we are scaling up our platform with a small team, we leverage new technologies to maintain performance and productivity. Right now our core platform is based on microservices written in Node.js that connect to the NATS message bus. Data is accessible through GraphQL APIs managed by Hasura. Time series data is stored raw in MongoDB and processed in InfluxDB, while Postgres is our workhorse. Data analysis code is written in Python. We use JupyterHub to experiment and interact with analytics models and present them to our in-house energy consultants. Our source code is on GitLab and we use a mix of GitLab CI and Jenkins for CI/CD.
How we do it
We do Scrum with 2-week sprints, sprint planning and retrospective sessions. Our stand-ups are at 9:30 and if you’re not there you can chime in over Meet. We keep track of things using Linear, Google Drive and Outline, and we stay in touch with each other over Slack. The course is determined by quarterly goals, set collaboratively by business, data science, development and product teams.
We know how important it is to get in the zone and write beautiful code, so we schedule most meetings in the morning and keep the afternoon quiet (we try). We work from home about 70% of the time, but we enjoy meeting each other in the office regularly – COVID allowing, of course.
You are perfect for this job, because you…
- Have an MSc (or PhD) in Computer Science, Distributed Systems, Artificial Intelligence, or a comparable analytical / technical field;
- Are a medior (3+ years) data engineer who is fluent in creating cloud-based software applications with Python;
- Are fluent in professional software engineering practices (version control, merge requests, testing, code standards, CI/CD);
- Have experience with modern cloud and data technologies such as Spark, Kafka, Kubernetes, Docker, Jenkins, AWS Lambda;
- Have experience with deploying and maintaining data and machine learning pipelines at scale;
- Are passionate about at least one of the following (the more the better!): serverless and event-driven architectures, machine learning operations, stream processing, saving our climate, scale-up life;
- Have knowledge of modern database systems, preferably MongoDB, Postgres and/or time series databases (InfluxDB);
- Are fluent in English;
- (Bonus) Have knowledge of statistics and machine learning frameworks such as TensorFlow and scikit-learn.
What we offer
A full-time position (32-40 hrs), money, pension, lunches, working from home, team activities, training budget – the usual. We work in a forward-thinking start-up culture with an energetic and engaged team, located around the corner from Utrecht Centraal. We’ll provide you with an NS Business Card or cover your travel expenses to get there. We know how incredibly important it is to have the right tools. Any hardware or software you need to get your job done: great monitor, the best laptop, standing desk – you’ve got it.