AI-first OMICS platform powered by the world's largest longitudinal dataset in neurology

Early Stage
Startup in initial stages

Senior Data Engineer

₹22L – ₹30L • No equity | Remote • Bengaluru | 5 years of experience | Full Time

Reposted: 5 months ago

Visa Sponsorship

Not Available

Remote Work Policy

Onsite or remote

Hires remotely in

India • Bengaluru • Delhi • Pune

Relocation

Allowed

Skills

Hadoop

NoSQL

SQL

ETL

Data Management

Apache Spark

Elasticsearch

About the job

Company Description

Neurodiscovery is building an AI-first OMICS platform that supports research into cures for neurological conditions. The team comprises top neurologists, data scientists, data engineers and computational biologists. We use large language models, knowledge graphs and topological methods to find unique patterns that lead to novel biomarkers, drug discovery and target identification. Our advisory board and investors include the top neurologists in the world, along with founders of multi-billion-dollar health tech companies. You will get to work with a talented team.

Role Description

The Senior Data Engineer will be responsible for designing and maintaining complex data architecture, building data models, developing ETL pipelines, ensuring data quality, setting up data warehousing and maintaining the data infrastructure for a healthcare organisation. We will be dealing with terabytes of data flowing in from multiple vendors. You will manage the entire data infrastructure, occasionally collaborate with external stakeholders on data ingestion, and work with MLOps engineers to design and maintain the platform for all business use cases.

Responsibilities

Design and develop data pipelines to ingest, transform, and load data from various sources
Develop and maintain data models and schemas, and implement data security and governance policies
Design and implement ETL processes using Hadoop, Hive, PySpark, Spark SQL and other relevant technologies, ensuring data quality, consistency, and reliability
Work closely with real-world analysts to understand their requirements, provide the data they need, and guide them on data-related issues, ensuring alignment with business objectives
Write efficient SQL queries, including outer joins, aggregations, unions, window functions, and common table expressions (CTEs), to manipulate and analyse complex datasets
Monitor and troubleshoot data pipelines, and manage the data warehouse on cloud infrastructure
Build and optimise Spark and other big data pipelines, architectures and data sets involving terabytes to petabytes of healthcare data
Work closely with cross-functional leadership at Neurodiscovery to identify the right open-source tools for delivering product features, through research and POCs/pilots

Qualifications