- Early Stage • Startup in initial stages
Senior Data Engineer
- ₹22L – ₹30L • No equity
- Remote
- 5 years of experience
- Full Time
About the job
Company Description
Neurodiscovery is building an AI-first omics platform that supports research into cures for neurological conditions. The team brings together top neurologists, data scientists, data engineers and computational biologists. We use large language models, knowledge graphs and topological methods to find unique patterns that lead to novel biomarkers, drug discovery and target identification. Our advisory board and investors include some of the world's top neurologists, along with founders of multi-billion dollar health tech companies. You will get to work with a talented team.
Role Description
The Senior Data Engineer will be responsible for designing and maintaining complex data architecture, building data models, developing ETL pipelines, ensuring data quality, setting up data warehousing and maintaining the data infrastructure for a healthcare organisation. We will be dealing with terabytes of data flowing in from multiple vendors. You will manage the entire data infrastructure, occasionally collaborate with external stakeholders on data ingestion, and work with MLOps engineers to design and maintain the platform for all business use cases.
Responsibilities
- Design and develop data pipelines to ingest, transform, and load data from various sources
- Develop and maintain data models and schemas, and implement data security and governance policies
- Design and implement ETL processes using Hadoop, Hive, PySpark, SparkSQL and other relevant technologies, ensuring data quality, consistency, and reliability
- Work closely with real-world analysts to understand their requirements, provide the data they need, and guide them on data-related issues, ensuring alignment with business objectives
- Demonstrate strong proficiency in SQL, including outer joins, aggregations, unions, window functions, and common table expressions (CTEs), to manipulate and analyse complex datasets efficiently
- Monitor and troubleshoot data pipelines, and manage the data warehouse on cloud infrastructure
- Build and optimise Spark/big data pipelines, architectures and data sets involving terabytes to petabytes of healthcare data
- Work closely with cross-functional leadership within Neurodiscovery to identify the right open-source tools for delivering product features, through research and proofs of concept (POCs) or pilots
Qualifications
- 5+ years of experience in data engineering
- Experience with Extract, Transform, Load (ETL) and data warehousing
- Experience with SQL, Elasticsearch, and data integration tools such as Talend
- Experience working with healthcare data, including EHR, HIMS, PACS, and RIS
- Strong SQL and programming skills in Python or Java
- Experience with Big Data technologies such as Hadoop, Spark, or NoSQL databases
- Experience with AWS or Azure is desirable
- Bachelor's or Master's degree in Computer Science or related field
- Prior experience in the healthcare, pharma, or biotech industry is desirable
About the company
- Early Stage • Startup in initial stages