Data Engineer
- ₹30L – ₹40L • No equity
- Pune (+2 more locations)
- 5 years of experience
- Full Time
Reposted: 3 years ago
Job Location
Visa Sponsorship
Not Available
Remote Work Policy
In office
Relocation
Allowed
Skills
SQL
ETL
Spark
Apache Spark
AWS
Pyspark
Airflow
Jupyter
About the job
Experience: 5+ years in Data Engineering
Key Skills: Cloud platforms (AWS, GCP), Spark, Data Lakehouse, Kubernetes, SQL, Apache Airflow, JupyterLab notebooks
Job Overview:
We are seeking a talented Spark Developer with strong expertise in SQL, Kubernetes, Apache Airflow, AWS, Data Lakehouse architecture, and data pipeline development. The ideal candidate will have hands-on experience with large-scale distributed data processing and cloud technologies, along with familiarity with JupyterLab-based notebooks for data analysis and reporting. This role is crucial to building and optimizing scalable, robust data workflows in our cloud-based ecosystem.
Key Responsibilities:
- Spark Development: Design, develop, and maintain distributed data processing pipelines using Apache Spark to process large datasets in both batch and stream processing modes.
- SQL & Data Transformation: Write complex SQL queries for data extraction, transformation, and aggregation. Work with both relational and non-relational databases to ensure efficient query execution and optimize performance.
- Data Lakehouse & Cloud Architecture: Work with Data Lakehouse solutions (e.g., Delta Lake) on AWS to integrate structured and unstructured data into a unified platform for analytics and business intelligence.
- AWS Integration: Leverage AWS services like S3, EMR, Glue, Redshift, Lambda, and others for data storage, processing, and orchestration. Build cloud-native data pipelines that are scalable and cost-effective.
- Kubernetes for Orchestration: Deploy, scale, and manage data pipelines and Spark jobs using Kubernetes clusters. Utilize containerization for seamless deployment and management of the application lifecycle.
- Workflow Automation with Apache Airflow: Create, schedule, and monitor data pipelines with Apache Airflow. Design DAGs (Directed Acyclic Graphs) to orchestrate and automate end-to-end data workflows; a minimal DAG sketch follows this list.
- JupyterLab-Based Notebooks: Develop, maintain, and optimize JupyterLab notebooks for interactive data analysis, visualizations, and reporting, supporting data scientists and analysts in their work.
- Collaboration with Cross-Functional Teams: Work closely with data engineers, data scientists, business analysts, and other stakeholders to gather requirements, understand business needs, and build data solutions.
- Data Quality and Performance Optimization: Ensure high-quality data pipelines, monitor job failures, and troubleshoot issues. Optimize performance by tuning Spark jobs, improving query performance, and resolving bottlenecks in data flows.
- Documentation & Best Practices: Maintain clear documentation for data pipelines, architecture, and code. Follow best practices for version control, testing, and continuous integration/continuous delivery (CI/CD).
Required Skills & Experience:
- Spark: Strong experience with Apache Spark (both PySpark and Spark SQL) for distributed data processing and job optimization; a brief PySpark sketch follows this list.
- SQL: Proficiency in SQL for data wrangling, ETL (Extract, Transform, Load) processes, and performance tuning.
- Cloud Platforms (AWS): Hands-on experience with AWS services (S3, EMR, Lambda, Glue, Redshift, etc.) for building scalable cloud data solutions.
- Kubernetes: Experience deploying and managing containerized applications on Kubernetes.
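As a rough illustration of the Spark and SQL expectations above, the sketch below shows a small PySpark batch job. The S3 bucket, file layout, and column names (order_date, region, amount) are hypothetical, and Delta Lake is mentioned only as an aside.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sales_etl_sketch").getOrCreate()

    # Extract: read raw CSV files from an illustrative S3 location.
    raw = spark.read.option("header", "true").csv("s3a://example-bucket/raw/sales/")

    # Transform with Spark SQL: aggregate sales per day and region.
    raw.createOrReplaceTempView("sales")
    daily_totals = spark.sql("""
        SELECT order_date, region, SUM(amount) AS total_amount
        FROM sales
        GROUP BY order_date, region
    """)

    # The same transformation via the DataFrame API, shown for comparison.
    daily_totals_df = (
        raw.groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"))
    )

    # Load: write partitioned Parquet back to S3; a Delta Lake table would use
    # .format("delta") instead, if the delta-spark package is installed.
    daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
        "s3a://example-bucket/curated/daily_sales/"
    )

    spark.stop()

The SQL and DataFrame versions produce the same result; which to prefer is largely a matter of team convention and readability.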
About the company
11-50 employees
Big Data
Artificial Intelligence
Enterprise Software Company
Software Development
Business Analytics
- B2C
- B2B
- Early Stage (startup in initial stages)