Muoro
Building Engineering Teams and Talent In AI
  • B2C
  • B2B
  • Early Stage
    Startup in initial stages

Data Engineer

Reposted: 3 years ago
Job Location
Visa Sponsorship

Not Available

Remote Work Policy

In office

Relocation

Allowed

Skills
SQL
ETL
Apache Spark
AWS
PySpark
Airflow
Jupyter

About the job

Experience: 5+ years in Data Engineering

Key Skills: Cloud platforms (AWS, GCP), Apache Spark, Data Lakehouse architecture, Kubernetes, SQL, Apache Airflow, JupyterLab notebooks

Job Overview:

We are seeking a talented Spark Developer with strong expertise in SQL, Kubernetes, Apache Airflow, AWS, Data Lakehouse architecture, and data pipeline development. The ideal candidate will have hands-on experience with large-scale distributed data processing and cloud technologies, as well as familiarity with JupyterLab-based notebooks for data analysis and reporting. This role is central to building and optimizing scalable, robust data workflows in our cloud-based ecosystem.

Key Responsibilities:

  • Spark Development: Design, develop, and maintain distributed data processing pipelines using Apache Spark to process large datasets in both batch and stream processing modes.
  • SQL & Data Transformation: Write complex SQL queries for data extraction, transformation, and aggregation. Work with both relational and non-relational databases to ensure efficient query execution and optimize performance.
  • Data Lakehouse & Cloud Architecture: Work with Data Lakehouse solutions (e.g., Delta Lake) on AWS to integrate structured and unstructured data into a unified platform for analytics and business intelligence.
  • AWS Integration: Leverage AWS services like S3, EMR, Glue, Redshift, Lambda, and others for data storage, processing, and orchestration. Build cloud-native data pipelines that are scalable and cost-effective.
  • Kubernetes for Orchestration: Deploy, scale, and manage data pipelines and Spark jobs using Kubernetes clusters. Utilize containerization for seamless deployment and management of the application lifecycle.
  • Workflow Automation with Apache Airflow: Create, schedule, and monitor data pipelines with Apache Airflow. Design DAGs (Directed Acyclic Graphs) to orchestrate and automate end-to-end data workflows (a brief illustrative sketch follows this list).
  • JupyterLab-Based Notebooks: Develop, maintain, and optimize JupyterLab notebooks for interactive data analysis, visualizations, and reporting, supporting data scientists and analysts in their work.
  • Collaboration with Cross-Functional Teams: Work closely with data engineers, data scientists, business analysts, and other stakeholders to gather requirements, understand business needs, and build data solutions.
  • Data Quality and Performance Optimization: Ensure high-quality data pipelines, monitor job failures, and troubleshoot issues. Optimize performance by tuning Spark jobs, improving query performance, and resolving bottlenecks in data flows.
  • Documentation & Best Practices: Maintain clear documentation for data pipelines, architecture, and code. Follow best practices for version control, testing, and continuous integration/continuous delivery (CI/CD).
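
For illustration only (not part of the posting): a minimal sketch of the kind of Airflow DAG this role would own, assuming Airflow 2.4+ and a spark-submit entry point; the DAG ID, schedule, paths, and script names below are hypothetical.

    # Hypothetical daily pipeline: submit a PySpark batch job, then run a data quality check.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 2,
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_sales_lakehouse_load",   # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                     # assumes Airflow 2.4+ ("schedule" parameter)
        catchup=False,
        default_args=default_args,
    ) as dag:
        # Submit the Spark batch job that writes curated tables to the lakehouse.
        transform = BashOperator(
            task_id="spark_transform",
            bash_command=(
                "spark-submit --deploy-mode cluster "
                "s3://example-bucket/jobs/transform_sales.py "  # hypothetical script location
                "--run-date {{ ds }}"
            ),
        )

        # Simple follow-up check; in practice this might validate row counts in the lakehouse.
        quality_check = BashOperator(
            task_id="data_quality_check",
            bash_command="python /opt/checks/validate_sales.py --run-date {{ ds }}",  # hypothetical
        )

        # Run the quality check only after the Spark transform succeeds.
        transform >> quality_check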

Required Skills & Experience:

  • Spark: Strong experience with Apache Spark (both PySpark and Spark SQL) for distributed data processing and job optimization (see the short sketch after this list).
  • SQL: Proficiency in SQL for data wrangling, ETL (Extract, Transform, Load) processes, and performance tuning.
  • Cloud Platforms (AWS): Hands-on experience with AWS services (S3, EMR, Lambda, Glue, Redshift, etc.) for building scalable cloud data solutions.
  • Kubernetes: Experience deploying and managing containerized applications on Kubernetes.
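
For illustration only (not part of the posting): a short sketch of the PySpark and Spark SQL work described above, assuming PySpark 3.x with Parquet data on S3; the bucket paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily_revenue").getOrCreate()

    # Read raw events from the data lake (hypothetical path).
    events = spark.read.parquet("s3a://example-bucket/raw/sales_events/")

    # DataFrame API: basic cleaning and typing.
    clean = (
        events
        .filter(F.col("amount").isNotNull())
        .withColumn("order_date", F.to_date("order_ts"))
    )

    # Spark SQL: aggregate daily revenue per region.
    clean.createOrReplaceTempView("sales")
    daily_revenue = spark.sql("""
        SELECT region, order_date, SUM(amount) AS revenue
        FROM sales
        GROUP BY region, order_date
    """)

    # Write a partitioned, query-friendly table back to the lakehouse (hypothetical path).
    (
        daily_revenue.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3a://example-bucket/curated/daily_revenue/")
    )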

About the company

Muoro
Building Engineering Teams and Talent In AI

Company Size
11-50

Company Type
Big Data, Artificial Intelligence, Enterprise Software Company, Software Development, Business Analytics