Lead Data Engineer

Remote • South Pasadena | 6 years of exp | Full Time

Posted: 3 years ago

Visa Sponsorship

Not Available

Hires remotely in

California - North America - United States

Relocation Allowed

Skills

Python

Java

MongoDB

Scala

MSSQL

Snowflake

Microsoft SQL Server

Kafka

Amazon Redshift

Amazon Kinesis

Druid

Kubernetes

Hadoop/Hive/Spark/Scala/MLlib

Apache/Spark/Databricks

Flink

Apache Airflow

Apache Pulsar

About the job

We are looking for an individual who will bring expertise in a wide variety of big data processing frameworks (both open source and proprietary), large-scale database systems (Big Data, OLAP, and OLTP), stream data processing, API development, machine learning operationalization, and cloud automation to build and support data needs across our data platform.

Responsibilities

Design and develop the data platform to address data needs across the business efficiently and cost-effectively.
Build software across our entire data platform, including event-driven data processing, storage, and serving through scalable, highly available APIs, using cutting-edge technologies.
Ensure performance isn’t our weakness by implementing and refining robust data processing, REST services, RPC (in and out of HTTP), and caching technologies.
Build processes and tools to maintain machine learning pipelines in production.
Develop and enforce data engineering, security, and data quality standards through automation.
Participate in supporting the data platforms 24x7.

Qualifications

Bachelor’s degree in computer science or a similar discipline.
6+ years of experience in software engineering.
3+ years of experience in data engineering.
Ability to work in a fast-paced, high-pressure, agile environment, and willingness to learn new technologies and apply them at work to stay ahead of the curve.
Expertise in at least a few programming languages, such as Java, Scala, Python, or similar.
Expertise in building and managing large-volume data processing platforms (both streaming and batch) is a must.
Expertise in stream processing systems such as Kafka, Kinesis, Pulsar, or similar.
Expertise in building microservices and managing containerized deployments, preferably using Kubernetes.
Expertise in distributed data processing frameworks such as Apache Spark, Databricks, Flink, or similar.
Expertise in SQL, Spark SQL, Hive, etc.
Expertise in OLTP and OLAP databases such as MSSQL, Snowflake, or Redshift.
NoSQL experience (MongoDB or similar) is a plus.
Experience in operationalizing and scaling machine learning models is a huge plus.
Experience with a variety of data tools and frameworks (for example, Apache Airflow, Druid) is a huge plus.
Strong interpersonal, communication, and presentation skills.
Strong team focus with outstanding organizational and resource management skills.