Lifelike 3D captures
  • Top 10% of responders
    Luma AI is in the top 10% of companies in terms of response time to applications
  • Responds within two weeks
    Based on past data, Luma AI usually responds to incoming applications within two weeks
  • Growth Stage
    Expanding market presence

Senior Research Engineer - Performance Optimization

Posted: 5 months ago
Visa Sponsorship: Not Available

Relocation: Allowed

About the job

We are looking for engineers with significant problem-solving experience in PyTorch, CUDA, and distributed systems. You will work with Research Scientists to build & train cutting-edge foundation models on thousands of GPUs.

Responsibilities

  • Ensure efficient implementation of models & systems for data processing, training, inference and deployment
  • Identify and implement optimization techniques for massively parallel and distributed systems
  • Identify and remedy efficiency bottlenecks (memory, speed, utilization) by profiling and implementing high-performance CUDA, Triton, C++ and PyTorch code
  • Work closely with the research team to ensure systems are designed for efficiency from start to finish
  • Build tools to visualize, evaluate and filter datasets
  • Implement cutting-edge product prototypes based on multimodal generative AI

Experience

  • Experience training large models using Python & PyTorch, including practical experience with the entire development pipeline, from data processing, preparation & data loading to training and inference.
  • Experience optimizing and deploying inference workloads for throughput and latency across the stack (inputs, model inference, outputs, parallel processing, etc.)
  • Experience profiling CPU & GPU code in PyTorch, including with NVIDIA Nsight or similar tools.
  • Experience writing & improving highly parallel & distributed PyTorch code, with familiarity with DDP, FSDP, Tensor Parallel, etc.
  • Experience writing high-performance parallel C++. Bonus if done in an ML context with PyTorch, e.g., for data loading, data processing, or inference code.
  • Experience with high-performance Triton / CUDA and writing custom PyTorch kernels. Top candidates will be able to use Tensor Cores, optimize CUDA memory usage, and apply similar performance techniques.
  • Experience with deep learning concepts such as Transformers and multimodal generative models such as diffusion models and GANs is a plus.
  • Experience building inference / demo prototype code (e.g., Gradio, Docker) is a plus.
  • Please note this role is not intended for recent graduates.

Compensation

  • *The pay range for this position in California is $180,000 - $250,000/yr; however, the base pay offered may vary depending on job-related knowledge, skills, candidate location, and experience. We also offer competitive equity packages in the form of stock options and a comprehensive benefits plan.*


About the company

Lifelike 3D captures • 11-50 Employees
  • Top Investors
    This company has received a significant amount of investment from top investors

Funding

Amount raised: $20M
Funded over: 1 round
Round: Series A, $20,000,000 (Mar 2023)

Founders

Alberto Taiuti
Founder • 3 years
San Francisco
Amit Jain
Founder • 3 years
United States

Similar Jobs

OpenBlock
OpenBlock is a verifiable data and modeling platform powered by zero-knowledge proofs
Sponsor a Pet
We are a fundraising company for animal non-profits
Typeface
GenAI early stage start up backed by top investors
Assured
Automated claims is now a reality