Benchmarking LLMs on Enterprise-Specific Tasks
  • Top 10% of responders
    Vals AI is in the top 10% of companies in terms of response time to applications
  • Responds within a week
    Based on past data, Vals AI usually responds to incoming applications within a week
  • B2B

Member of Technical Staff

  • $125k – $175k • 0.2% – 0.5%
  • 2 years of exp
  • Full Time
Posted: 3 months ago • Recruiter recently active
Visa Sponsorship

Not Available

Remote Work Policy

In office

Relocation

Allowed

Skills
Python
React.js
Large Language Models (LLMs)

About the job

About the Role:

In

Requirements:

  • A tenacity to iterate and develop quickly.

  • A significant portfolio of prior work (including work at jobs, but especially side projects).

  • Strong experience with Python, especially in production settings, and, for front-end development, experience with React.

  • Experience working in teams. This includes working in development sprints, following best practices with Git, and reviewing pull requests.

  • Strong communication skills. You can give input to others and equally receive and integrate feedback.

  • We are an in-person team, based in San Francisco. We will support your relocation or transportation as needed.

Nice to haves:

  • NLP research experience with papers published in reputable journals.

  • Experience working with Django or other Python-based HTTP servers (e.g., Flask).

  • Interest in and familiarity with LLM infrastructure.

  • Experience working at other early-stage startups or at your own company.

About Us

Measuring model ability is the most challenging part of creating applications capable of automating any given part of the economy. There are no good techniques or benchmarks for evaluating LLM performance on business-relevant tasks, so adoption in enterprise production settings has been limited (see Wittgenstein’s ruler).

This problem shows up everywhere LLMs have potential: for companies building AI tools, in understanding whether their product will satisfy customer demand; for enterprises making purchasing decisions, in determining which models and vendors are feasible for them; and for researchers, who need a north star toward which to expand model ability.

Today, answering these questions amounts to hiring a human review team to manually evaluate model outputs. This is prohibitively expensive and slow.

Vals AI is building the enterprise benchmark for LLMs and LLM apps on real-world business tasks. In doing so, we are creating the infrastructure and certification to automatically audit LLM applications, verifying that they are ready for consumption.

See our benchmarks and launch announcement in Bloomberg. We aim to build the barometer for whether AI is useful, and in doing so, accelerate the automation of all knowledge work.

What we are building:

Our core technology enables us to review and automatically audit LLM applications in high-value industries (legal, insurance, finance, healthcare). With this and our own data, we maintain a public benchmark of the major LLMs on enterprise tasks. Our success will be based on three components:

  1. Our evaluation performs at human-level accuracy on the relevant axes for each industry/application.

  2. Our platform has an intuitive interface that serves as a shared workspace between human reviewers and engineers.

  3. We become the industry-standard benchmark, maintaining a loss-leading effort by publishing free reports and collaborating with credible data partners.

To achieve each of these, we are looking for machine learning engineers (Head of AI, Members of Technical Staff) to develop novel evaluation techniques, strong designers and front-end engineers (Founding Product Engineer) to contribute to the platform, and a tenacious operator to write reports and maintain our social media (email [email protected] if this is of interest).

What we offer:

  • Highly competitive salary and meaningful ownership. Excellence is well rewarded.

  • Relocation and transportation support.

  • Health/dental insurance coverage.

  • Lunch and dinner provided, free snacks/coffee/drinks.

  • Unlimited PTO.

About us:

Founding team: The core methodology behind this platform comes from NLP evaluation research we did at Stanford. We raised a $5M seed round from some of the top institutional and angel investors in the Valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir and HRT. Collectively, we have over 300 citations in our published work.

Tech stack: Our frontend is built in React with TSX. We use Django as our back-end framework. All of the infra is on AWS.

What we’re looking for:

  • Intelligence is more important than a good-looking resume. Industry experience and pedigree are valuable only insofar as they are a proxy for talent itself.

  • Ownership to create products. We don’t have the scale or time to actively “manage” every project or task. Working in a small, talent-dense team, we expect everyone to show initiative to build where it’s needed, not where it’s asked. We strive for autonomy over consensus.

  • Intensity. The LLM landscape is constantly changing. Foundation model labs are continuously pushing the frontier, enterprises are under massive pressure to adopt the technology, and startups are hungry to chase the white space. The unicorn companies that will emerge from this technology shift are being built now. Those that win will have an incredibly high speed of execution.

  • See solutions, not problems. We’re looking for people who don’t pass hard problems to others or admit defeat, but instead see the opportunity to craft solutions at each juncture.

Referral Bonus

Know someone who would be a good fit? Connect them with [email protected]. If we hire them and they stay on for 90 days you’ll get a $10,000 referral bonus and Vals AI merch!

About the company

1-10 Employees
  • Early Stage
    Startup in initial stages
  • Recently funded
    Raised funding in the past six months

Funding

Amount raised: $5M
Funded over: 1 round
Round: Seed, $5,000,000 (Jul 2024)

Perks

Health Insurance
Dental Insurance
Substantial Equity Grants
Free Lunch and Dinner

Founders

Rayan Krishnan
CEO • 3 years

Similar Jobs

Matroid
Computer Vision Made Simple
OpenBlock
OpenBlock is a verifiable data and modeling platform powered by zero-knowledge proofs
Sponsor a Pet
We are a fundraising company for animal non-profits