Applied Research Engineer – Training Infra
Company: Snorkel AI
Location: San Francisco
Posted on: April 1, 2026
Job Description:
About Snorkel
At Snorkel, we believe meaningful AI doesn’t start
with the model, it starts with the data. We’re on a mission to help
enterprises transform expert knowledge into specialized AI at
scale. The AI landscape has gone through incredible changes between
2015, when Snorkel started as a research project in the Stanford AI
Lab, and the generative AI breakthroughs of today. But one thing has
remained constant: the data you use to build AI is the key to
achieving differentiation, high performance, and production-ready
systems. We work with some of the world’s largest organizations to
empower scientists, engineers, financial experts, product creators,
journalists, and more to build custom AI with their data faster
than ever before. Excited to help us redefine how AI is built?
Apply to be the newest Snorkeler!

THE ROLE
As an Applied Research
Engineer at Snorkel AI, you will own the infrastructure that powers
our model training and evaluation work. This is a hands-on role
where you will build and operate GPU cluster infrastructure,
training pipelines, and the tooling that allows our research and
engineering teams to run experiments reliably and at scale. You
will work closely with research scientists and engineers,
translating training requirements into robust, reproducible
systems—and proactively removing infrastructure blockers before
they slow down the work that matters most. Snorkel AI operates in a
fast-paced, high-impact environment. We are looking for someone who
takes pride in operational excellence, loves solving complex
distributed systems problems, and thrives when given real
ownership.

Location: Redwood City or San Francisco, or remote

MAIN RESPONSIBILITIES
Set up and manage GPU cluster infrastructure on major cloud providers (e.g., AWS HyperPod) for distributed model training, including networking, provisioning, and cost tracking.
Build and operate job orchestration and scheduling systems (e.g., Kubernetes, Slurm, or cloud-native equivalents) to reliably launch and manage training, rollout, and evaluation jobs across multi-node clusters.
Integrate and maintain ML training frameworks and post-training pipelines, ensuring they run stably and reproducibly at scale.
Set up and maintain experiment tracking, dataset versioning, and model artifact management to support fast iteration.
Monitor and optimize cluster health, inter-node communication, and resource utilization; implement fault tolerance and auto-recovery so long-running jobs survive node failures.
Work closely with research scientists and ML engineers to understand requirements, unblock experiments, and evolve infrastructure as our training workload needs change.

PREFERRED QUALIFICATIONS
Hands-on experience managing GPU clusters on major cloud providers, including provisioning, network configuration, and cost management.
Experience with distributed compute orchestration tools such as Kubernetes, Slurm, or equivalent cluster management systems.
Working knowledge of distributed training concepts: parallelism strategies, memory optimization techniques, and inter-node communication.
Experience setting up, managing, and integrating ML experiment tracking and data/model versioning tools.
Strong Python proficiency and solid software engineering fundamentals such as version control, modular design, and automation.
Ability to work in a fast-moving, iterative environment and take end-to-end ownership of ambiguous infrastructure problems.
Hands-on experience with post-training workflows such as supervised fine-tuning (SFT) or reinforcement learning (RLHF, GRPO, or similar) is a strong plus, but not required.

The salary range is $150,000.00 – $180,000.00.

This role is a great fit for engineers
who love building reliable systems close to the frontier of AI
research. We welcome applicants from a wide range of
backgrounds—whether your experience comes from industry, research
labs, or direct hands-on work with distributed infrastructure at
scale.

BE YOUR BEST AT SNORKEL
Joining Snorkel AI means becoming part of a company that has
market-proven solutions and robust funding, and that is scaling
rapidly, offering a unique combination of stability
and the excitement of high growth. As a member of our team, you’ll
have meaningful opportunities to shape priorities and initiatives,
influence key strategic decisions, and directly impact our ongoing
success. Whether you’re looking to deepen your technical expertise,
explore leadership opportunities, or learn new skills across
multiple functions, you’re fully supported in building your career
in an environment designed for growth, learning, and shared
success. Snorkel AI is proud to be an Equal Employment Opportunity
employer and is committed to building a team that represents a
variety of backgrounds, perspectives, and skills. Snorkel AI
embraces diversity and provides equal employment opportunities to
all employees and applicants for employment. Snorkel AI prohibits
discrimination and harassment of any type on the basis of race,
color, religion, age, sex, national origin, disability status,
genetics, protected veteran status, sexual orientation, gender
identity or expression, or any other characteristic protected by
federal, state, or local law. All employment is decided on the
basis of qualifications, performance, merit, and business need. We
will ensure that individuals with disabilities are provided
reasonable accommodation to participate in the job application or
interview process, to perform essential job functions, and to
receive other benefits and privileges of employment. Please contact
us to request accommodation.
Keywords: Snorkel AI, Oakland, Applied Research Engineer – Training Infra, IT / Software / Systems, San Francisco, California