Machine Learning Engineer - Inference
Company: Together AI
Location: San Francisco
Posted on: April 2, 2026
Job Description:
About the Role
Together AI is seeking a Machine Learning Engineer to join our
Inference Engine team, focusing on optimizing and enhancing the
performance of our AI inference systems. This role involves working
with state-of-the-art large language models and ensuring they run
efficiently and effectively at scale. If you are passionate about AI
inference, PyTorch, and developing high-performance systems, we want
to hear from you. This position offers the chance to collaborate
closely with AI researchers and engineers to create cutting-edge AI
solutions. Join us in shaping the future at Together AI!

Responsibilities
Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
Develop and optimize runtime inference services for large-scale AI applications.
Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
Conduct design and code reviews to ensure high standards of quality.
Create services, tools, and developer documentation to support the inference engine.
Implement robust and fault-tolerant systems for data ingestion and processing.

Requirements
3 years of experience writing high-performance, well-tested, production-quality code.
Proficiency with Python and PyTorch.
Demonstrated experience in building high-performance libraries and tooling.
Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
Preferred: Knowledge of AI inference techniques such as speculative decoding.
Preferred: Knowledge of CUDA/Triton programming.
Nice to have: Knowledge of Rust, Cython, and compilers.

About Together AI
Together AI is a research-driven artificial intelligence company.
We believe open and transparent AI systems will drive innovation
and create the best outcomes for society. Together, we are on a
mission to significantly lower the cost of modern AI systems by
co-designing software, hardware, algorithms, and models. We have
contributed to leading open-source research, models, and datasets
to advance the frontier of AI. Our team has been behind
technological advancements such as FlashAttention, Hyena, FlexGen,
and RedPajama. We invite you to join a passionate group of
researchers and engineers in our journey to build the
next-generation AI infrastructure.

Compensation
We offer competitive compensation, startup equity, health insurance,
and other competitive benefits. The US base salary range for this
full-time position is $160,000 - $230,000, plus equity and benefits.
Our salary ranges are determined by location, level, and role.
Individual compensation will be determined by experience, skills,
and job-related knowledge.

Equal Opportunity
Together AI is an Equal Opportunity Employer and is proud to offer
equal employment opportunities to everyone regardless of race, color,
ancestry, religion, sex, national origin, sexual orientation, age,
citizenship, marital status, disability, gender identity, veteran
status, and more.

Please see our privacy policy at
https://www.together.ai/privacy
Keywords: Together AI, Oakland, Machine Learning Engineer - Inference, IT / Software / Systems, San Francisco, California