Software Engineer, GPU Infrastructure

Company: OpenAI
Location: San Francisco
Posted on: May 4, 2025

Job Description:

Scaling - San Francisco

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from:

- Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems
- Running a reliable and low-maintenance platform by building push-button automation for Kubernetes cluster provisioning and upgrades
- Supporting research workflows with service frameworks and deployment systems
- Ensuring fast model startup times through high-performance snapshot delivery across blob storage down to hardware caching
- Much more!

About the Role

As an engineer within Fleet infrastructure, you will design, write, deploy, and operate infrastructure systems for model deployment and training on one of the world's largest GPU fleets. The scale is immense, the timelines are tight, and the organization is moving fast; this is an opportunity to shape a critical system in support of OpenAI's mission to advance AI capabilities responsibly.

This role is based in San Francisco, CA. We use a hybrid work model of 3 days in the office per week and offer relocation assistance to new employees.

In this role, you will:
- Design, implement, and operate components of our compute fleet, including job scheduling, cluster management, snapshot delivery, and CI/CD systems
- Interface with researchers and product teams to understand workload requirements
- Collaborate with hardware, infrastructure, and business teams to provide a high-utilization, high-reliability service

You might thrive in this role if you:
  - Have experience with hyperscale compute systems
  - Possess strong programming skills
  - Have experience working in public clouds (especially Azure)
  - Have experience working in Kubernetes
  - Have an execution-focused mentality paired with a rigorous focus on user requirements
  - As a bonus, have an understanding of AI/ML workloads

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.

OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement

For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.

Compensation

$325K - $590K + Offers Equity



