bluedoor data·Job Postings API·bluedoor.sh ↗

ML Infra Engineer

Physicalintelligence · San Francisco · On Site · Active · Ashby

Job facts

Company: Physicalintelligence
Title: ML Infra Engineer
Normalized title: -
Department / team: Machine Learning / Machine Learning
Location: San Francisco, CA, United States
Work model: On Site
Employment type: Full Time
Salary: -
Status: active
ATS provider: Ashby
Posted / first seen: - / 2026-05-29
Changed / last seen: 2026-05-29 / 2026-06-06

Related slices

Company jobs: Active postings from Physicalintelligence.
Company breakdowns: Role, location, ATS, and work model facets for this company.
ATS provider jobs: Active postings observed through Ashby.
Provider filtered search: The same provider as a filtered job collection.
City jobs: Active postings in San Francisco.
Department jobs: Active postings in Machine Learning.
Work model jobs: Active On Site postings.
Lifecycle events: Open, update, close, and reopen events for this posting.
Original posting: Canonical source or apply URL captured from the ATS.

Linked records

Company: Physicalintelligence
Source: 2c3ebdb4-5d1a-4bdc-9b66-cfbb7d577518
ATS provider: Ashby

Description

In this role you will help scale and optimize our training systems and core model code. You’ll own critical infrastructure for large-scale training, from managing GPU/TPU compute and job orchestration to building reusable and efficient JAX training pipelines. You’ll work closely with researchers and model engineers to translate ideas into experiments, and those experiments into production training runs. This is a hands-on, high-leverage role at the intersection of ML, software engineering, and scalable infrastructure.

The Team

The ML Infrastructure team supports and accelerates PI’s core modeling efforts by building the systems that make large-scale training reliable, reproducible, and fast. The team works closely with research, data, and platform engineers to ensure models can scale from prototype to production-grade training runs.

In This Role You Will

- Own training/inference infrastructure: Design, implement, and maintain systems for large-scale model training, including scheduling, job management, checkpointing, and metrics/logging.
- Scale distributed training: Work with researchers to scale JAX-based training across TPU and GPU clusters with minimal friction.
- Optimize performance: Profile and improve memory usage, device utilization, throughput, and distributed synchronization.
- Enable rapid iteration: Build abstractions for launching, monitoring, debugging, and reproducing experiments.
- Manage compute resources: Ensure efficient allocation and utilization of cloud-based GPU/TPU compute while controlling cost.
- Partner with researchers: Translate research needs into infra capabilities and guide best practices for training at scale.
- Contribute to core training code: Evolve JAX model and training code to support new architectures, modalities, and evaluation metrics.

What We Hope You’ll Bring

- Strong software engineering fundamentals and experience building ML training infrastructure or internal platforms.
- Hands-on large-scale training experience in JAX (preferred) or PyTorch.
- Familiarity with distributed training, multi-host setups, data loaders, and evaluation pipelines.
- Experience managing training workloads on cloud platforms (e.g., SLURM, Kubernetes, GCP TPU/GKE, AWS).
- Ability to debug and optimize performance bottlenecks across the training stack.
- Strong cross-functional communication and ownership mindset.

Bonus Points If You Have

- Deep ML systems background (e.g., training compilers, runtime optimization, custom kernels).
- Experience operating close to hardware (GPU/TPU performance tuning).
- Background in robotics, multimodal models, or large-scale foundation models.
- Experience designing abstractions that balance researcher flexibility with system reliability.

Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Full job record

Job ID: 67f02f12506695a57f6d89e7d258f7f9a1b801be
Org ID: cd906f47-5869-4ca1-a998-6ececc4415d9
Source ID: 2c3ebdb4-5d1a-4bdc-9b66-cfbb7d577518
Board ID: 2c3ebdb4-5d1a-4bdc-9b66-cfbb7d577518
Provider: ashby
Provider Job Key: 70ebf855-16df-4879-a6a7-ee0161174acc
Title: ML Infra Engineer
Normalized Title: -
Status: active
Active: yes
Location Text: San Francisco
Department: Machine Learning
Team: Machine Learning
Employment Type: full_time
Workplace Type: on_site
Remote Policy: -
Country: United States
Region: CA
City: San Francisco
Salary Raw: -
Salary Min: -
Salary Max: -
Salary Currency: -
Salary Period: -
Source URL: https://jobs.ashbyhq.com/physicalintelligence/70ebf855-16df-4879-a6a7-ee0161174acc
Apply URL: https://jobs.ashbyhq.com/physicalintelligence/70ebf855-16df-4879-a6a7-ee0161174acc/application
First Seen At: 2026-05-29 05:24:33Z
Last Seen At: 2026-06-06 19:46:56Z
Last Checked At: 2026-06-06 19:46:56Z
Last Changed At: 2026-05-29 05:24:33Z
Inactive At: -
Source Posted At: -
Source Updated At: -
Raw Payload Uri: s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=physicalintelligence/date=2026-06-06/2026-06-06T19-46-54-171Z-1d9f1a9fc8809c4fa1f05644a5470438736cab824fa3c2de3c66be2affe2875b.json
Event Fields
{
  "content_hash": "0fbbe4483d7a74d3a4fa4f5a1e625d4aff1a47cfeeb0fe1829909cc029473913",
  "source_hash": "38e03e070547b91a48702ff9761c896d2fe6b60fb1b1d53df9edad64dff7ea70",
  "last_changed_at": "2026-05-29T05:24:33.691Z",
  "active_status": "active"
}
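The event fields above pair a content hash with an active status, which suggests hash-based change detection behind the open, update, close, and reopen lifecycle events. The sketch below is one way that could work, not a description of bluedoor's pipeline: hashing SHA-256 over canonical JSON is an assumption (the record only shows 64-character hex digests), and `content_hash` and `classify_event` are hypothetical helper names.

```python
import hashlib
import json
from typing import Optional


def content_hash(payload: dict) -> str:
    """Hash a posting payload. ASSUMPTION: SHA-256 over key-sorted, compact JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_event(previous: Optional[dict], current_hash: str, is_listed: bool) -> Optional[str]:
    """Return a lifecycle event name, or None when nothing changed.

    `previous` is the stored event-fields dict (with "content_hash" and
    "active_status" keys, as in the record above), or None if the posting
    has never been seen before.
    """
    if previous is None:
        return "open" if is_listed else None
    was_active = previous["active_status"] == "active"
    if was_active and not is_listed:
        return "close"
    if not was_active and is_listed:
        return "reopen"
    if was_active and previous["content_hash"] != current_hash:
        return "update"
    return None
```

Sorting keys before hashing keeps the digest stable when the ATS returns the same fields in a different order, so only real content changes register as an update.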
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "San Francisco",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T19:46:56.760Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": "on_site",
  "salary_currency": null
}
Extensions
{}
Native Structured
{
  "id": "70ebf855-16df-4879-a6a7-ee0161174acc",
  "team": "Machine Learning",
  "title": "ML Infra Engineer",
  "jobUrl": "https://jobs.ashbyhq.com/physicalintelligence/70ebf855-16df-4879-a6a7-ee0161174acc",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/physicalintelligence/70ebf855-16df-4879-a6a7-ee0161174acc/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Machine Learning",
  "publishedAt": null,
  "workplaceType": "OnSite",
  "employmentType": "FullTime",
  "secondaryLocations": []
}
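Comparing the native Ashby payload with the parsed record shows enum values moving from CamelCase (`OnSite`, `FullTime`) to snake_case (`on_site`, `full_time`). A minimal sketch of that mapping follows; the function names are hypothetical, and the regex-based conversion is an assumption about how such a normalizer might be written, not the actual implementation.

```python
import re


def camel_to_snake(value: str) -> str:
    """Convert a CamelCase enum value such as 'OnSite' to 'on_site'."""
    # Insert an underscore before every capital letter except the first one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", value).lower()


def normalize_native(native: dict) -> dict:
    """Map a subset of native Ashby fields to the normalized field names."""
    return {
        "title": native.get("title"),
        "department": native.get("department"),
        "team": native.get("team"),
        "location_text": native.get("location"),
        "workplace_type": camel_to_snake(native["workplaceType"]) if native.get("workplaceType") else None,
        "employment_type": camel_to_snake(native["employmentType"]) if native.get("employmentType") else None,
        "is_remote": bool(native.get("isRemote", False)),
    }


native = {
    "title": "ML Infra Engineer",
    "team": "Machine Learning",
    "department": "Machine Learning",
    "location": "San Francisco",
    "isRemote": False,
    "workplaceType": "OnSite",
    "employmentType": "FullTime",
}
normalized = normalize_native(native)
print(normalized["workplace_type"], normalized["employment_type"])
```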
Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/67f02f12506695a57f6d89e7d258f7f9a1b801be?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/cd906f47-5869-4ca1-a998-6ececc4415d9
GET https://api.bluedoor.sh/job-postings/v1/sources/2c3ebdb4-5d1a-4bdc-9b66-cfbb7d577518
GET https://api.bluedoor.sh/job-postings/v1/jobs/67f02f12506695a57f6d89e7d258f7f9a1b801be/events
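
The four requests above could be issued from Python roughly as follows. The endpoint paths and the `include=description` parameter are taken from this page; everything else is an assumption. The page does not say whether the API needs authentication, so the `headers` argument is only a placeholder, and `record_urls` and `fetch_json` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE = "https://api.bluedoor.sh/job-postings/v1"


def record_urls(job_id: str, org_id: str, source_id: str) -> dict:
    """Build the four URLs that back this page."""
    return {
        "job": f"{BASE}/jobs/{job_id}?" + urlencode({"include": "description"}),
        "org": f"{BASE}/orgs/{org_id}",
        "source": f"{BASE}/sources/{source_id}",
        "events": f"{BASE}/jobs/{job_id}/events",
    }


def fetch_json(url: str, headers: Optional[dict] = None, timeout: float = 10.0):
    """GET a URL and decode the JSON body. Auth headers, if required, are an assumption."""
    request = urllib.request.Request(url, headers={"Accept": "application/json", **(headers or {})})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


urls = record_urls(
    job_id="67f02f12506695a57f6d89e7d258f7f9a1b801be",
    org_id="cd906f47-5869-4ca1-a998-6ececc4415d9",
    source_id="2c3ebdb4-5d1a-4bdc-9b66-cfbb7d577518",
)
for name, url in urls.items():
    print(name, url)
# To perform the requests: job = fetch_json(urls["job"])
```

The network call is left commented out so the sketch runs offline; `fetch_json` raises `urllib.error.HTTPError` on a non-2xx response, which a caller would want to handle.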