Member of Technical Staff - Distributed Training Engineer

Liquid Ai · San Francisco · Hybrid · Active · Ashby

Job facts

Field	Value
Company	Liquid Ai
Title	Member of Technical Staff - Distributed Training Engineer
Normalized title	-
Department / team	Research & Engineering / Research & Engineering
Location	San Francisco, CA, United States
Work model	Hybrid / Hybrid
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-06

Linked records

Company	Liquid Ai
Source	742a7b52-7fdb-4b2a-9162-251683c8ccc0
ATS provider	Ashby

Description

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

The Opportunity

Our Training Infrastructure team is building the distributed systems that power our next-generation Liquid Foundation Models. As we scale, we need to design, implement, and optimize the infrastructure that enables large-scale training. This is a high-ownership training systems role focused on runtime, performance, and reliability (not a general platform/SRE role). You’ll work on a small team with fast feedback loops, building critical systems from the ground up rather than inheriting mature infrastructure.

While San Francisco and Boston are preferred, we are open to other locations.

What We're Looking For

We need someone who:

Loves distributed systems complexity: Our team builds systems that keep long training runs stable, debugs training failures across GPU clusters, and improves performance.
Wants to build: We need builders who find satisfaction in robust, fast, reliable infrastructure.
Thrives in ambiguity: Our systems support model architectures that are still evolving. We make decisions with incomplete information and iterate quickly.
Aligns with team priorities and delivers: Our best engineers align with team priorities while pushing back with data when they see problems.

The Work

Design and build core systems that make large training runs fast and reliable
Build scalable distributed training infrastructure for GPU clusters
Implement and tune parallelism/sharding strategies for evolving architectures
Optimize distributed efficiency (topology-aware collectives, comm/compute overlap, straggler mitigation)
Build data loading systems that eliminate I/O bottlenecks for multimodal datasets
Develop checkpointing mechanisms balancing memory constraints with recovery needs
Create monitoring, profiling, and debugging tools for training stability and performance

Desired Experience

Must-have:

Hands-on experience building distributed training infrastructure (PyTorch Distributed DDP/FSDP, DeepSpeed ZeRO, Megatron-LM TP/PP)
Experience diagnosing performance bottlenecks and failure modes (profiling, NCCL/collectives issues, hangs, OOMs, stragglers)
Understanding of hardware accelerators and networking topologies
Experience optimizing data pipelines for ML workloads

Nice-to-have:

MoE (Mixture of Experts) training experience
Large-scale distributed training (100+ GPUs)
Open-source contributions to training infrastructure projects

What Success Looks Like (Year One)

Training throughput has increased
Overall training efficiency/cost has improved
Training stability has improved (fewer failures, faster recovery)
Data loading bottlenecks are eliminated for multimodal workloads

What We Offer

Greenfield challenges: Build systems from scratch for novel architectures. High ownership from day one.
Compensation: Competitive base salary with equity in a unicorn-stage company
Health: We pay 100% of medical, dental, and vision premiums for employees and dependents
Financial: 401(k) matching up to 4% of base pay
Time Off: Unlimited PTO plus company-wide Refill Days throughout the year

Full job record

Job ID	58b042f2609df48a349a6680f830c16f1f1e78c2
Org ID	8e1f31f3-2052-48e9-ae14-b36a9ec2a6dd
Source ID	742a7b52-7fdb-4b2a-9162-251683c8ccc0
Board ID	742a7b52-7fdb-4b2a-9162-251683c8ccc0
Provider	ashby
Provider Job Key	a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4
Title	Member of Technical Staff - Distributed Training Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	San Francisco
Department	Research & Engineering
Team	Research & Engineering
Employment Type	full_time
Workplace Type	hybrid
Remote Policy	hybrid
Country	United States
Region	CA
City	San Francisco
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/liquid-ai/a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4
Apply URL	https://jobs.ashbyhq.com/liquid-ai/a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4/application
First Seen At	2026-05-29 06:16:09Z
Last Seen At	2026-06-06 09:15:31Z
Last Checked At	2026-06-06 09:15:31Z
Last Changed At	2026-05-29 06:16:09Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=liquid-ai/date=2026-06-06/2026-06-06T09-15-21-849Z-b5fc798149de9351214373470cfd157c647e407a6863d96db62ef3ef57fc83e6.json

Event Fields

{
  "content_hash": "dd85f5349e48f962c4219860d9e65b17170999f1ac03e025ba69a3325f1790e2",
  "source_hash": "901e46b7161d8187e9ef71a064123a23b5a6cde27300a8d51a5a414874c807a4",
  "last_changed_at": "2026-05-29T06:16:09.429Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "San Francisco",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T09:15:31.116Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "hybrid",
  "salary_period": null,
  "workplace_type": "hybrid",
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4",
  "team": "Research & Engineering",
  "title": "Member of Technical Staff - Distributed Training Engineer",
  "jobUrl": "https://jobs.ashbyhq.com/liquid-ai/a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/liquid-ai/a25b97f4-02ee-4453-a2e1-f8d5cfe2c4b4/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Research & Engineering",
  "publishedAt": null,
  "workplaceType": "Hybrid",
  "employmentType": "FullTime",
  "secondaryLocations": [
    {
      "location": "Boston"
    },
    {
      "location": "Remote"
    }
  ]
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/58b042f2609df48a349a6680f830c16f1f1e78c2?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/8e1f31f3-2052-48e9-ae14-b36a9ec2a6dd

GET https://api.bluedoor.sh/job-postings/v1/sources/742a7b52-7fdb-4b2a-9162-251683c8ccc0

GET https://api.bluedoor.sh/job-postings/v1/jobs/58b042f2609df48a349a6680f830c16f1f1e78c2/events
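
The four requests above can be assembled from the Job ID, Org ID, and Source ID in the full job record. The following is a minimal Python sketch under stated assumptions: the helper names `build_urls` and `fetch_json` are hypothetical, and the authentication scheme is not described on this page, so any required headers are left to the caller rather than guessed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base path taken from the GET lines on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"

# Identifiers copied from the "Full job record" section.
JOB_ID = "58b042f2609df48a349a6680f830c16f1f1e78c2"
ORG_ID = "8e1f31f3-2052-48e9-ae14-b36a9ec2a6dd"
SOURCE_ID = "742a7b52-7fdb-4b2a-9162-251683c8ccc0"


def build_urls(job_id, org_id, source_id):
    """Return the four request URLs listed on this page (hypothetical helper)."""
    query = urlencode({"include": "description"})
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?{query}",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url, headers=None):
    """GET a URL and decode the JSON body (hypothetical helper).

    Assumption: the endpoint returns JSON. Authentication is not documented
    on this page, so pass whatever headers the API docs require.
    """
    request = Request(url, headers=headers or {})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the URLs only; no network call is made here.
    for name, url in build_urls(JOB_ID, ORG_ID, SOURCE_ID).items():
        print(name, url)
```

Running the script only prints the URLs. Calling `fetch_json` on any of them performs the actual request and will likely need an API key supplied in whatever form the API documentation specifies.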