
Member of Technical Staff — Model Optimization and Inference (Experienced)

Nuance Labs · Seattle, Washington · Active · $250,000–$350,000 / year · Greenhouse

Job facts

Field	Value
Company	Nuance Labs
Title	Member of Technical Staff — Model Optimization and Inference (Experienced)
Normalized title	-
Department / team	Research
Location	Seattle, WA, United States
Work model	-
Employment type	-
Salary	$250,000–$350,000 / year
Status	active
ATS provider	Greenhouse
Posted / first seen	2026-06-05 / 2026-06-06
Changed / last seen	2026-06-12 / 2026-06-18

Linked records

Company	Nuance Labs
Source	4d06c175-4ee5-4cda-ad2e-cc1de78b9519
ATS provider	Greenhouse

Description

About Nuance Labs

Nuance Labs is building photorealistic, real-time AI avatars with emotional intelligence: a full-duplex audiovisual system that can listen, speak, react, interrupt, and respond like a real person. We're a Series A company ($60M raised) backed by Lightspeed, Accel, South Park Commons, NVentures, and Define Ventures. Our team includes PhDs from MIT, UW, Oxford, CMU, and Johns Hopkins, and people with industry experience at Apple, Meta, Amazon AGI, and Discord. The team is small, the work is real, and the problems are unsolved.

How Nuance Differentiates

Most conversational AI avatars today are hacks: a face slapped onto a speech-to-speech pipeline, stuck in the uncanny valley, emotionless, mechanical, and limited to one turn at a time. Current systems take 2–5 seconds to respond, while natural conversation requires responses in under 500 ms. That's up to a 10x improvement, and it demands rethinking the entire stack. That rethinking starts with full-duplex: an AI that listens and speaks simultaneously, perceives emotion in real time, and responds with a face that actually reflects it. It's an extremely hard problem, and we're developing foundation models designed for it from the ground up.

About the Role

We can train a great model. The next problem is making it fast enough to use in a real-time conversation, and that gap is enormous. A model that responds in 3 seconds is a demo; a model that responds in under 500 ms is a product.

We're looking for someone who specializes in taking trained models and squeezing every last millisecond out of them. You understand the full stack from model weights to serving infrastructure (quantization, KV cache optimization, kernel-level acceleration, batching strategies), and you know which lever to pull for which problem. You've worked with vLLM, SGLang, or similar frameworks at scale and have strong opinions about where they fall short.

This posting is aimed at experienced engineers and researchers who have operated at a senior to senior-staff level at big tech, a leading AI lab, or a high-traffic inference team. Everyone at Nuance is an MTS (we don't run title ladders), but we're hiring people who have already done this work at scale. Our stack is more complex than a standard LLM deployment: we're serving a full-duplex multimodal system that must meet strict real-time latency constraints. There's a lot of unsolved optimization work here, and we need someone who finds that genuinely exciting.

What You'll Do

- Own end-to-end inference optimization across our model stack: LLMs, audio models, and diffusion-based components
- Implement and tune KV cache strategies for long-context conversations, including eviction policies, compression, and memory-efficient attention
- Evaluate, deploy, and extend inference serving frameworks (vLLM, SGLang, TensorRT-LLM, etc.) for our specific workloads
- Profile and benchmark end-to-end latency and throughput; identify and systematically eliminate bottlenecks
- Build internal tooling that makes optimization work faster and more rigorous, such as profiling viewers, end-to-end inference test harnesses, and other infrastructure that helps the team move quickly
- Accelerate diffusion model inference through consistency models, step distillation, caching strategies, and custom kernel optimizations
- Apply and develop quantization techniques (INT8, INT4, GPTQ, AWQ, and beyond) to reduce memory footprint and increase throughput without meaningfully degrading quality
- Work closely with research and infrastructure to ensure new models ship with optimized serving from day one

What We're Looking For

- Significant hands-on experience with LLM inference optimization: you've shipped work on KV caching, memory layout, attention kernels, or batching strategies in a production or high-traffic research context
- Proven proficiency with inference serving frameworks (vLLM, SGLang, TensorRT-LLM, or similar), including going well beyond default configurations and adapting them to non-standard workloads
- Experience optimizing diffusion model inference (latency reduction, step distillation, caching, or kernel-level work)
- Strong Python and PyTorch skills; comfort reading and writing CUDA or Triton kernels is a significant plus
- A systematic approach to profiling and optimization: you measure first, then optimize
- Familiarity with speculative decoding or other inference-time acceleration techniques

Bonus Points

- Hands-on experience with post-training quantization (GPTQ, AWQ, or similar) and a clear sense of quality/performance tradeoffs
- Familiarity with multimodal or streaming inference architectures
- Experience deploying real-time AI systems with hard latency SLAs
- Prior work at an AI lab or inference startup, or on a high-traffic model serving platform
- Contributions to open-source inference frameworks

Compensation

$250,000–$350,000 base salary, plus meaningful equity. We think long-term ownership matters and structure equity accordingly.

Logistics

- Location: In person in Seattle, five days a week; we believe in the compounding value of working shoulder to shoulder.
- Visa sponsorship: We sponsor visas (O-1, H-1B, green card) from day one.
- AI-native tooling: Do your best work with the best tools, including unlimited tokens.

Benefits

- Health: HSA plan with about $2,000 in annual company contributions, roughly 2x what most big tech companies put in.
- Time off: 15 days of PTO plus public holidays, and we close the office for a full week at year-end.
- Food: Lunch, drinks, and snacks on us every workday, the small thing that quietly makes the day better.
- Commuter benefits: We help cover the cost of getting to the office.
- 401(k): In the works.

Nuance Labs is an equal opportunity employer. We believe diverse teams build better AI.

Full job record

Job ID	0646882ee776a6508f479b2c54874993a50ad8ab
Org ID	b5cad4e8-d3e2-423b-934c-3898f78ddee7
Source ID	4d06c175-4ee5-4cda-ad2e-cc1de78b9519
Board ID	4d06c175-4ee5-4cda-ad2e-cc1de78b9519
Provider	greenhouse
Provider Job Key	4277592009
Title	Member of Technical Staff — Model Optimization and Inference (Experienced)
Normalized Title	—
Status	active
Active	yes
Location Text	Seattle, Washington
Department	Research
Team	—
Employment Type	—
Workplace Type	—
Remote Policy	—
Country	United States
Region	WA
City	Seattle
Salary Raw	Compensation $250,000 – $350,000 base salary, plus meaningful equity
Salary Min	250,000
Salary Max	350,000
Salary Currency	USD
Salary Period	year
Source URL	https://job-boards.greenhouse.io/nuancelabs/jobs/4277592009
Apply URL	https://job-boards.greenhouse.io/nuancelabs/jobs/4277592009
First Seen At	2026-06-06 07:33:06Z
Last Seen At	2026-06-18 07:33:50Z
Last Checked At	2026-06-18 07:33:50Z
Last Changed At	2026-06-12 07:33:13Z
Inactive At	—
Source Posted At	2026-06-05 21:33:35Z
Source Updated At	2026-06-11 17:53:24Z
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=greenhouse/board=nuancelabs/date=2026-06-18/2026-06-18T07-33-50-562Z-33dc1773ee82e4edc2bfd6012957eea6e6737004dc332995b8133acd8ffd3f03.json

Event Fields

{
  "content_hash": "5fe745b90f7794e5aebc91fb380888c622d4ad218c3eb888efb75b10746f2f71",
  "source_hash": "65d6f96b8e2313e06502c7fcfb54e62d0794b5f050a495ab098083089ca75379",
  "last_changed_at": "2026-06-12T07:33:13.378Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Seattle, Washington",
    "city": "Seattle",
    "region": "WA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.85
  },
  "salary_max": 350000,
  "salary_min": 250000,
  "inferred_at": "2026-06-18T07:33:50.637Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "Seattle, Washington",
      "city": "Seattle",
      "region": "WA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.85
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": "year",
  "workplace_type": null,
  "salary_currency": "USD"
}

Extensions

{}

Native Structured

{
  "title": "Member of Technical Staff — Model Optimization and Inference (Experienced)",
  "offices": [
    {
      "id": 4030799009,
      "name": "Seattle",
      "location": null,
      "child_ids": [],
      "parent_id": null
    }
  ],
  "language": "en",
  "location": {
    "name": "Seattle, Washington"
  },
  "metadata": [],
  "updated_at": "2026-06-11T13:53:24-04:00",
  "departments": [
    {
      "id": 4031247009,
      "name": "Research",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "company_name": "Nuance Labs",
  "requisition_id": 4162941009,
  "first_published": "2026-06-05T17:33:35-04:00",
  "application_deadline": null
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/0646882ee776a6508f479b2c54874993a50ad8ab?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/b5cad4e8-d3e2-423b-934c-3898f78ddee7

GET https://api.bluedoor.sh/job-postings/v1/sources/4d06c175-4ee5-4cda-ad2e-cc1de78b9519

GET https://api.bluedoor.sh/job-postings/v1/jobs/0646882ee776a6508f479b2c54874993a50ad8ab/events

Docs · Get an API key