Machine Learning Researcher, Audio

Bland AI · San Francisco · Remote · Active · Ashby

Job facts

Field	Value
Company	Bland AI
Title	Machine Learning Researcher, Audio
Normalized title	-
Department / team	Engineering / Engineering
Location	San Francisco, CA, United States
Work model	Remote / Remote
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-06

Linked records

Company	Bland AI
Source	48595e7f-5d99-4cc1-8c80-c29f78dccb00
ATS provider	Ashby

Description

Machine Learning Researcher, Audio
Location: San Francisco, CA or Remote (US)

About Bland

At Bland.com, our mission is to empower enterprises to build AI phone agents at scale. Based in San Francisco, we are a fast-growing team reimagining how customers interact with businesses through voice. We have raised $65 million from leading Silicon Valley investors, including Emergence Capital, Scale Venture Partners, Y Combinator, and founders of Twilio, Affirm, and ElevenLabs.

Voice is quickly becoming the primary interface between businesses and their customers. We are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.

The Role: Machine Learning Researcher, Audio

As a Machine Learning Researcher at Bland, you'll be working on foundational research and development across the core components of our voice stack: speech-to-text, large language models, neural audio codecs, and text-to-speech. Your work will define how our agents understand, reason, and speak in real time at enterprise scale.

This is not a narrow research role. You will take ideas from theory to large-scale training to production inference systems serving millions of calls per day. You will design new modeling approaches, validate them with rigorous experimentation, and collaborate with engineering teams to deploy them into real customer environments.

What You Will Do

Build and Scale Next-Generation TTS Systems
- Design and train large scale text-to-speech models capable of expressive, controllable, human-sounding output.
- Develop neural audio codec-based TTS architectures for efficient, high-fidelity generation.
- Improve prosody modeling, question inflection, emotional expression, and multi-speaker robustness.
- Optimize for real-time, low-latency inference in production.

Advance Speech-to-Text Modeling
- Build and fine-tune large scale ASR systems robust to accents, noise, telephony artifacts, and code switching.
- Leverage self-supervised pretraining and large-scale weak supervision.
- Improve transcription accuracy for real-world enterprise scenarios, including structured extraction and conversational nuance.

Pioneer Neural Audio Codecs
- Research and implement neural audio codecs that achieve extreme compression with minimal perceptual loss.
- Explore discrete and continuous latent representations for scalable speech modeling.
- Design codec architectures that enable downstream generative modeling and controllable synthesis.

Develop Scalable Training Pipelines
- Curate and process massive audio datasets across languages, speakers, and environments.
- Design staged training curricula and data filtering strategies.
- Scale training across distributed GPU clusters focusing on cost, throughput, and reliability.

Run Rigorous Experiments
- Design ablation studies that isolate the impact of architectural changes.
- Measure improvements using both objective metrics and perceptual evaluations.
- Validate ideas quickly through focused experiments that confirm or eliminate hypotheses.

What Makes You a Great Fit

Deep Research Foundations
- Experience with self-supervised learning, multimodal modeling, or generative modeling.
- Ability to derive new formulations and implement them efficiently.

Expertise in Voice Modeling
- Hands-on experience building or scaling TTS, STT, or neural audio codec systems.
- Familiarity with large scale speech datasets and real-world audio variability.
- Strong intuition for audio quality, prosody, and conversational dynamics.

Systems and Hardware Awareness
- Experience training and serving large models on modern accelerators.
- Knowledge of inference optimization techniques, including quantization, kernel optimization, and memory efficiency.
- Understanding of real-time constraints in telephony or streaming environments.

Experimental Rigor
- Track record of designing controlled experiments and meaningful ablations.
- Comfortable working with both offline benchmarks and live production metrics.
- Ability to move quickly from hypothesis to validation.

Builder Mentality
- Comfortable in fast-moving startup environments.
- Strong ownership mindset from research through deployment.
- Excited by ambiguous, unsolved problems.

How You Show Up
- You treat unsolved problems as opportunities to invent new paradigms.
- You identify the single experiment that can validate an idea in days, not months.
- You measure everything and let data drive decisions.
- You are obsessed with making voice agents sound truly human.
- You use AI tools aggressively to amplify your own impact and accelerate research cycles.

Bonus Points
- Experience with large scale distributed training.
- Research publications or open source contributions in speech or language AI.
- Background in real-time speech systems or telephony.
- PhD in ML, AI, or a related field, or equivalent research impact.

Benefits and Compensation
- Healthcare, dental, vision, all the good stuff
- Meaningful equity in a fast-growing company
- Every tool you need to succeed
- Beautiful office in Jackson Square, SF with rooftop views
- Competitive salary: $160,000 to $250,000

If you are energized by building and scaling TTS models, pioneering neural audio codecs, and pushing the boundaries of speech-to-text systems, we would love to hear from you.

Full job record

Job ID	39827d8dff4ae65eb360c78b202b57b4bfd2f6df
Org ID	569fe03f-a7fa-4641-99bf-524c4843e3c4
Source ID	48595e7f-5d99-4cc1-8c80-c29f78dccb00
Board ID	48595e7f-5d99-4cc1-8c80-c29f78dccb00
Provider	ashby
Provider Job Key	2e815d0d-8e7a-43cc-8853-c1b029aeb499
Title	Machine Learning Researcher, Audio
Normalized Title	—
Status	active
Active	yes
Location Text	San Francisco
Department	Engineering
Team	Engineering
Employment Type	full_time
Workplace Type	remote
Remote Policy	remote
Country	United States
Region	CA
City	San Francisco
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/bland/2e815d0d-8e7a-43cc-8853-c1b029aeb499
Apply URL	https://jobs.ashbyhq.com/bland/2e815d0d-8e7a-43cc-8853-c1b029aeb499/application
First Seen At	2026-05-29 05:36:16Z
Last Seen At	2026-06-06 20:07:13Z
Last Checked At	2026-06-06 20:07:13Z
Last Changed At	2026-05-29 05:36:16Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=bland/date=2026-06-06/2026-06-06T20-07-11-791Z-271fe11740cf14105f366a50d0633967bba4e2b6bc089cb57d06b0e02ec87180.json

Event Fields

{
  "content_hash": "4fbe7b9ef881e7250a8d4ab838e605d61bcedc447d711b92629aa534e509323a",
  "source_hash": "e4d62ec4246c8d086b7a7d50593ab782a8dc1ea5d24f3a48d87c5afc7636c637",
  "last_changed_at": "2026-05-29T05:36:16.147Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "San Francisco",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": true,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T20:07:13.191Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": true,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "remote",
  "salary_period": null,
  "workplace_type": "remote",
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "2e815d0d-8e7a-43cc-8853-c1b029aeb499",
  "team": "Engineering",
  "title": "Machine Learning Researcher, Audio",
  "jobUrl": "https://jobs.ashbyhq.com/bland/2e815d0d-8e7a-43cc-8853-c1b029aeb499",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/bland/2e815d0d-8e7a-43cc-8853-c1b029aeb499/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Engineering",
  "publishedAt": null,
  "workplaceType": null,
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/39827d8dff4ae65eb360c78b202b57b4bfd2f6df?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/569fe03f-a7fa-4641-99bf-524c4843e3c4

GET https://api.bluedoor.sh/job-postings/v1/sources/48595e7f-5d99-4cc1-8c80-c29f78dccb00

GET https://api.bluedoor.sh/job-postings/v1/jobs/39827d8dff4ae65eb360c78b202b57b4bfd2f6df/events
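
The four requests above could be assembled from the Job ID, Org ID, and Source ID in the full job record, roughly as in the Python sketch below. It assumes the paths are the plain resource paths, with `include=description` as the only query parameter. The helper names `endpoint_urls` and `fetch_json` are illustrative and not part of the API. This page does not state the authentication scheme, so `fetch_json` simply forwards whatever headers the caller supplies.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def endpoint_urls(job_id, org_id, source_id):
    """Return the four URLs for this page, keyed by what they fetch."""
    job = f"{BASE_URL}/jobs/{job_id}"
    return {
        "job": f"{job}?{urlencode({'include': 'description'})}",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{job}/events",
    }


def fetch_json(url, headers=None):
    """GET a URL and decode the response body as JSON.

    Assumption: the API responds with JSON and accepts the credentials
    passed in `headers`; the header name and format are not given here.
    """
    request = Request(url, headers=headers or {}, method="GET")
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# IDs copied from the "Full job record" table on this page.
urls = endpoint_urls(
    job_id="39827d8dff4ae65eb360c78b202b57b4bfd2f6df",
    org_id="569fe03f-a7fa-4641-99bf-524c4843e3c4",
    source_id="48595e7f-5d99-4cc1-8c80-c29f78dccb00",
)
for name, url in urls.items():
    print(name, url)
```

Calling `fetch_json(urls["job"], headers=...)` with credentials from the API documentation would then retrieve the job record. The sketch only prints the URLs, so it runs without network access or a key.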
