Member of Technical Staff (Data Acquisition)
Sanas.AI Inc · Palo Alto, CA, United States · On Site · Deleted · Rippling ATS
Job facts
| Field | Value |
|---|---|
| Company | Sanas.AI Inc |
| Title | Member of Technical Staff (Data Acquisition) |
| Normalized title | - |
| Department / team | Science |
| Location | Palo Alto, CA, United States |
| Work model | On Site |
| Employment type | Full Time |
| Salary | - |
| Status | deleted |
| ATS provider | Rippling ATS |
| Posted / first seen | 2026-04-06 / 2026-05-29 |
| Changed / last seen | 2026-06-06 / 2026-06-03 |
Linked records
| Field | Value |
|---|---|
| Company | Sanas.AI Inc |
| Source | 1fc1335f-581e-4138-ae2e-6e3d6c790876 |
| ATS provider | Rippling ATS |
Description
About Sanas
Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language communication, and more.
Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.
Sanas is one of the fastest-growing startups in Silicon Valley: its ARR grew from $16M to $50M in 2025. The company's core business is profitable and on track to end 2026 with more than $120M ARR. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering culture to build and ship cutting-edge AI models and experiences entirely in-house.
Founded in 2020, Sanas is a 180-person team that has raised over $100 million from leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, and Quiet Capital, among others. We also collaborate with numerous Fortune 100 companies. With Sanas, you're not just adopting a product; you're investing in the future of communication.
If you're looking to play a significant role in roadmapping and setting technical direction, to deploy big, challenging ideas without much overhead or slowness, and to leave your mark on an ambitious, generational mission to change how the world thinks about speech and AI, then Sanas is a good fit for you.
About the Role
Your mission is to build and operate the ingestion systems that turn the open web and large-scale audio sources into reliable, well-structured corpora for training Sanas's frontier speech models. You'll own the machinery that acquires, extracts, filters, versions, and delivers audio data to our training pipelines, and you'll work directly with our research scientists to close the loop between what we collect and how it affects model quality.
Job Description
Data acquisition & ingestion
- Own and lead engineering projects across the full data acquisition stack: web crawling, audio ingestion, source discovery, and dataset delivery to training pipelines.
- Build and operate large-scale distributed crawling infrastructure that continuously discovers and ingests audio across languages, accents, domains, and recording environments.
- Develop specialized crawlers for high-priority audio sources, with source-specific extraction and normalization logic.
- Run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs; analyze the results to find gaps, redundancy, and coverage improvements across speaker demographics and language pairs.
- Build ingestion pipelines that scale reliably across large data campaigns, with automated audio quality filtering (SNR estimation, clipping detection, codec artifact identification) as a first-class pipeline stage.

Systems & infrastructure
- Design and deploy highly scalable distributed systems that handle petabytes of audio data, from raw acquisition through quality filtering, deduplication, segmentation, and versioned dataset generation.
- Architect and implement indexing and search over large audio corpora, enabling fast lookup by language, speaker, acoustic condition, duration, and quality tier.
- Build and maintain backend services for data storage, including key-value databases, metadata synchronization, and manifest management across dataset versions.
- Deploy and operate acquisition infrastructure in a Kubernetes and Infrastructure-as-Code environment; perform routine system health checks and respond quickly to production issues.
- Collaborate closely with the data processing, architecture, and ML platform teams to ensure smooth data flow from acquisition to training-ready outputs.

Compliance & data governance
- Work closely with legal on compliance, data privacy, and licensing across all acquisition sources, maintaining a clear audit trail of provenance, permitted use, and commercial training rights for every dataset.
- Enforce speaker consent documentation, GDPR requirements, robots.txt and terms-of-service (ToS) adherence, and audio retention policies across all ingestion pipelines.
- Manage relationships with third-party data vendors: write precise acquisition briefs, evaluate quality on delivery, and ensure sourced data meets Sanas's licensing and quality standards.

Qualifications
- 4+ years of experience in data engineering, ML data infrastructure, or backend systems engineering, including direct experience building large-scale data ingestion or crawling systems.
- Strong Python and systems engineering skills: you build robust, maintainable infrastructure, not just one-off scripts.
- Hands-on experience with distributed systems design: you've built systems that handle failure gracefully, scale horizontally, and recover cleanly.
- Experience with web crawling infrastructure at scale, including rate limiting, deduplication, and content extraction.
- Proficiency with cloud platforms (AWS or GCP), object storage (S3/GCS), and container orchestration (Kubernetes).
- Comfort with audio processing tooling (ffmpeg, librosa, torchaudio, sox) and experience handling large volumes of audio files.
- Strong data quality instincts: you instrument pipelines, surface issues proactively, and treat data correctness with the same rigor as software correctness.

Bonus
- Experience building speech or audio datasets for ASR, TTS, speech enhancement, or speaker verification model training.
- Familiarity with major open speech corpora (Common Voice, LibriSpeech, VoxPopuli, AISHELL) and their sourcing and quality characteristics.
- Experience with data versioning tools.
- Background in multilingual or low-resource language data collection.
- Experience with annotation and labeling platforms.
- Familiarity with speaker diarization, language identification, or automated audio quality estimation models used for data filtering at scale.
Full job record
| Field | Value |
|---|---|
| Job ID | 149e01a361cf423bf3a92184109f3f00d34f69a4 |
| Org ID | 83ad35d8-903f-4812-a8a1-7e0502248692 |
| Source ID | 1fc1335f-581e-4138-ae2e-6e3d6c790876 |
| Board ID | 1fc1335f-581e-4138-ae2e-6e3d6c790876 |
| Provider | rippling |
| Provider Job Key | ad6ba237-eb74-4755-bf0d-027edbc3222c |
| Title | Member of Technical Staff (Data Acquisition) |
| Normalized Title | — |
| Status | deleted |
| Active | no |
| Location Text | Palo Alto, CA, United States |
| Department | Science |
| Team | — |
| Employment Type | full_time |
| Workplace Type | on_site |
| Remote Policy | — |
| Country | United States |
| Region | CA |
| City | Palo Alto |
| Salary Raw | — |
| Salary Min | — |
| Salary Max | — |
| Salary Currency | — |
| Salary Period | — |
| Source URL | https://ats.rippling.com/sanas/jobs/ad6ba237-eb74-4755-bf0d-027edbc3222c |
| Apply URL | https://ats.rippling.com/sanas/jobs/ad6ba237-eb74-4755-bf0d-027edbc3222c |
| First Seen At | 2026-05-29 07:10:25Z |
| Last Seen At | 2026-06-03 12:13:29Z |
| Last Checked At | 2026-06-06 08:42:22Z |
| Last Changed At | 2026-06-06 08:42:22Z |
| Inactive At | 2026-06-06 08:42:22Z |
| Source Posted At | 2026-04-06 19:23:06Z |
| Source Updated At | — |
| Raw Payload Uri | s3://bluework-jobs-prod-raw-590183727216/raw/provider=rippling/board=sanas/date=2026-06-03/2026-06-03T12-13-28-930Z-08e539e86adb9488f9d27b6e63e5dbd4b4861b3ceef6a0aae802617719a2970f.json |
Event Fields
{
"content_hash": "80a471111351fd9af31a0803249f7e3e928429642edb97d0fbb4e7bebe8f2cfc",
"source_hash": "637a50c16d1eeba8d2ea3b5d36a89bdaa62f51dc998dfa6cc3a95c2df40194e3",
"last_changed_at": "2026-06-06T08:42:22.197Z",
"active_status": "deleted"
}
Parsed Structured
{
"language": "en-us",
"location": {
"raw": "Palo Alto, CA, United States",
"city": "Palo Alto",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.98,
"workplace_type": "on_site"
},
"salary_max": null,
"salary_min": null,
"inferred_at": "2026-06-03T12:13:29.525Z",
"launch_scope": {
"reason": "english_us_canada",
"included": true,
"language": "en-us",
"location": {
"raw": "Palo Alto, CA, United States",
"city": "Palo Alto",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.98,
"workplace_type": "on_site"
},
"countries": [
"United States"
]
},
"remote_policy": null,
"salary_period": null,
"workplace_type": "on_site",
"salary_currency": null
}
Extensions
{}
Native Structured
{
"list_job": {
"id": "ad6ba237-eb74-4755-bf0d-027edbc3222c",
"url": "https://ats.rippling.com/sanas/jobs/ad6ba237-eb74-4755-bf0d-027edbc3222c",
"name": "Member of Technical Staff (Data Acquisition)",
"language": "en-US",
"locations": [
{
"city": "Palo Alto",
"name": "Palo Alto, CA",
"state": "California",
"country": "United States",
"stateCode": "CA",
"countryCode": "US",
"workplaceType": "ON_SITE"
}
],
"department": {
"name": "Science"
}
},
"detail_job": {
"url": "https://ats.rippling.com/sanas/jobs/ad6ba237-eb74-4755-bf0d-027edbc3222c",
"name": "Member of Technical Staff (Data Acquisition)",
"uuid": "ad6ba237-eb74-4755-bf0d-027edbc3222c",
"board": {
"logo": {
"url": "https://secured-assets.ripplingcdn.com/us1/ats/6862bfbc77c6a4c5f95ae521/ats/eac70887e52a4dcfa07336c69c248ce0?Expires=1780575209&Signature=WcAJjKq8w9zZoa6reRZCkbauobhsCX1CDhcZLP-KceptzvkvhKO0O9jC7CUkm9j-TzL74ikXQP5wcQED5udVaL4B-66xxBKyU-wEiPl-rxcIyuUUriKinneFkLXFcsRHXzZ0eu7hrAcMWK~M9Yy4Dqx4zjbGgtS98NmN1gpsH~mV0AUsnBekzozQez1PQZ8-iIZb0l~UoiRut7CvLi8b2Zts6EdoIVoArYS0uxeQo~RaU5d02R0dIFm8QeL3jiIfIFY5SrdEFdVcCNIAqAoG-wO5QnIVzxk7YBjJOEEPP70pOQwyG4IasUImBBR6eDKfn7reRXbgTJJylzGgwbyc4Q__&Key-Pair-Id=K2SM3GXN9F9XGM",
"name": "Sanas-Logo-Full-RGB-Black (1).png",
"type": "image/png"
},
"slug": "sanas",
"title": "Sanas",
"banner": {
"url": null,
"name": "",
"type": ""
},
"boardURL": "https://ats.rippling.com/sanas/jobs",
"fontType": null,
"subtitle": null,
"boardType": "RIPPLING",
"linkColor": null,
"buttonColor": null,
"legalNotice": null,
"buttonTextColor": null,
"noOpeningsMessage": null,
"groupJobsByLocation": false,
"showBoardLogoOnJobPost": true,
"showCompanyInfoUnderJobPost": false
},
"createdOn": "2026-04-06T12:23:06.525000-07:00",
"department": {
"name": "Science",
"base_department": "Science",
"department_tree": [
"Science"
]
},
"companyName": "Sanas.AI Inc",
"description": {
"role": "<meta><h2 style=\"font-family:"Basel Grotesk",Arial,sans-serif;line-height:1.6;font-size:15pt;font-weight:600;letter-spacing:0.5px;margin-top:18px;margin-bottom:4px;padding-left:0px;\"><b><strong style=\"font-size:15pt;white-space:pre-wrap;\">About Sanas</strong></b></h2><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language communication, and more.</span></p><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.</span></p><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">Sanas is currently one of the fastest growing startups in Silicon Valley, growing from $16M to $50M ARR in 2025. The company's core business is profitable and is on track to end 2026 with >$120M ARR. 
Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering culture to build and ship cutting-edge AI models and experiences — entirely in-house.</span></p><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">Sanas is a 180-strong team, established in 2020. In this short span, we've successfully secured over $100 million in funding. Our innovation has been supported by the industry's leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and other influential investors. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you're not just adopting a product; you're investing in the future of communication.</span></p><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">If you’re looking to have a significant role in roadmapping and driving technical directions, if you’re looking to deploy challenging and big ideas without much overhead or slowness, if you're looking to leave your mark on an ambitious, generational mission to change how the worlds thinks about speech + AI, then Sanas is a well-suited place for you.</span></p><h2 style=\"font-family:"Basel Grotesk",Arial,sans-serif;line-height:1.6;font-size:15pt;font-weight:600;letter-spacing:0.5px;margin-top:18px;margin-bottom:4px;padding-left:0px;\"><b><strong style=\"font-size:15pt;white-space:pre-wrap;\">About the Role</strong></b></h2><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><span style=\"white-space:pre-wrap;\">Your mission is to build and operate the 
ingestion systems that turn the open web and large-scale audio sources into reliable, well-structured corpora for training Sanas's frontier speech models. You'll own the machinery that acquires, extracts, filters, versions, and delivers audio data to our training pipelines — and you'll work directly with our research scientists to close the loop between what we collect and how it moves model quality.</span></p><h2 style=\"font-family:"Basel Grotesk",Arial,sans-serif;line-height:1.6;font-size:15pt;font-weight:600;letter-spacing:0.5px;margin-top:18px;margin-bottom:4px;padding-left:0px;\"><b><strong style=\"font-size:15pt;white-space:pre-wrap;\">Job Description</strong></b></h2><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><b><strong style=\"white-space:pre-wrap;\">Data acquisition & ingestion</strong></b></p><ul data-pattern=\"discCircleSquare\" data-depth=\"1\" style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;margin:8px 0px;line-height:1.6;padding:0px 0px 0px 32px;list-style-type:disc;\"><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Own and lead engineering projects across the full data acquisition stack — web crawling, audio ingestion, source discovery, and dataset delivery to training pipelines.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Build and operate large-scale distributed crawling infrastructure capable of continuously discovering and ingesting audio at scale across languages, accents, domains, and recording environments.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Develop specialized crawlers for high-priority audio sources with source-specific extraction and 
normalization logic.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Run experiments to evaluate crawling strategies, extraction methods, and ingestion tradeoffs; analyze results to identify gaps, redundancy, and coverage improvements across speaker demographics and language pairs.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Build ingestion pipelines that scale reliably across large data campaigns, with automated audio quality filtering — SNR estimation, clipping detection, codec artifact identification — as a first-class pipeline stage.</span></li></ul><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><b><strong style=\"white-space:pre-wrap;\">Systems & infrastructure</strong></b></p><ul data-pattern=\"discCircleSquare\" data-depth=\"1\" style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;margin:8px 0px;line-height:1.6;padding:0px 0px 0px 32px;list-style-type:disc;\"><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Design and deploy highly scalable distributed systems capable of handling petabytes of audio data — from raw acquisition through quality filtering, deduplication, segmentation, and versioned dataset generation.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Architect and implement indexing and search capabilities over large audio corpora — enabling fast lookup by language, speaker, acoustic condition, duration, and quality tier.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Build and maintain backend services 
for data storage, including key-value databases, metadata synchronization, and manifest management across dataset versions.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Deploy and operate acquisition infrastructure in a Kubernetes / Infrastructure-as-Code environment; perform routine system health checks and respond to production issues quickly.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Collaborate closely with data processing, architecture, and ML platform teams to ensure smooth data flow from acquisition through to training-ready outputs.</span></li></ul><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;\"><b><strong style=\"white-space:pre-wrap;\">Compliance & data governance</strong></b></p><ul data-pattern=\"discCircleSquare\" data-depth=\"1\" style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;margin:8px 0px;line-height:1.6;padding:0px 0px 0px 32px;list-style-type:disc;\"><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Work closely with legal to handle compliance, data privacy, and licensing matters across all acquisition sources — maintaining a clear audit trail of provenance, permitted use, and commercial training rights for every dataset.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Enforce speaker consent documentation, GDPR requirements, robots.txt and ToS adherence, and audio retention policies across all ingestion pipelines.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Manage relationships with 
third-party data vendors — writing precise acquisition briefs, evaluating quality on delivery, and ensuring sourced data meets Sanas's licensing and quality standards.</span></li></ul><h2 style=\"font-family:"Basel Grotesk",Arial,sans-serif;line-height:1.6;font-size:15pt;font-weight:600;letter-spacing:0.5px;margin-top:18px;margin-bottom:4px;padding-left:0px;\"><b><strong style=\"font-size:15pt;white-space:pre-wrap;\">Qualifications</strong></b></h2><ul data-pattern=\"discCircleSquare\" data-depth=\"1\" style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;margin:8px 0px;line-height:1.6;padding:0px 0px 0px 32px;list-style-type:disc;\"><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">4+ years of experience in data engineering, ML data infrastructure, or backend systems engineering — with direct experience building large-scale data ingestion or crawling systems.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Strong Python and systems engineering skills — you build robust, maintainable infrastructure, not just one-off scripts.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Hands-on experience with distributed systems design: you've built systems that handle failure gracefully, scale horizontally, and recover cleanly.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Experience with web crawling infrastructure at scale including handling rate limiting, deduplication, and content extraction.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Proficiency with cloud platforms (AWS or GCP), object storage (S3/GCS), and container orchestration 
(Kubernetes).</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Comfort working with audio processing tooling — ffmpeg, librosa, torchaudio, sox — and experience handling large volumes of audio files.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Strong data quality instincts: you instrument pipelines, surface issues proactively, and treat data correctness with the same rigor as software correctness.</span></li></ul><h2 style=\"font-family:"Basel Grotesk",Arial,sans-serif;line-height:1.6;font-size:15pt;font-weight:600;letter-spacing:0.5px;margin-top:18px;margin-bottom:4px;padding-left:0px;\"><b><strong style=\"font-size:15pt;white-space:pre-wrap;\">Bonus</strong></b></h2><ul data-pattern=\"discCircleSquare\" data-depth=\"1\" style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;margin:8px 0px;line-height:1.6;padding:0px 0px 0px 32px;list-style-type:disc;\"><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Experience building speech or audio datasets for ASR, TTS, speech enhancement, or speaker verification model training.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Familiarity with major open speech corpora — Common Voice, LibriSpeech, VoxPopuli, AISHELL — and their sourcing and quality characteristics.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Experience with data versioning tools.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Background in multilingual or low-resource language data collection.</span></li><li 
style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Experience with annotation and labeling platforms.</span></li><li style=\"font-size:11pt;margin:3px 0px;letter-spacing:0.25px;line-height:1.6;\"><span style=\"white-space:pre-wrap;\">Familiarity with speaker diarization, language identification, or automated audio quality estimation models used for data filtering at scale.</span></li></ul>",
"company": "<meta><p style=\"font-family:"Basel Grotesk",Arial,sans-serif;font-size:11pt;font-weight:400;line-height:1.6;letter-spacing:0.25px;margin:4px 0px;padding:0px;text-align:start;\"><span style=\"white-space:pre-wrap;\">Sanas is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech AI platform capable of accent translation, noise cancellation, speech enhancement, cross-language communication, and more.</span><br><br><span style=\"white-space:pre-wrap;\">Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.</span><br><br><span style=\"white-space:pre-wrap;\">Sanas is currently one of the fastest growing startups in Silicon Valley, growing from $16M to $50M ARR in 2025. The company's core business is profitable and is on track to end 2026 with >$120M ARR. Our team combines deep expertise in model innovation and systems engineering with a design-minded product engineering culture to build and ship cutting-edge AI models and experiences — entirely in-house.</span><br><br><span style=\"white-space:pre-wrap;\">Sanas is a 180-strong team, established in 2020. In this short span, we've successfully secured over $100 million in funding. Our innovation has been supported by the industry's leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and other influential investors. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. 
With Sanas, you're not just adopting a product; you're investing in the future of communication.</span><br><br><span style=\"white-space:pre-wrap;\">If you’re looking to have a significant role in roadmapping and driving technical directions, if you’re looking to deploy challenging and big ideas without much overhead or slowness, if you're looking to leave your mark on an ambitious, generational mission to change how the worlds thinks about speech + AI, then Sanas is a well-suited place for you.</span></p>"
},
"workLocations": [
"Palo Alto, CA"
],
"employmentType": {
"id": "Salaried, full-time",
"label": "SALARIED_FT"
},
"payRangeDetails": [],
"activeJobApplication": {
"basicQuestions": [
{
"oid": "first_name",
"title": "First name",
"required": true,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "last_name",
"title": "Last name",
"required": true,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "email",
"title": "Email",
"required": true,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "pronouns",
"title": "Pronouns",
"required": false,
"fieldType": "PRONOUN"
},
{
"oid": "current_company",
"title": "Current company",
"required": false,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "phone_number",
"title": "Phone number",
"required": true,
"fieldType": "PHONE_NUMBER"
},
{
"oid": "location",
"title": "Location (city only)",
"required": true,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "linkedin_link",
"title": "LinkedIn link",
"required": false,
"fieldType": "SHORT_ANSWER"
},
{
"oid": "resume",
"title": "Resume",
"required": true,
"fieldType": "FILE"
},
{
"oid": "cover_letter",
"title": "Cover letter",
"required": false,
"fieldType": "FILE"
}
],
"customQuestions": {
"fields": [
{
"oid": "first_name",
"title": "First name",
"required": true,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "last_name",
"title": "Last name",
"required": true,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "email",
"title": "Email",
"required": true,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "pronouns",
"title": "Pronouns",
"required": false,
"fieldData": {},
"fieldType": "PRONOUN"
},
{
"oid": "current_company",
"title": "Current company",
"required": false,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "phone_number",
"title": "Phone number",
"required": true,
"fieldData": {},
"fieldType": "PHONE_NUMBER"
},
{
"oid": "location",
"title": "Location (city only)",
"required": true,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "linkedin_link",
"title": "LinkedIn link",
"required": false,
"fieldData": {},
"fieldType": "SHORT_ANSWER"
},
{
"oid": "resume",
"title": "Resume",
"required": true,
"fieldData": {},
"fieldType": "FILE"
},
{
"oid": "cover_letter",
"title": "Cover letter",
"required": false,
"fieldData": {},
"fieldType": "FILE"
}
]
},
"additionalQuestions": null
},
"hasAIEvaluationsEnabled": false,
"eeocQuestionnaireEnabled": true,
"applicationConfirmationTemplate": "68c1a1ae94b69622be8d48ff",
"eeocQuestionnaireEnabledForJobPost": true
},
"detail_meta": {
"url": "https://ats.rippling.com/api/v2/board/sanas/jobs/ad6ba237-eb74-4755-bf0d-027edbc3222c",
"http_status": 200,
"content_type": "application/json",
"response_bytes": 20039
},
"detail_errors": []
}
Get this page with API
Rendered from the bluedoor Job Postings API. Reproduce it:
GET https://api.bluedoor.sh/job-postings/v1/jobs/149e01a361cf423bf3a92184109f3f00d34f69a4?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/83ad35d8-903f-4812-a8a1-7e0502248692
GET https://api.bluedoor.sh/job-postings/v1/sources/1fc1335f-581e-4138-ae2e-6e3d6c790876
GET https://api.bluedoor.sh/job-postings/v1/jobs/149e01a361cf423bf3a92184109f3f00d34f69a4/events
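A minimal Python sketch of reproducing the requests above from the three IDs in the job record (Job ID, Org ID, Source ID). It assumes the endpoints accept plain GET requests and return JSON; any API key or authentication header the service may require is not documented on this page. The helper names `posting_urls` and `fetch_json` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode

# Base path taken from the GET URLs listed above.
BASE = "https://api.bluedoor.sh/job-postings/v1"


def posting_urls(job_id: str, org_id: str, source_id: str) -> dict[str, str]:
    """Hypothetical helper: build the job, org, source, and events URLs."""
    return {
        "job": f"{BASE}/jobs/{job_id}?" + urlencode({"include": "description"}),
        "org": f"{BASE}/orgs/{org_id}",
        "source": f"{BASE}/sources/{source_id}",
        "events": f"{BASE}/jobs/{job_id}/events",
    }


def fetch_json(url: str, timeout: float = 10.0) -> object:
    """Hypothetical helper: GET a URL and decode the body as JSON.

    Assumes no authentication is needed; add headers here if it is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For this posting, `posting_urls("149e01a361cf423bf3a92184109f3f00d34f69a4", "83ad35d8-903f-4812-a8a1-7e0502248692", "1fc1335f-581e-4138-ae2e-6e3d6c790876")` produces the four URLs listed above; passing each to `fetch_json` performs the network request. Building URLs separately from fetching keeps the URL logic testable offline.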