
Head of Inference, Stealth Edge AI Co

Montauk Capital · New York City · Hybrid · Active · Ashby

Job facts

Field	Value
Company	Montauk Capital
Title	Head of Inference, Stealth Edge AI Co
Normalized title	-
Department / team	Portfolio / Portfolio, Thermo Compute
Location	New York City, NY, United States
Work model	Hybrid / Hybrid
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-06

Linked records

Company	Montauk Capital
Source	6b825b94-9f7f-4bef-89d3-58ddf066f64a
ATS provider	Ashby

Description

Head of Inference
Full Time, Remote, NYC Preferred (US Based)

About Montauk Capital
Montauk Capital builds and backs companies at the forefront of the Electron Economy: the generational shift toward electrified, intelligent technologies that are reshaping industries and driving unprecedented demand for energy. Our team combines deep investing acumen with decades of operating experience to give founders the strategic clarity and hands-on support that accelerate the building of enduring companies of consequence.

About Stealth Edge AI Co
Co-founded by Montauk Capital, Stealth Edge AI Co is a pre-seed venture specializing in modular, metro-edge AI capabilities. By leveraging existing infrastructure for inference deployment, Edge AI provides low-latency, SLA-guaranteed performance across diverse GPU SKUs and colocation environments. Our technology intelligently routes traffic based on demand proximity and real-world network limitations, bypassing the heavy power and infrastructure requirements of traditional hyperscalers. We are launching operations with pilot nodes in NYC and executing a city-by-city expansion strategy, with plans for a broader multi-metro rollout.

About the Role
We are seeking a visionary, execution-oriented Head of Inference. You'll define the inference architecture, make foundational decisions, build the first proof of concept (POC), and own this domain end to end alongside the CEO.

You will be a senior, hands-on technical leader: the technical authority on inference in the room and the company's internal and external expert on the subject. You'll own the key technical decisions and the core inference capability that drives the platform and customer experience, and you'll have a strong voice over the company's technical foundation. You'll turn the vision into a viable POC, then design and implement the distributed inference systems that follow.

Alongside the CEO, you'll represent the company with top-tier partners, early customers, and investors. Beyond the CEO, you'll have the support of a team of strong advisors and the initial founding team.

What You'll Do
Create the inference strategy and define the inference architecture for Edge AI
Own the inference serving layer end to end: vLLM, TensorRT-LLM, Triton, or equivalent
Build a credible POC fast, one that proves the platform works to NVIDIA, cloud providers, and customers
Drive cost-per-token optimization
Optimize GPU utilization, KV-cache management, and batching for production workloads
Build distributed inference pipelines across multi-GPU, multi-node edge deployments
Set performance baselines and SLAs for inference latency and throughput, and own observability, reliability, and performance SLAs
Define the quantization strategy
Translate complex inference requirements into infrastructure designs
Define the software access-layer architecture and oversee integration efforts
Engage credibly with investors, partners, and technical stakeholders, and represent the company externally

What You'll Bring
You have a passion for inference and a background as a hands-on technical builder who has directly implemented inference systems before, ideally in production or near-production environments. You have deep knowledge of, and enthusiasm for, model serving and the practical engineering required to make an inference system work on real hardware. You can take a vision and initial concept and turn it into a viable POC quickly, and you are comfortable making foundational technical decisions fast, in ambiguity, while building something first of its kind. If inference is your craft and you've built systems in production, we want to talk.

Production inference serving: vLLM, TensorRT-LLM, Triton Inference Server, or equivalent, distributed at scale
Quantization, SGLang, containerization, cost-per-token optimization
Observability tooling: distributed tracing, latency profiling, alerting; you instrument and debug complex distributed systems with a focus on building world-class observability and debuggability tools
C++/CUDA/Rust
GPU utilization and CUDA kernel optimization; you have pushed hardware to its limits
Batching, KV-cache, and speculative decoding expertise
Scaling systems with Kubernetes, Ray, custom load balancing, and multi-GPU/multi-node inference
A serving system you built that NVIDIA and cloud providers respect
Model deployment and serving
Systems engineering
Technical leadership experience, over teams or outcomes
Startup / 0→1 DNA: you ship fast and communicate clearly

Why Join Us
Category-Defining Opportunity: Solve the AI inference bottleneck without the burden of power and infrastructure constraints, and own metro-edge inference across heterogeneous, disparate compute nodes
Massive Market Opportunity: AI spending is projected to exceed hundreds of billions annually, with 54 GW of AI inference demand expected by 2030
Studio Support: Leverage Montauk Capital's resources, network, and operational expertise during critical early stages
Competitive compensation + equity: True ownership over what you build

Full job record

Job ID	77b7bb63885c14437b42e4f4be71988fe03d7af8
Org ID	cb5c56ae-6850-4cb6-ac27-edfaff3a4e3f
Source ID	6b825b94-9f7f-4bef-89d3-58ddf066f64a
Board ID	6b825b94-9f7f-4bef-89d3-58ddf066f64a
Provider	ashby
Provider Job Key	041cc206-d3ef-46fd-b9cf-95eab0b26922
Title	Head of Inference, Stealth Edge AI Co
Normalized Title	—
Status	active
Active	yes
Location Text	New York City
Department	Portfolio
Team	Portfolio, Thermo Compute
Employment Type	full_time
Workplace Type	hybrid
Remote Policy	hybrid
Country	United States
Region	NY
City	New York City
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/montauk-capital/041cc206-d3ef-46fd-b9cf-95eab0b26922
Apply URL	https://jobs.ashbyhq.com/montauk-capital/041cc206-d3ef-46fd-b9cf-95eab0b26922/application
First Seen At	2026-05-29 06:31:40Z
Last Seen At	2026-06-06 20:38:48Z
Last Checked At	2026-06-06 20:38:48Z
Last Changed At	2026-05-29 06:31:40Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=montauk-capital/date=2026-06-06/2026-06-06T20-38-46-534Z-0d67855f04fbe456edd363a6378bd701175b2a9f940f5eac3ee31d0d6c4a3ab8.json

Event Fields

{
  "content_hash": "ae1db01cfa3d6ef0e146aeec4c217d2f627554f2f8c2f6b0430d75d7606c3f55",
  "source_hash": "cbaa4e5427101acd6abbe2a8b0cbf6c7d4103abdf5442b4190ebc643647a086f",
  "last_changed_at": "2026-05-29T06:31:40.644Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "New York City",
    "city": "New York City",
    "region": "NY",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T20:38:48.389Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "New York City",
      "city": "New York City",
      "region": "NY",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "hybrid",
  "salary_period": null,
  "workplace_type": "hybrid",
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "041cc206-d3ef-46fd-b9cf-95eab0b26922",
  "team": "Portfolio, Thermo Compute",
  "title": "Head of Inference, Stealth Edge AI Co",
  "jobUrl": "https://jobs.ashbyhq.com/montauk-capital/041cc206-d3ef-46fd-b9cf-95eab0b26922",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/montauk-capital/041cc206-d3ef-46fd-b9cf-95eab0b26922/application",
  "isListed": true,
  "isRemote": false,
  "location": "New York City",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Portfolio",
  "publishedAt": null,
  "workplaceType": "Hybrid",
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/77b7bb63885c14437b42e4f4be71988fe03d7af8?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/cb5c56ae-6850-4cb6-ac27-edfaff3a4e3f

GET https://api.bluedoor.sh/job-postings/v1/sources/6b825b94-9f7f-4bef-89d3-58ddf066f64a

GET https://api.bluedoor.sh/job-postings/v1/jobs/77b7bb63885c14437b42e4f4be71988fe03d7af8/events
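A minimal Python sketch of reproducing these four requests with only the standard library. It assumes the endpoints return JSON; the helper names `job_posting_urls` and `fetch_json` are hypothetical, and sending an API key as an `Authorization: Bearer` header is an assumption, not something this page documents, so check the API docs before relying on it.

```python
from __future__ import annotations

import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE = "https://api.bluedoor.sh/job-postings/v1"


def job_posting_urls(job_id: str, org_id: str, source_id: str) -> dict[str, str]:
    """Build the job, org, source, and events URLs for one posting (hypothetical helper)."""
    return {
        "job": f"{BASE}/jobs/{quote(job_id)}?{urlencode({'include': 'description'})}",
        "org": f"{BASE}/orgs/{quote(org_id)}",
        "source": f"{BASE}/sources/{quote(source_id)}",
        "events": f"{BASE}/jobs/{quote(job_id)}/events",
    }


def fetch_json(url: str, api_key: str | None = None) -> dict:
    """GET a URL and decode the JSON body (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    if api_key:
        # ASSUMPTION: Bearer-token auth is a placeholder; the real scheme may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    with urlopen(Request(url, headers=headers), timeout=10) as resp:
        return json.load(resp)


urls = job_posting_urls(
    "77b7bb63885c14437b42e4f4be71988fe03d7af8",
    "cb5c56ae-6850-4cb6-ac27-edfaff3a4e3f",
    "6b825b94-9f7f-4bef-89d3-58ddf066f64a",
)
for name, url in urls.items():
    print(name, url)
```

Building the URLs separately from fetching keeps the ID-to-endpoint mapping testable offline; calling `fetch_json(urls["job"], api_key=...)` would perform the actual request.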
