Senior Site Reliability Engineer

Hyperbolic · San Francisco, CA · Active · Ashby

Job facts

Field	Value
Company	Hyperbolic
Title	Senior Site Reliability Engineer
Normalized title	-
Department / team	Engineering / Engineering
Location	San Francisco, CA, United States
Work model	-
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-18

Linked records

Company	Hyperbolic
Source	507c4451-8e1e-4a37-a4a9-4c45eb132483
ATS provider	Ashby

Description

Who We Are

Hyperbolic Labs is on a mission to democratize AI by breaking down the barriers to computing power with our Open-Access AI Cloud. By aggregating computing resources across the globe, we offer an innovative GPU marketplace and AI inference service that promise affordability and accessibility for all. As pioneers at the intersection of AI and open-source technology, we believe in an open future where AI innovation is limited only by imagination, not by access to resources. We're looking for forward-thinking individuals who share our passion for making AI universally accessible, secure, and affordable. Join us in building a platform that empowers innovators everywhere to turn their visionary AI projects into reality. As we prepare for growth after our Series A, our team, led by co-founders with PhDs in AI, Math, and Computer Science, is poised to redefine computing.

About the Role

We're seeking a Site Reliability Engineer to ensure that Hyperbolic's GPU marketplace and AI infrastructure operate with exceptional reliability, performance, and security. Because we aggregate compute resources from hundreds of global suppliers, our SLOs, trust, and economic efficiency are product-critical. You'll be responsible for defining and maintaining service level objectives for job success rates, building robust incident response systems, managing capacity across our distributed GPU network, and implementing secure rollout and rollback mechanisms that keep our platform running smoothly 24/7.

In this role, you'll establish the reliability standards that define customer trust in our platform, design monitoring and alerting systems that provide deep visibility into our infrastructure, build automation for capacity management and resource allocation, lead incident response and post-mortem processes, and work closely with engineering teams to improve system resilience.

You'll also focus on security and infrastructure hardening: ensuring strong isolation between tenants and suppliers, implementing key management systems, and building compliance frameworks. This is a high-impact position where your work directly influences our ability to deliver on our promise of affordable, accessible AI compute at scale.

Who You Are

Architected, deployed, and managed large-scale Kubernetes environments, including cluster administration, container orchestration, autoscaling, service discovery, and high-availability infrastructure, to ensure the reliability and scalability of mission-critical systems.
Led troubleshooting and performance optimization efforts across Kubernetes-based production environments, proactively identifying system bottlenecks, automating remediation workflows, and improving overall platform stability and uptime.
Strong automation mindset, with experience using infrastructure-as-code, configuration management, and CI/CD pipelines.
Strong background in capacity planning and management, including forecasting, resource allocation, and cost optimization for distributed systems.
Experienced in incident response, on-call rotations, and post-mortem processes, with a track record of reducing MTTR and improving system resilience.
Deep knowledge of deployment systems, including progressive rollouts, canary deployments, feature flags, and automated rollback mechanisms.
Proficient in observability tools and practices, including metrics, logging, tracing, and alerting systems (Prometheus, Grafana, ELK stack, or similar).
Strong understanding of infrastructure security, including tenant isolation, workload isolation, network segmentation, and security hardening.
Experience with secrets management, key management systems (KMS), certificate management, and secure credential rotation.
Expert in site reliability engineering, with proven experience defining, monitoring, and maintaining SLOs and SLAs for production systems.
Knowledge of compliance frameworks and security best practices for cloud platforms (SOC 2, ISO 27001, or similar).
Excellent problem-solving skills, with the ability to debug complex distributed-systems issues under pressure.

Preferred Qualifications

Experience operating GPU infrastructure, AI/ML platforms, or compute marketplaces at scale.
Background in distributed systems, peer-to-peer networks, or decentralized infrastructure.
Knowledge of multi-tenancy security patterns, container security, and runtime security tools.
Experience with chaos engineering, fault injection, and resilience testing.
Familiarity with cost optimization strategies for cloud infrastructure and GPU resources.
Experience building and operating systems with demanding uptime requirements (99.9%+ SLAs).
Background at companies like AWS, Google Cloud, Azure, or fast-growing infrastructure startups.
Contributions to open-source reliability, observability, or security tools.

Hyperbolic is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

Full job record

Job ID	0b5c7631e8386f25359d36cb2547b08a988fd165
Org ID	20453fd2-e103-44d1-a0d7-86a1becd3bcb
Source ID	507c4451-8e1e-4a37-a4a9-4c45eb132483
Board ID	507c4451-8e1e-4a37-a4a9-4c45eb132483
Provider	ashby
Provider Job Key	cb366294-41bb-4510-bc5b-19ce055c4643
Title	Senior Site Reliability Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	San Francisco, CA
Department	Engineering
Team	Engineering
Employment Type	full_time
Workplace Type	—
Remote Policy	—
Country	United States
Region	CA
City	San Francisco
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/hyperbolic/cb366294-41bb-4510-bc5b-19ce055c4643
Apply URL	https://jobs.ashbyhq.com/hyperbolic/cb366294-41bb-4510-bc5b-19ce055c4643/application
First Seen At	2026-05-29 05:47:55Z
Last Seen At	2026-06-18 09:50:20Z
Last Checked At	2026-06-18 09:50:20Z
Last Changed At	2026-05-29 05:47:55Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=hyperbolic/date=2026-06-18/2026-06-18T09-50-11-135Z-bcfd6e657d37cefbebe304eb7e540dd12b49048024ded71c69a901c78381f564.json

Event Fields

{
  "content_hash": "92bf3a10af2a246eb7fab137ce0e99af065fc630b9884ccfe0450874f0af5f98",
  "source_hash": "ca0f8f7e8d0fa50bcde8ebf85a55cb309999422aeb7a14c4be5f1feb2ad9b604",
  "last_changed_at": "2026-05-29T05:47:55.516Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "San Francisco, CA",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.9
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-18T09:50:20.070Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco, CA",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "cb366294-41bb-4510-bc5b-19ce055c4643",
  "team": "Engineering",
  "title": "Senior Site Reliability Engineer",
  "jobUrl": "https://jobs.ashbyhq.com/hyperbolic/cb366294-41bb-4510-bc5b-19ce055c4643",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/hyperbolic/cb366294-41bb-4510-bc5b-19ce055c4643/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco, CA",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Engineering",
  "publishedAt": null,
  "workplaceType": null,
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/0b5c7631e8386f25359d36cb2547b08a988fd165?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/20453fd2-e103-44d1-a0d7-86a1becd3bcb

GET https://api.bluedoor.sh/job-postings/v1/sources/507c4451-8e1e-4a37-a4a9-4c45eb132483

GET https://api.bluedoor.sh/job-postings/v1/jobs/0b5c7631e8386f25359d36cb2547b08a988fd165/events