Staff Production Engineer (Operational Excellence)

Crusoe · San Francisco, CA - US · On Site · Active · $209,000–$253,000 / year · Ashby

Job facts

Field	Value
Company	Crusoe
Title	Staff Production Engineer (Operational Excellence)
Normalized title	-
Department / team	Cloud Engineering / Cloud Engineering, Cloud Availability, Cloud Production Engineering
Location	San Francisco, United States
Work model	On Site
Employment type	Full Time
Salary	$209,000–$253,000 / year
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-06-06 / 2026-06-06

Linked records

Company	Crusoe
Source	50724c53-15ff-44ea-9170-cead94f3ffae
ATS provider	Ashby

Description

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack, from electrons to tokens, to power the world's most ambitious AI workloads.

When you join Crusoe, you join a team that is building the future, faster. We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved: people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

Crusoe is building the most reliable, energy-efficient, AI-optimized cloud platform, and Production Engineering sits at the heart of that mission. As a Staff Production Engineer focused on Operational Excellence, you will help ensure the reliability, scalability, and performance of Crusoe's GPU cloud that powers next-generation AI workloads.

This role is ideal for senior engineers who enjoy solving complex production problems, leading reliability strategy across large-scale distributed systems, and building automation that keeps infrastructure running smoothly. You'll play a key role in strengthening the operational foundation of Crusoe's cloud while helping scale infrastructure that supports demanding AI and HPC workloads.

You'll partner closely with Production Engineers, infrastructure teams, and platform engineers to improve system reliability, reduce operational toil, and drive continuous improvements across Crusoe's rapidly growing GPU cloud.

What You'll Be Working On:

- Lead cross-functional efforts to define and evolve availability metrics for Crusoe's cloud platform, including establishing, measuring, and improving SLIs and SLOs
- Drive production incident response, diagnosing and resolving service disruptions while leading post-incident reviews and root cause analysis
- Architect, operate, and improve observability across Crusoe's infrastructure using tools such as Prometheus, Grafana, Alertmanager, and OpenTelemetry
- Identify reliability risks, performance bottlenecks, and early indicators of potential production issues across distributed systems
- Design and develop automation and tooling that reduces operational toil, improves recovery times, and enables self-healing infrastructure
- Partner with compute, networking, storage, and platform teams to strengthen service resilience and disaster recovery capabilities
- Define and champion operational processes, knowledge sharing, and reliability best practices across the engineering organization
- Mentor and grow junior and mid-level engineers, helping build technical depth across the team

What You'll Bring to the Team:

- Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
- 8+ years of experience in Production Engineering, SRE, or large-scale infrastructure operations
- Demonstrated experience supporting GPU workloads, HPC environments, or latency/throughput-sensitive distributed systems
- Previous experience in Infrastructure roles building or managing compute, storage or networking platforms
- Deep knowledge of Linux/Unix systems, including debugging complex issues across kernel and user space
- Strong understanding of modern cloud infrastructure fundamentals including Kubernetes, distributed systems, virtualization, and cloud platforms (AWS/GCP)
- Proven track record with incident management practices and reliability frameworks (SRE, ITIL, or similar)
- Hands-on experience with monitoring and observability tools such as Prometheus and Grafana
- Experience with infrastructure-as-code and configuration management tools such as Terraform or Ansible
- Proficiency in scripting or programming with languages such as Go, Python, C, or C++
- Exceptional communication skills and the ability to influence and collaborate across engineering teams
- Ability to remain calm and effective while troubleshooting complex issues in high-impact production environments
- A growth mindset and strong commitment to reliability engineering, automation, and operational excellence

Bonus Points:

- Experience leading Kubernetes or container orchestration platforms at scale
- Exposure to change management processes, operational readiness reviews, or structured root cause analysis
- Experience designing self-healing systems, automated remediation, or event-driven operational tooling
- Interest in scaling AI or HPC infrastructure and solving reliability challenges in GPU-heavy environments
- Passion for mentorship, growing teams, and developing deep expertise in Production Engineering

Benefits:

- Industry competitive pay
- Restricted Stock Units in a fast growing, well-funded technology company
- Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSA accounts
- Paid Parental Leave
- Paid life insurance, short-term and long-term disability
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company paid commuter benefit; $300 per month

Compensation:

Compensation will be paid in the range of $209,000 – $253,000 + Bonus. Restricted Stock Units are included in all offers. Compensation will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

Full job record

Job ID	f04dad4b48e35dfd8434f678e5112187886183d3
Org ID	2fcf95a9-48bb-4786-9a15-e88511d5ef14
Source ID	50724c53-15ff-44ea-9170-cead94f3ffae
Board ID	50724c53-15ff-44ea-9170-cead94f3ffae
Provider	ashby
Provider Job Key	d8019cfe-995a-40c3-bce0-97f368f3d454
Title	Staff Production Engineer (Operational Excellence)
Normalized Title	—
Status	active
Active	yes
Location Text	San Francisco, CA - US
Department	Cloud Engineering
Team	Cloud Engineering, Cloud Availability, Cloud Production Engineering
Employment Type	full_time
Workplace Type	on_site
Remote Policy	—
Country	United States
Region	—
City	San Francisco
Salary Raw	Compensation: Compensation will be paid in the range of $209,000 – $253,000 + Bonus
Salary Min	209,000
Salary Max	253,000
Salary Currency	USD
Salary Period	year
Source URL	https://jobs.ashbyhq.com/Crusoe/d8019cfe-995a-40c3-bce0-97f368f3d454
Apply URL	https://jobs.ashbyhq.com/Crusoe/d8019cfe-995a-40c3-bce0-97f368f3d454/application
First Seen At	2026-05-29 05:51:49Z
Last Seen At	2026-06-06 20:17:33Z
Last Checked At	2026-06-06 20:17:33Z
Last Changed At	2026-06-06 09:11:51Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=Crusoe/date=2026-06-06/2026-06-06T20-14-22-954Z-d84cbb82bbf3719a716d941e88132f0006f245932f96812ba9d3cfc110356043.json

Event Fields

{
  "content_hash": "d542ec769d9a957e40b7c2f62690e58b154f6749ffe6830ef0cd50fe467d6160",
  "source_hash": "6fec8a47bcbb9d0ba131a57b9f4d1539e238d4a35fc41edc64452f0c7adccfa8",
  "last_changed_at": "2026-06-06T09:11:51.378Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "San Francisco, CA - US",
    "city": "San Francisco",
    "region": null,
    "country": "United States",
    "is_remote": false,
    "confidence": 0.95
  },
  "salary_max": 253000,
  "salary_min": 209000,
  "inferred_at": "2026-06-06T20:17:33.694Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco, CA - US",
      "city": "San Francisco",
      "region": null,
      "country": "United States",
      "is_remote": false,
      "confidence": 0.95
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": "year",
  "workplace_type": "on_site",
  "salary_currency": "USD"
}
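
The salary_min and salary_max values above agree with the Salary Raw string in the job record. The parser that produced them is not shown on this page, so the following is only a minimal sketch of how such a range might be pulled from the raw text, assuming a regular expression over dollar amounts. The function name parse_salary_range is hypothetical.

```python
import re

# Matches a dollar amount with or without thousands separators, e.g. $209,000 or $300.
AMOUNT = re.compile(r"\$\s*(\d{1,3}(?:,\d{3})+|\d+)")


def parse_salary_range(raw):
    """Return (min, max) of the dollar amounts found in raw, or None if there are none.

    Hypothetical helper: this is an illustration, not the pipeline's actual parser.
    """
    amounts = [int(text.replace(",", "")) for text in AMOUNT.findall(raw)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


if __name__ == "__main__":
    raw = "Compensation will be paid in the range of $209,000 \u2013 $253,000 + Bonus"
    print(parse_salary_range(raw))
```

A sketch like this would also pick up unrelated dollar figures in a longer passage (the $300 commuter benefit, for example), so it only makes sense on a short string such as Salary Raw.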

Extensions

{}

Native Structured

{
  "id": "d8019cfe-995a-40c3-bce0-97f368f3d454",
  "team": "Cloud Engineering, Cloud Availability, Cloud Production Engineering",
  "title": "Staff Production Engineer (Operational Excellence)",
  "jobUrl": "https://jobs.ashbyhq.com/Crusoe/d8019cfe-995a-40c3-bce0-97f368f3d454",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/Crusoe/d8019cfe-995a-40c3-bce0-97f368f3d454/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco, CA - US",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Cloud Engineering",
  "publishedAt": null,
  "workplaceType": "OnSite",
  "employmentType": "FullTime",
  "secondaryLocations": [
    {
      "location": "Sunnyvale, CA - US"
    }
  ]
}
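
The native record reports workplaceType as "OnSite" and employmentType as "FullTime", while the job record and the parsed record use on_site and full_time. How the pipeline performs that mapping is not shown; the sketch below assumes a plain CamelCase to snake_case conversion, and to_snake_case is a hypothetical name.

```python
import re


def to_snake_case(value):
    """Convert a CamelCase enum value such as "OnSite" to snake_case.

    Assumption: the normalized fields are a direct case conversion of the
    native values. The real pipeline may use an explicit lookup table instead.
    """
    # Insert an underscore at each lower-to-upper boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", value).lower()


if __name__ == "__main__":
    for native in ("OnSite", "FullTime"):
        print(native, "->", to_snake_case(native))
```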

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/f04dad4b48e35dfd8434f678e5112187886183d3?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/2fcf95a9-48bb-4786-9a15-e88511d5ef14

GET https://api.bluedoor.sh/job-postings/v1/sources/50724c53-15ff-44ea-9170-cead94f3ffae

GET https://api.bluedoor.sh/job-postings/v1/jobs/f04dad4b48e35dfd8434f678e5112187886183d3/events
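
The four requests above can be assembled from the IDs in the job record. The following is a minimal sketch using only the Python standard library. The URL paths and the include=description parameter come from this page; the helper names are hypothetical, and the bearer-token header in fetch_json is an assumption, since the page does not show how an API key is sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def job_url(job_id, include=None):
    """URL for a single job, optionally with ?include=... (e.g. "description")."""
    url = f"{BASE_URL}/jobs/{job_id}"
    if include:
        url += "?" + urlencode({"include": include})
    return url


def job_events_url(job_id):
    """URL for the lifecycle events of a job."""
    return f"{BASE_URL}/jobs/{job_id}/events"


def org_url(org_id):
    return f"{BASE_URL}/orgs/{org_id}"


def source_url(source_id):
    return f"{BASE_URL}/sources/{source_id}"


def fetch_json(url, api_key=None):
    """GET a URL and decode the JSON body.

    ASSUMPTION: the key is sent as "Authorization: Bearer <key>". Check the
    API documentation for the real authentication scheme.
    """
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    job_id = "f04dad4b48e35dfd8434f678e5112187886183d3"
    print(job_url(job_id, include="description"))
    print(job_events_url(job_id))
```

Keeping URL construction separate from the network call lets the URLs be checked without making a request.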