HPC Engineer
Ifm Us · Sunnyvale, CA · On Site · Active · $150,000–$300,000 / year · Lever
Job facts
| Field | Value |
|---|---|
| Company | Ifm Us |
| Title | HPC Engineer |
| Normalized title | - |
| Department / team | Engineering |
| Location | Sunnyvale, CA, United States |
| Work model | On Site |
| Employment type | Full Time |
| Salary | $150,000–$300,000 / year |
| Status | active |
| ATS provider | Lever |
| Posted / first seen | 2026-06-01 / 2026-06-02 |
| Changed / last seen | 2026-06-02 / 2026-06-06 |
Related slices
| Page | What it contains |
|---|---|
| Company jobs | Active postings from Ifm Us. |
| Company breakdowns | Role, location, ATS, and work model facets for this company. |
| ATS provider jobs | Active postings observed through Lever. |
| Provider filtered search | The same provider as a filtered job collection. |
| City jobs | Active postings in Sunnyvale. |
| Work model jobs | Active On Site postings. |
| Lifecycle events | Open, update, close, and reopen events for this posting. |
| Original posting | Canonical source or apply URL captured from the ATS. |
Linked records
| Company | Ifm Us |
| Source | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| ATS provider | Lever |
Description
About MBZUAI
The Institute for Foundation Models (IFM) operates some of the world's largest AI supercomputing environments.
Position Summary
This role provides operational coverage during Abu Dhabi overnight hours and serves as a primary point of contact for infrastructure monitoring, incident triage, researcher support, and production operations.
Benefits Include
• Comprehensive medical, dental, and vision benefits
• Bonus
• 401(k) plan
• Generous paid time off, sick leave, and holidays
• Paid parental leave
• Employee Assistance Program
• Life insurance and disability
Responsibilities
• Monitor health, performance, and availability of large-scale GPU clusters.
• Respond to incidents and perform first-level triage.
• Support researchers and troubleshoot job failures.
• Execute operational runbooks and recovery procedures.
• Validate cluster deployments, upgrades, and maintenance activities.
• Track infrastructure utilization and operational metrics.
• Develop automation and monitoring tools.
• Contribute to documentation and reporting.
Education
Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or related disciplines.
Experience
• 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.
• Strong Linux troubleshooting skills.
• Experience with scripting using Python or Bash.
Preferred Qualifications
• Slurm.
• GPU infrastructure.
• AWS, Azure, or GCP.
• Grafana, Prometheus, Datadog, or similar tools.
• Containers and Kubernetes.
• AI/ML infrastructure exposure.
• Research computing environments.
Full job record
| Job ID | 6a3b92543451b36cc063812771a1b602536f8bbe |
| Org ID | bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0 |
| Source ID | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| Board ID | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| Provider | lever |
| Provider Job Key | 4c44a0f8-5179-41b3-a0b1-fd0902da9e5b |
| Title | HPC Engineer |
| Normalized Title | — |
| Status | active |
| Active | yes |
| Location Text | Sunnyvale, CA |
| Department | — |
| Team | Engineering |
| Employment Type | Full-time |
| Workplace Type | on_site |
| Remote Policy | — |
| Country | United States |
| Region | CA |
| City | Sunnyvale |
| Salary Raw | USD 150000-300000 per-year-salary |
| Salary Min | 150,000 |
| Salary Max | 300,000 |
| Salary Currency | USD |
| Salary Period | year |
| Source URL | https://jobs.lever.co/ifm-us/4c44a0f8-5179-41b3-a0b1-fd0902da9e5b |
| Apply URL | https://jobs.lever.co/ifm-us/4c44a0f8-5179-41b3-a0b1-fd0902da9e5b/apply |
| First Seen At | 2026-06-02 10:41:24Z |
| Last Seen At | 2026-06-06 20:14:05Z |
| Last Checked At | 2026-06-06 20:14:05Z |
| Last Changed At | 2026-06-02 10:41:24Z |
| Inactive At | — |
| Source Posted At | 2026-06-01 18:02:25Z |
| Source Updated At | — |
| Raw Payload Uri | s3://job-postings-prod-raw-590183727216/raw/provider=lever/board=ifm-us/date=2026-06-06/2026-06-06T20-14-04-180Z-dba991fe17ae8dd61e2db3cfb8af8d8d910a473e10cffaf0af12daa6be784167.json |
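The `Source Posted At` value in the table above appears to match the `createdAt` field of the native Lever payload (`1780336945636`), read as milliseconds since the Unix epoch. A minimal sketch of that conversion follows; the helper name `epoch_ms_to_utc` is hypothetical, and the link between the two fields is inferred from the values on this page rather than stated by the API.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Split with integer arithmetic to avoid float rounding on the millisecond part.
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)


# createdAt value copied from the Native Structured payload on this page.
created_at = epoch_ms_to_utc(1780336945636)
print(created_at.strftime("%Y-%m-%d %H:%M:%SZ"))  # 2026-06-01 18:02:25Z
```

The same helper would apply to `updatedAt` once it is non-null.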
Event Fields
{
"content_hash": "3391b68d0b908cc6340c959657a041305518d6533535bd49e96a40bcc864e14f",
"source_hash": "ff7b3a9e62e7da64f2f370b6c14472ec17a6f9a86379d3299e136306eb3515bc",
"last_changed_at": "2026-06-02T10:41:24.749Z",
"active_status": "active"
}
Parsed Structured
{
"language": "en",
"location": {
"raw": "Sunnyvale, CA",
"city": "Sunnyvale",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.9
},
"salary_max": 300000,
"salary_min": 150000,
"inferred_at": "2026-06-06T20:14:05.507Z",
"launch_scope": {
"reason": "english_us_canada",
"included": true,
"language": "en",
"location": {
"raw": "Sunnyvale, CA",
"city": "Sunnyvale",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.9
},
"countries": [
"United States"
]
},
"remote_policy": null,
"salary_period": "year",
"workplace_type": "on_site",
"salary_currency": "USD"
}
Extensions
{}
Native Structured
{
"lists": [
{
"text": "Responsibilities",
"content": "<div><span style=\"font-size: 11.0pt; line-height: 115%; font-family: Cambria, serif;\">• Monitor health, performance, and availability of large-scale GPU clusters.<br>• Respond to incidents and perform first-level triage.<br>• Support researchers and troubleshoot job failures.<br>• Execute operational runbooks and recovery procedures.<br>• Validate cluster deployments, upgrades, and maintenance activities.<br>• Track infrastructure utilization and operational metrics.<br>• Develop automation and monitoring tools.<br>• Contribute to documentation and reporting.</span></div>"
},
{
"text": "Education",
"content": "<div><span style=\"font-size: 11.0pt; line-height: 115%; font-family: Cambria, serif;\">Bachelor's degree in Computer Science, Computer Engineering, Software Engineering, Information Technology, Electrical Engineering, Mathematics, Physics, or related disciplines.<br><br></span></div>"
},
{
"text": " Experience",
"content": "<div><span style=\"font-size: 11.0pt; line-height: 115%; font-family: Cambria, serif;\">• 2+ years in Linux systems administration, SRE, DevOps, cloud operations, HPC, or infrastructure operations.<br>• Strong Linux troubleshooting skills.<br>• Experience with scripting using Python or Bash.</span></div>"
},
{
"text": " Preferred Qualifications",
"content": "<div><span style=\"font-size: 11.0pt; line-height: 115%; font-family: Cambria, serif;\">• Slurm.<br>• GPU infrastructure.<br>• AWS, Azure, or GCP.<br>• Grafana, Prometheus, Datadog, or similar tools.<br>• Containers and Kubernetes.<br>• AI/ML infrastructure exposure.<br>• Research computing environments.</span></div>"
}
],
"country": "US",
"createdAt": 1780336945636,
"updatedAt": null,
"categories": {
"team": "Engineering",
"location": "Sunnyvale, CA",
"commitment": "Full-time",
"allLocations": [
"Sunnyvale, CA"
]
},
"salaryRange": {
"max": 300000,
"min": 150000,
"currency": "USD",
"interval": "per-year-salary"
},
"workplaceType": "onsite"
}
Get this page with API
Rendered from the bluedoor Job Postings API. Reproduce it:
GET https://api.bluedoor.sh/job-postings/v1/jobs/6a3b92543451b36cc063812771a1b602536f8bbe?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0
GET https://api.bluedoor.sh/job-postings/v1/sources/4d111a77-38db-4b88-84a8-24f761a495a9
GET https://api.bluedoor.sh/job-postings/v1/jobs/6a3b92543451b36cc063812771a1b602536f8bbe/events
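The four requests above can be assembled from the Job ID, Org ID, and Source ID in the full job record. The sketch below only builds the URLs; `build_urls` and `fetch_json` are hypothetical helper names, and `fetch_json` assumes the endpoints return JSON without authentication, which this page does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://api.bluedoor.sh/job-postings/v1"


def build_urls(job_id: str, org_id: str, source_id: str) -> dict:
    """Return the four GET URLs listed on this page, keyed by what they fetch."""
    return {
        "job": f"{BASE}/jobs/{job_id}?" + urlencode({"include": "description"}),
        "org": f"{BASE}/orgs/{org_id}",
        "source": f"{BASE}/sources/{source_id}",
        "events": f"{BASE}/jobs/{job_id}/events",
    }


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch one URL and decode the body as JSON.

    Assumption: no API key or extra headers are required. Adjust if the
    service demands authentication.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Identifiers copied from the full job record on this page.
urls = build_urls(
    job_id="6a3b92543451b36cc063812771a1b602536f8bbe",
    org_id="bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0",
    source_id="4d111a77-38db-4b88-84a8-24f761a495a9",
)
for url in urls.values():
    print(f"GET {url}")
```

Calling `fetch_json(urls["job"])` would then retrieve the job record, provided the assumptions above hold.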