bluedoor data·Job Postings API·bluedoor.sh ↗

Software Engineer, Infrastructure

Fal · San Francisco · Remote · Active · $180,000 / year · Greenhouse

Job facts

Field | Value
Company | Fal
Title | Software Engineer, Infrastructure
Normalized title | -
Department / team | Engineering
Location | San Francisco, CA, United States
Work model | Remote / Remote
Employment type | -
Salary | $180,000 / year
Status | active
ATS provider | Greenhouse
Posted / first seen | 2026-02-23 / 2026-05-29
Changed / last seen | 2026-05-29 / 2026-06-06

Related slices

Page | What it contains
Company jobs | Active postings from Fal.
Company breakdowns | Role, location, ATS, and work model facets for this company.
ATS provider jobs | Active postings observed through Greenhouse.
Provider filtered search | The same provider as a filtered job collection.
City jobs | Active postings in San Francisco.
Department jobs | Active postings in Engineering.
Work model jobs | Active Remote postings.
Lifecycle events | Open, update, close, and reopen events for this posting.
Original posting | Canonical source or apply URL captured from the ATS.

Linked records

Company: Fal
Source: 00cad292-f66b-476d-a9f7-2cd97898a6d0
ATS provider: Greenhouse

Description

You are a hands-on engineer who builds the software and processes that keep a large fleet of GPU servers healthy and productive. You write systems and tooling for managing thousands of servers, including provisioning, health monitoring, error detection, and recovery, and when something breaks that automation can’t fix, you drive resolution with partners.

Key responsibilities

- Build and maintain a Python fleet tracking system that manages the full lifecycle of servers, including contracting and procurement, target use, pricing, availability, health, RMAs, etc.
- Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery, and alerting
- Create and maintain metrics, dashboards, and alerting for hardware health across the fleet (GPU errors, disk failures, network issues, thermals)
- Leverage AI to an extreme level to build tools and automate alerting and recovery
- Implement and enforce OS-level security: hardening baselines, SELinux/AppArmor policies, SSH key management, vulnerability scanning, and compliance automation
- Manage and optimize distributed and local storage systems supporting model weights, checkpoints, and ephemeral scratch: NVMe arrays, NFS, parallel file systems, and object storage
- Tune Linux systems for AI workloads: kernel parameters, NUMA topology, CPU pinning, hugepages, I/O schedulers, and GPU driver stack optimization (NVIDIA drivers, CUDA, container runtimes)
- Develop a suite of automated error detection and recovery processes
- Work with partners to solve technical issues

Requirements

- 3+ years of experience managing bare-metal and cloud-based server fleets at scale (100+ nodes)
- Strong software engineering skills in Python; you write production tooling, not scripts
- Deep Linux systems knowledge: boot process, kernel tuning, networking, storage, systemd, cgroups, namespaces, performance profiling
- Strong experience with configuration management and infrastructure-as-code: Ansible, Terraform, cloud-init
- Solid understanding of storage technologies: LVM, RAID, NVMe, NFS, Lustre or GPFS, and Linux I/O stack tuning
- Familiarity with hardware diagnostics and failure modes (GPUs, NVMe, NICs, memory)
- Experience building internal tools or dashboards for infrastructure visibility
- Excellent communication and ability to drive technical decisions across teams
- Self-starter who executes quickly, takes ownership, and constantly seeks improvement

Nice to have

- Familiarity with network configuration and diagnostics (VLAN, VXLAN, ECMP, BGP, tcpdump)
- Experience with NVIDIA GPU infrastructure: driver management, health monitoring, DCGM, NVLink/NVSwitch diagnostics, RDMA, InfiniBand/RoCEv2
- Experience with AMD GPUs
- Experience with bare-metal and VM provisioning (PXE/iPXE, Kickstart, libvirt, QEMU/KVM)
- Experience with compliance frameworks relevant to cloud providers (SOC 2, ISO 27001)

Compensation

$180,000-250,000 plus equity + benefits

Location

San Francisco, CA (we are open to remote in the US for Senior and Staff levels)

What we offer at fal

- Interesting and challenging work
- A lot of learning and growth opportunities
- Relocation assistance to San Francisco
- Health, dental, and vision insurance (US)
- Regular team events and offsites

Full job record

Job ID: 7e9aa827104a520f7f1baaeb490a87fa4010fb61
Org ID: 144222fa-e13c-4bb3-be7f-89ec0ca39bac
Source ID: 00cad292-f66b-476d-a9f7-2cd97898a6d0
Board ID: 00cad292-f66b-476d-a9f7-2cd97898a6d0
Provider: greenhouse
Provider Job Key: 4146035009
Title: Software Engineer, Infrastructure
Normalized Title:
Status: active
Active: yes
Location Text: San Francisco
Department: Engineering
Team:
Employment Type:
Workplace Type: remote
Remote Policy: remote
Country: United States
Region: CA
City: San Francisco
Salary Raw: Compensation $180,000-250,000 plus equity + benefits Location San Francisco, CA (we are open to remot
Salary Min: 180,000
Salary Max:
Salary Currency: USD
Salary Period: year
Source URL: https://job-boards.greenhouse.io/fal/jobs/4146035009
Apply URL: https://job-boards.greenhouse.io/fal/jobs/4146035009
First Seen At: 2026-05-29 22:55:27Z
Last Seen At: 2026-06-06 18:41:04Z
Last Checked At: 2026-06-06 18:41:04Z
Last Changed At: 2026-05-29 22:55:27Z
Inactive At:
Source Posted At: 2026-02-23 18:48:14Z
Source Updated At: 2026-05-21 19:54:05Z
Raw Payload Uri: s3://job-postings-prod-raw-590183727216/raw/provider=greenhouse/board=fal/date=2026-06-06/2026-06-06T18-41-04-423Z-86bb1d74f3eb248fbdeda47bd35749162de1e19b95a4b9eec044932aff19275e.json
Event Fields
{
  "content_hash": "9375772ba0b1d610f7bf8336cb40291e510c97411cc9c74d9275711b084f2021",
  "source_hash": "de85d23f2401640bf1da9df7fae15f21309348e4fdec0bc5e2d82bd7233f8bf3",
  "last_changed_at": "2026-05-29T22:55:27.738Z",
  "active_status": "active"
}
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "San Francisco",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": true,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": 180000,
  "inferred_at": "2026-06-06T18:41:04.573Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": true,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "remote",
  "salary_period": "year",
  "workplace_type": "remote",
  "salary_currency": "USD"
}
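
The parsed record above carries a salary_min of 180000 and a null salary_max, while the raw compensation text in this posting reads "$180,000-250,000". As a rough illustration of how both ends of such a range could be read from free text, here is a minimal Python sketch. The helper name `parse_salary_range` and its regular expression are hypothetical and are not taken from the bluedoor pipeline.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (an assumption, not bluedoor's parser): a dollar amount,
# a separator ("-", an en dash, or "to"), then a second amount with an
# optional dollar sign.
_RANGE = re.compile(r"\$\s*([\d,]+)\s*(?:-|\u2013|to)\s*\$?\s*([\d,]+)")


def parse_salary_range(raw: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (min, max) as integers, or (None, None) when no range is found."""
    match = _RANGE.search(raw)
    if match is None:
        return None, None
    low, high = (int(group.replace(",", "")) for group in match.groups())
    return low, high


# Raw text copied from the compensation line of this posting.
raw = "Compensation $180,000-250,000 plus equity + benefits"
salary_min, salary_max = parse_salary_range(raw)
print(salary_min, salary_max)
```

The sketch only handles a single dollar-denominated range; postings that state hourly rates, other currencies, or multiple bands would need additional rules.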
Extensions
{}
Native Structured
{
  "title": "Software Engineer, Infrastructure",
  "offices": [
    {
      "id": 4007037009,
      "name": "SF Office",
      "location": "San Francisco, California, United States",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "language": "en",
  "location": {
    "name": "San Francisco"
  },
  "metadata": [],
  "updated_at": "2026-05-21T15:54:05-04:00",
  "departments": [
    {
      "id": 4006994009,
      "name": "Engineering",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "company_name": "fal",
  "requisition_id": 4093358009,
  "first_published": "2026-02-23T13:48:14-05:00",
  "application_deadline": null
}
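
The native Greenhouse timestamps above carry a UTC offset, whereas the Source Posted At and Source Updated At fields in the full job record are shown in UTC. The following Python sketch shows one way to check that the two agree. The function name `to_utc_string` is illustrative, and the output format is chosen only to match how this page prints timestamps.

```python
from datetime import datetime, timezone


def to_utc_string(iso_with_offset: str) -> str:
    """Convert an ISO 8601 timestamp with a UTC offset to 'YYYY-MM-DD HH:MM:SSZ'."""
    local = datetime.fromisoformat(iso_with_offset)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")


# Values copied from the Native Structured block.
first_published = to_utc_string("2026-02-23T13:48:14-05:00")
updated_at = to_utc_string("2026-05-21T15:54:05-04:00")

print(first_published)  # 2026-02-23 18:48:14Z
print(updated_at)       # 2026-05-21 19:54:05Z
```

Both results line up with the Source Posted At and Source Updated At values listed in the record.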
Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/7e9aa827104a520f7f1baaeb490a87fa4010fb61?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/144222fa-e13c-4bb3-be7f-89ec0ca39bac
GET https://api.bluedoor.sh/job-postings/v1/sources/00cad292-f66b-476d-a9f7-2cd97898a6d0
GET https://api.bluedoor.sh/job-postings/v1/jobs/7e9aa827104a520f7f1baaeb490a87fa4010fb61/events
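
The four requests listed above can be assembled from the Job ID, Org ID, and Source ID in the full job record. Here is a minimal Python sketch using only the standard library. The helper names `build_urls` and `fetch_json` are hypothetical, and the bearer-token header is an assumption: this page does not say whether or how the API authenticates.

```python
import json
import urllib.request

BASE = "https://api.bluedoor.sh/job-postings/v1"


def build_urls(job_id: str, org_id: str, source_id: str) -> dict:
    """Build the four endpoint URLs shown on this page for one posting."""
    return {
        "job": f"{BASE}/jobs/{job_id}?include=description",
        "org": f"{BASE}/orgs/{org_id}",
        "source": f"{BASE}/sources/{source_id}",
        "events": f"{BASE}/jobs/{job_id}/events",
    }


def fetch_json(url: str, api_key=None):
    """GET a URL and decode the JSON body.

    Assumption: if a key is needed, it is sent as a bearer token. The page
    does not document authentication, so adjust this to the real scheme.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


urls = build_urls(
    job_id="7e9aa827104a520f7f1baaeb490a87fa4010fb61",
    org_id="144222fa-e13c-4bb3-be7f-89ec0ca39bac",
    source_id="00cad292-f66b-476d-a9f7-2cd97898a6d0",
)
for name, url in urls.items():
    print(name, url)
```

Calling `fetch_json(urls["job"])` would then issue the first request; the response shape is not shown on this page beyond the fields rendered above.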