bluedoor data·Job Postings API·bluedoor.sh ↗

Research Engineer — Reinforcement Learning

Firecrawl · San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10) · Remote · Active · Ashby

Job facts

Company: Firecrawl
Title: Research Engineer — Reinforcement Learning
Normalized title: -
Department / team: Engineering Team / Engineering Team
Location: San Francisco, CA, United States
Work model: Remote / Hybrid
Employment type: Full Time
Salary: -
Status: active
ATS provider: Ashby
Posted / first seen: - / 2026-05-29
Changed / last seen: 2026-05-29 / 2026-06-06

Related slices

Company jobs: Active postings from Firecrawl.
Company breakdowns: Role, location, ATS, and work model facets for this company.
ATS provider jobs: Active postings observed through Ashby.
Provider filtered search: The same provider as a filtered job collection.
City jobs: Active postings in San Francisco.
Department jobs: Active postings in Engineering Team.
Work model jobs: Active Remote postings.
Lifecycle events: Open, update, close, and reopen events for this posting.
Original posting: Canonical source or apply URL captured from the ATS.

Linked records

Company: Firecrawl
Source: faea0405-4731-4b03-bc44-b8947fcec3f4
ATS provider: Ashby

Description

Research Engineer — Reinforcement Learning

You'll bring reinforcement learning to Firecrawl's core product: building the training infrastructure, reward pipelines, and fine-tuning systems that make our models meaningfully better at extracting, understanding, and structuring web data. This isn't theoretical RL research. You'll build your own training infra, run fast experiments, ship models to production, and bridge the gap between classical RL approaches and modern LLM agent systems. If you care as much about training throughput as you do about reward design, this is the role.

Salary Range: $180,000 to $290,000/year (Range shown is for U.S.-based employees in San Francisco, CA. Compensation outside the U.S. is adjusted fairly based on your country's cost of living.)
Equity Range: Up to 0.15%
Location: San Francisco, CA or Remote (Americas, UTC-3 to UTC-10)
Job Type: Full-Time
Experience: 3+ years in applied RL, ML engineering, or model training, with production systems
Visa: US Citizenship/Visa required for SF; N/A for Remote

About Firecrawl

Firecrawl is the easiest way to extract data from the web. Developers use us to reliably convert URLs into LLM-ready markdown or structured data with a single API call. In just a year, we've hit 8 figures in ARR and 120k+ GitHub stars by building the fastest way for developers to get LLM-ready data. We're a small, fast-moving, technical team building essential infrastructure superintelligence will use to gather data on the web. We ship fast and deep.

What You'll Do

Build training infrastructure and reward pipelines from scratch. Design and operate the systems that train and evaluate Firecrawl's models. You'll own the full loop: data collection, reward modeling, training runs, evaluation, and deployment. You build the infra yourself because you're the one who needs it to work.

Fine-tune models to achieve state-of-the-art results. Take foundation models and make them dramatically better at web data extraction, content understanding, and structured output generation. You know how to get from "decent fine-tune" to "best-in-class" and you have the patience and rigor to close that gap.

Bridge LLM agents and classical RL. The most interesting problems at Firecrawl sit at the intersection of modern LLM-based agents and classical RL techniques. You'll design reward signals for agent behaviors, apply RL methods to improve multi-step agent workflows, and figure out where traditional RL approaches outperform prompting, and vice versa.

Run fast experiments and iterate. You design experiments that test meaningful hypotheses, run them quickly, and make decisions based on results. You don't spend weeks on experiment infrastructure before getting a single result. Speed of iteration is a core part of how you work.

Communicate clearly to non-RL people. RL can be opaque. You translate your work into language that engineers, product people, and leadership can understand and act on. You know how to explain why a reward function matters without requiring everyone to read the paper.

Collaborate closely with the team. Work directly with the Search/IR-focused Research Engineer and the engineering team to connect RL improvements with search, ranking, and the broader product roadmap.

What We're Looking For

Builds their own training infra and reward pipelines. You don't wait for an ML platform team to set things up. You build the training loops, reward models, data pipelines, and evaluation frameworks yourself, because you understand that infra choices directly affect the quality of results. You've operated GPU clusters, managed training runs, and debugged convergence issues in production.

Can fine-tune models to SOTA. You've taken models from baseline to best-in-class on tasks that matter. You understand the full fine-tuning lifecycle (data curation, training dynamics, hyperparameter sensitivity, evaluation methodology) and you have the taste to know when a model is actually good versus when the eval is flattering.

Bridges LLM agents and classical RL. You're fluent in both worlds. You understand PPO, RLHF, reward modeling, and policy optimization, and you understand how modern LLM agents work, where they fail, and how RL techniques make them better. You see connections between these domains that most people miss.

Production-minded. You care about whether your models work in production, not just on benchmarks. You've deployed models that serve real traffic and made hard tradeoffs between model quality, latency, and cost. Research that doesn't ship isn't research that matters here.

Runs fast experiments and communicates clearly. You'd rather run three rough experiments this week than one polished one next month. When you have results, anyone on the team can understand what they mean, no decoder ring required.

Backgrounds that tend to do well:
- RL engineers at AI labs or applied ML teams who've shipped models to production.
- Researchers who've done RLHF or reward modeling for LLM systems.
- ML engineers who've built training infrastructure at startups and cared as much about the pipeline as the model.
- People who've worked at the intersection of RL and language models, whether in academic labs with a production bent or at companies building agent systems.

What We're NOT Looking For

Pure theorists. If your best RL work lives in a paper and you've never trained a model on real data at real scale, this isn't the role. We need someone who builds and ships.

Researchers who need a platform team. If you expect training infrastructure, data pipelines, and evaluation frameworks to be set up before you can be productive, you'll be frustrated here. You build the tools you need.

People who only know one paradigm. Deep in classical RL but never worked with LLMs? LLM fine-tuner who's never touched RL? You'll be missing half the picture. This role requires fluency in both.

Slow iterators. If your standard experiment cycle is measured in weeks, not days, you'll struggle with the pace. We need someone who can run a meaningful experiment, interpret results, and decide next steps within a day or two.

Black-box communicators. If your typical update is a wall of metrics only another RL researcher can parse, this isn't the right fit. We need someone who can explain what's working, what's not, and why it matters to people without RL PhDs.

A Note On Pace

We operate at an absurd level of urgency because the window for what we're building won't stay open forever. If that excites you, keep reading. If it doesn't, no hard feelings, but this role probably isn't for you.

Benefits & Perks

Available to all employees
- Salary that makes sense: $180,000–$290,000/year, based on impact, not tenure
- Own a piece: Up to 0.15% equity in what you're helping build
- Generous PTO: 15 days mandatory, anything after 24 days, just ask (holidays excluded); take the time you need to recharge
- Parental leave: 12 weeks fully paid, for moms and dads
- Wellness stipend: $100/month for the gym, therapy, massages, or whatever keeps you human
- Learning & Development: Expense up to $1,000/year toward anything that helps you grow professionally
- Team offsites: A change of scenery, minus the trust falls
- Sabbatical: 3 paid months off after 4 years, do something fun and new

Available to US-based full-time employees
- Full coverage, no red tape: Medical, dental, and vision (100% for employees, 50% for spouse/kids). No weird loopholes, just care that works
- Life & Disability insurance: Employer-paid short-term disability, long-term disability, and life insurance. Coverage for life's curveballs
- Supplemental options: Optional accident, critical illness, hospital indemnity, and voluntary life insurance for extra peace of mind
- Doctegrity telehealth: Talk to a doctor from your couch
- 401(k) plan: Retirement might be a ways off, but future-you will thank you
- Pre-tax benefits: Access to FSAs and commuter benefits (US-only) to help your wallet out a bit
- Pet insurance: Because fur babies are family too

Available to SF-based employees
- SF HQ perks: Snacks, drinks, team lunches, intense ping pong, and peak startup energy
- E-Bike transportation: A loaner electric bike to get you around the city, on us

Interview Process

1. Application Review: Send us your work and a quick note on why this excites you. Show us what you've trained: models, reward systems, training pipelines. Published work is great; shipped production models are better.
2. Intro Chat (~20 min): A quick conversation to get to know each other before we go deep. We'll talk about what you've been working on, what drew you to Firecrawl, and what you're looking for in your next role. Time for your questions too.
3. Technical Deep Dive (~60 min): Go deep on RL and model training work you've done: training infrastructure decisions, reward design, fine-tuning approaches, production deployment. We'll explore a live problem: how you'd apply RL to improve an LLM agent workflow at Firecrawl. We're looking for depth across classical RL and modern LLM techniques, production instincts, and fast reasoning.
4. Founder Chat (~30 min): Culture, pace, ownership, and how you like to work. Time for your questions too.
5. Paid Work Trial (1–2 weeks): Tackle a real RL/fine-tuning problem with production implications. We evaluate on technical depth, experiment velocity, and how clearly you communicate results.
6. Decision: We move fast after the trial.

If you want to bring RL to one of the most interesting applied problems in AI (making agents smarter at understanding and extracting web data at scale), this is your shot. 👉 Apply now.

Full job record

Job ID: 04a4a75d344a16be40fb050aa5ea7b575a5e4816
Org ID: 3e9b0772-325b-449d-bfe3-1424e4f1a873
Source ID: faea0405-4731-4b03-bc44-b8947fcec3f4
Board ID: faea0405-4731-4b03-bc44-b8947fcec3f4
Provider: ashby
Provider Job Key: 26abaf11-ff85-4f8d-ba44-2b6d32aae2a1
Title: Research Engineer — Reinforcement Learning
Normalized Title: -
Status: active
Active: yes
Location Text: San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)
Department: Engineering Team
Team: Engineering Team
Employment Type: full_time
Workplace Type: remote
Remote Policy: hybrid
Country: United States
Region: CA
City: San Francisco
Salary Raw: -
Salary Min: -
Salary Max: -
Salary Currency: -
Salary Period: -
Source URL: https://jobs.ashbyhq.com/firecrawl/26abaf11-ff85-4f8d-ba44-2b6d32aae2a1
Apply URL: https://jobs.ashbyhq.com/firecrawl/26abaf11-ff85-4f8d-ba44-2b6d32aae2a1/application
First Seen At: 2026-05-29 06:56:44Z
Last Seen At: 2026-06-06 09:40:29Z
Last Checked At: 2026-06-06 09:40:29Z
Last Changed At: 2026-05-29 06:56:44Z
Inactive At: -
Source Posted At: -
Source Updated At: -
Raw Payload Uri: s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=firecrawl/date=2026-06-06/2026-06-06T09-40-18-188Z-8656973c395e0589aca5672c465607ff05c45847e599d7efa448da0ec970314f.json
Event Fields
{
  "content_hash": "a67cbcb6b6644b43a74305ffe38829081086399073ef802c0b61040143069455",
  "source_hash": "136d7c0fe1e422177def088fb1fd906f5b1605b233a28824f7e17946d2804f7b",
  "last_changed_at": "2026-05-29T06:56:44.724Z",
  "active_status": "active"
}
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "San Francisco, CA (Hybrid)",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": true,
    "confidence": 0.9
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T09:40:29.025Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco, CA (Hybrid)",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": true,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "hybrid",
  "salary_period": null,
  "workplace_type": "remote",
  "salary_currency": null
}
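In the parsed record above, the salary fields are null even though the description states "Salary Range: $180,000 to $290,000/year". As an illustration only, a consumer of this record could recover the range from the description text with a regular expression. This sketch is not how bluedoor parses salaries: the function name, the pattern, and the output values for currency and period are assumptions, and it only handles the dollar-range phrasing that appears in this posting.

```python
import re

# Matches "$180,000 to $290,000/year" as well as hyphen or en-dash ranges.
SALARY_RE = re.compile(
    r"\$(?P<lo>\d[\d,]*)\s*(?:to|[-\u2013])\s*\$(?P<hi>\d[\d,]*)"
    r"\s*/\s*(?P<period>year|month|hour)"
)


def parse_salary(text):
    """Return salary fields found in text, or None if no range is present."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    return {
        "salary_min": int(match["lo"].replace(",", "")),
        "salary_max": int(match["hi"].replace(",", "")),
        # Assumption: a bare "$" in a U.S.-based posting means USD.
        "salary_currency": "USD",
        # Assumption: the period is stored as the literal word after the slash.
        "salary_period": match["period"],
    }
```

Calling `parse_salary` on the description text of this posting would fill the four salary keys; text without a dollar range returns `None`, so the null values stay in place.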
Extensions
{}
Native Structured
{
  "id": "26abaf11-ff85-4f8d-ba44-2b6d32aae2a1",
  "team": "Engineering Team",
  "title": "Research Engineer — Reinforcement Learning",
  "jobUrl": "https://jobs.ashbyhq.com/firecrawl/26abaf11-ff85-4f8d-ba44-2b6d32aae2a1",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/firecrawl/26abaf11-ff85-4f8d-ba44-2b6d32aae2a1/application",
  "isListed": true,
  "isRemote": true,
  "location": "San Francisco, CA (Hybrid) OR Remote (Americas, UTC-3 to UTC-10)",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Engineering Team",
  "publishedAt": null,
  "workplaceType": "Remote",
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/04a4a75d344a16be40fb050aa5ea7b575a5e4816?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/3e9b0772-325b-449d-bfe3-1424e4f1a873
GET https://api.bluedoor.sh/job-postings/v1/sources/faea0405-4731-4b03-bc44-b8947fcec3f4
GET https://api.bluedoor.sh/job-postings/v1/jobs/04a4a75d344a16be40fb050aa5ea7b575a5e4816/events
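
The four requests above can also be assembled programmatically. The following is a minimal Python sketch, not an official client. The helper names (`job_url`, `page_urls`, `fetch_json`) are hypothetical, and the IDs are copied from this record. This page does not describe authentication, so `fetch_json` simply forwards whatever headers the caller supplies.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.bluedoor.sh/job-postings/v1"

# IDs copied from the record on this page.
JOB_ID = "04a4a75d344a16be40fb050aa5ea7b575a5e4816"
ORG_ID = "3e9b0772-325b-449d-bfe3-1424e4f1a873"
SOURCE_ID = "faea0405-4731-4b03-bc44-b8947fcec3f4"


def job_url(job_id, include=None):
    """Build the job URL, optionally with an ?include=... query parameter."""
    url = f"{BASE}/jobs/{urllib.parse.quote(job_id)}"
    if include:
        url += "?" + urllib.parse.urlencode({"include": include})
    return url


def page_urls():
    """Return the four URLs listed above, in the same order."""
    return [
        job_url(JOB_ID, include="description"),
        f"{BASE}/orgs/{ORG_ID}",
        f"{BASE}/sources/{SOURCE_ID}",
        f"{job_url(JOB_ID)}/events",
    ]


def fetch_json(url, headers=None, timeout=10):
    """GET a URL and decode the JSON body.

    Any required auth header is an assumption the caller must supply.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

`page_urls()` only builds strings and makes no network calls; passing one of its results to `fetch_json` performs the actual request.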