Staff Engineer, AI Evals

Sema4.Ai · Atlanta, GA or Madison, WI · Active · Ashby

Job facts

Field	Value
Company	Sema4.Ai
Title	Staff Engineer, AI Evals
Normalized title	-
Department / team	Engineering / Engineering
Location	Atlanta, GA, United States
Work model	-
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-22

Linked records

Company	Sema4.Ai
Source	1b18b011-96b5-472c-9ac5-c127a8b3bda1
ATS provider	Ashby

Description

The Opportunity

At Sema4.ai, we’re building an Enterprise AI Agent platform that fundamentally changes how knowledge work gets done by enabling people and AI agents to collaborate in durable, trustworthy ways.

As a Staff Engineer, AI Evals, you’ll design and own the evaluation systems that determine whether our agents are actually good: correct, reliable, efficient, and improving over time. You’ll build the measurement backbone that guides model choice, agent design, product decisions, and customer trust.

This is an early, high-impact role. You’ll be defining how we measure success for AI agents in production, where ambiguity is real and ground truth can be messy. We’re looking for an engineer who brings rigor, judgment, and strong opinions about what “good” looks like, and who knows how to operationalize it.

Who You Are

AI Systems & Evaluation Expert

You understand that AI systems are only as good as the way they’re measured. You’ve worked with LLMs and agentic systems in production and have seen how offline benchmarks, synthetic data, and human judgment can all fail in different ways. You know how to design evaluations that are meaningful, repeatable, and decision-useful, not just theoretically impressive. You’re familiar with the sharp edges: non-determinism, prompt drift, regression risk, overfitting, data leakage, and the tension between fast iteration and statistical rigor.

In-Depth Technologist

You stay close to research and industry practice in evaluation, alignment, and reliability. You understand where automated metrics work, where they break down, and how to combine them with human review, golden datasets, and production signals. You bring creativity to building evaluation sets and scenarios, and to sourcing (or synthesizing) the data you need.

Builder With High Standards

You care deeply about correctness, clarity, and operational behavior. You can move fast, but you don’t confuse speed and rigor. You design eval systems that engineers trust, product relies on, and leadership uses to make decisions. You know when to build custom infrastructure and when to leverage existing tools without outsourcing critical thinking.

What You’ll Do

Build and Own the Evaluation Platform

Design, build, and operate Sema4.ai’s core evaluation infrastructure for LLMs and agents: offline benchmarks, regression tests, task-level metrics, and production feedback loops. These systems will directly inform product launches, model upgrades, and customer requirements.

Define “Good” for Agents in Production

Work closely with agent, product, and field engineering teams to translate fuzzy goals around correctness, reliability, and usefulness into concrete, measurable signals. You’ll help define success criteria for new capabilities and ensure we can detect regressions before customers do.

Tackle Ambiguous, High-Leverage Problems

Solve hard problems where the answer isn’t obvious:

- How to evaluate long-running, multi-step agents
- How to balance automated scoring with human judgment
- How to measure improvement when tasks evolve
- How to compare models under cost and latency constraints

Influence Technical and Product Direction

Use evaluation results to guide architectural decisions, model selection, and roadmap tradeoffs. You’ll participate in design reviews, set technical standards for eval rigor, mentor other engineers, and help interview senior technical candidates.

What You Bring

- 7+ years of software engineering experience, including 2+ years building AI/ML systems in production
- Deep experience with backend systems in Python, including data pipelines, observability, and reliability
- Hands-on experience evaluating LLM-based systems (agents, retrieval, tool use, workflows, etc.)
- Strong intuition for metrics, experimentation, and failure analysis in non-deterministic systems
- Strong communication skills: whether you’re talking to colleagues, customers, or machines, you communicate clearly, concisely, and collaboratively
- A high-ownership mindset: you care deeply about the integrity of the systems you build and the decisions they inform

Full job record

Job ID	ec37bdc9d735ef9eb564ba727ad861e4308288fe
Org ID	9da1b1be-b5dc-4f7d-a430-bb5b80da8262
Source ID	1b18b011-96b5-472c-9ac5-c127a8b3bda1
Board ID	1b18b011-96b5-472c-9ac5-c127a8b3bda1
Provider	ashby
Provider Job Key	592ee407-6e70-4d0f-88bd-3638a4ba7240
Title	Staff Engineer, AI Evals
Normalized Title	—
Status	active
Active	yes
Location Text	Atlanta, GA or Madison, WI
Department	Engineering
Team	Engineering
Employment Type	full_time
Workplace Type	—
Remote Policy	—
Country	United States
Region	GA
City	Atlanta
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/sema4.ai/592ee407-6e70-4d0f-88bd-3638a4ba7240
Apply URL	https://jobs.ashbyhq.com/sema4.ai/592ee407-6e70-4d0f-88bd-3638a4ba7240/application
First Seen At	2026-05-29 05:38:34Z
Last Seen At	2026-06-22 09:07:16Z
Last Checked At	2026-06-22 09:07:16Z
Last Changed At	2026-05-29 05:38:34Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=sema4.ai/date=2026-06-22/2026-06-22T09-07-15-943Z-d1f4eaaa46fc6e7cf5878f411cbcccb00349432e72b18e8c1928bb7822c8ba4d.json

Event Fields

{
  "content_hash": "8f04008645e6c4d10ff821656e523e14dbd9f5aebbfacfbf410a68b76cb52625",
  "source_hash": "f129ca38a851d2b6c1b2e5c7cd86477a553fe445a2d3a7089f5191457a8fd517",
  "last_changed_at": "2026-05-29T05:38:34.203Z",
  "active_status": "active"
}

Parsed Structured

{
  "dedupe": null,
  "language": "en",
  "location": {
    "raw": "Atlanta, GA",
    "city": "Atlanta",
    "region": "GA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.9
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-22T09:07:16.709Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "Atlanta, GA",
      "city": "Atlanta",
      "region": "GA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "592ee407-6e70-4d0f-88bd-3638a4ba7240",
  "team": "Engineering",
  "title": "Staff Engineer, AI Evals",
  "jobUrl": "https://jobs.ashbyhq.com/sema4.ai/592ee407-6e70-4d0f-88bd-3638a4ba7240",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/sema4.ai/592ee407-6e70-4d0f-88bd-3638a4ba7240/application",
  "isListed": true,
  "isRemote": false,
  "location": "Atlanta, GA or Madison, WI",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Engineering",
  "publishedAt": null,
  "workplaceType": null,
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/ec37bdc9d735ef9eb564ba727ad861e4308288fe?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/9da1b1be-b5dc-4f7d-a430-bb5b80da8262

GET https://api.bluedoor.sh/job-postings/v1/sources/1b18b011-96b5-472c-9ac5-c127a8b3bda1

GET https://api.bluedoor.sh/job-postings/v1/jobs/ec37bdc9d735ef9eb564ba727ad861e4308288fe/events
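
A minimal Python sketch of the four requests above, using only the standard library. The base URL, paths, and IDs come from this page. The helper names `build_urls` and `fetch_json` are hypothetical, and the bearer-token `Authorization` header is an assumption: this page does not state how the API key is sent, so check the API docs before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def build_urls(job_id: str, org_id: str, source_id: str) -> dict[str, str]:
    """Return the four request URLs for one posting, keyed by a short label."""
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?include=description",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url: str, api_key: str) -> object:
    """GET a URL and decode the JSON body.

    ASSUMPTION: the key is sent as a bearer token. The real scheme may
    differ (for example a custom header or a query parameter).
    """
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# IDs copied from the "Full job record" table on this page.
urls = build_urls(
    job_id="ec37bdc9d735ef9eb564ba727ad861e4308288fe",
    org_id="9da1b1be-b5dc-4f7d-a430-bb5b80da8262",
    source_id="1b18b011-96b5-472c-9ac5-c127a8b3bda1",
)
```

Calling `fetch_json(urls["job"], api_key)` with a valid key would then retrieve the job record; nothing is sent over the network until that call is made.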
