Eval360 - Error Analysis Engineer

Ifm Us · Sunnyvale, CA · On Site · Active · $150,000–$450,000 / year · Lever

Job facts

Field	Value
Company	Ifm Us
Title	Eval360 - Error Analysis Engineer
Normalized title	-
Department / team	Engineering
Location	Sunnyvale, CA, United States
Work model	On Site
Employment type	Full Time
Salary	$150,000–$450,000 / year
Status	active
ATS provider	Lever
Posted / first seen	2026-06-19 / 2026-06-20
Changed / last seen	2026-06-20 / 2026-06-20

Linked records

Company	Ifm Us
Source	4d111a77-38db-4b88-84a8-24f761a495a9
ATS provider	Lever

Description

About the Institute of Foundation Models

The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research, support the next generation of AI builders, and develop impactful systems that improve how frontier models are trained, evaluated, deployed, and governed.

As part of our team, you will work closely with researchers, machine learning engineers, data scientists, software engineers, and product teams on some of the most important challenges in AI development. You will contribute to systems that help measure model quality, identify failure modes, and improve the reliability, safety, and readiness of model releases.

Visa Sponsorship

This position is eligible for visa sponsorship.

Benefits Include

• Comprehensive medical, dental, and vision benefits
• Bonus
• 401K plan
• Generous paid time off, sick leave, and holidays
• Paid parental leave
• Employee assistance program
• Life insurance and disability insurance

The Role

We are looking for an Eval360 - Error Analysis Engineer to help build, improve, and operate Eval360, an evaluation service that serves as a quality gate for AI models. This person will focus specifically on error analysis: understanding where models fail, why they fail, how those failures should be categorized, and how evaluation systems can better detect, measure, and prevent these issues before models are released.

You will collaborate with researchers, machine learning engineers, product managers, data scientists, and platform teams to develop AI evaluation applications and internal tools based on next-generation AI research. You will be part of a cross-functional team responsible for the full software development lifecycle, from requirements gathering and system design to implementation, deployment, monitoring, debugging, documentation, and continuous improvement.

The ideal candidate is comfortable working across the stack, including front-end interfaces for reviewing errors, back-end evaluation pipelines, data analysis workflows, model evaluation infrastructure, databases, dashboards, and APIs. This person should have strong software engineering skills, excellent analytical judgment, and the ability to turn ambiguous model failures into structured insights that improve evaluation quality.

Key Responsibilities

• Collaborate with researchers, machine learning engineers, data scientists, product managers, and internal stakeholders to implement innovative software solutions for Eval360 and related model evaluation workflows.
• Build and improve Eval360 as an evaluation service that acts as a quality gate for model development, model comparison, and model release decisions.
• Perform deep error analysis on model outputs, including identifying failure patterns, categorizing issues, tracing root causes, and proposing improvements to evaluation methodology.
• Develop tools, workflows, and dashboards that make it easier for researchers and engineers to inspect model failures, compare model behavior, and understand quality regressions.
• Design and implement client-side and server-side architecture for evaluation review systems, error analysis interfaces, reporting tools, and internal evaluation applications.
• Develop responsive, usable interfaces that support error triage, annotation review, evaluation debugging, and model quality investigation.
• Build and maintain back-end services, APIs, data pipelines, and integrations that support evaluation execution, results storage, analysis, and reporting.
• Test software to ensure responsiveness, correctness, reliability, and efficiency across evaluation workflows.
• Troubleshoot, debug, and upgrade evaluation systems, including identifying issues in data processing, evaluation metrics, model output handling, job orchestration, and user-facing analysis tools.
• Create and maintain security, access control, and data protection settings for evaluation data, model outputs, annotations, and internal tooling.
• Write clear technical documentation for Eval360 systems, error taxonomies, evaluation workflows, debugging procedures, and user-facing tools.
• Work with researchers, data scientists, analysts, and machine learning engineers to improve evaluation quality, model diagnostics, and failure-mode visibility.
• Keep track of new development tools, evaluation frameworks, model analysis methods, data quality techniques, and architectures relevant to AI evaluation systems.
• Contribute to the design of error taxonomies, evaluation rubrics, quality thresholds, regression detection methods, and model readiness criteria.
• Help ensure Eval360 produces reliable, interpretable, and actionable signals for model quality gates.
• Contribute to research publications, technical reports, internal knowledge sharing, and external presentations where appropriate.
• Contribute to intellectual property and thought leadership in AI evaluation, error analysis, model quality measurement, and evaluation infrastructure.
• Perform all other duties as reasonably directed by the line manager that are aligned with these functional objectives.

Academic Qualifications

• Bachelor's degree in Computer Science, Machine Learning, Data Science, Software Engineering, Statistics, or a related technical field required.
• Master's or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, Data Science, or a related field preferred.

Professional Experience

• Proven experience as a Software Engineer, Full Stack Developer, Machine Learning Evaluation Engineer, Data Scientist, AI Engineer, or similar role.
• Experience building software systems for AI, machine learning, data analysis, evaluation, annotation, experimentation, or model monitoring.
• Experience working with AI algorithms and the ability to develop systems that accommodate AI-related requirements.
• Experience performing error analysis, model evaluation, data quality analysis, or failure-mode investigation for machine learning or language model systems.
• Experience developing internal applications, dashboards, review tools, or web-based workflows for technical users.
• Familiarity with common software stacks, including front-end frameworks, back-end services, databases, APIs, and cloud or internal infrastructure.
• Familiarity with GitHub, Git, CI/CD workflows, and collaborative software development practices.
• Knowledge of front-end languages and libraries such as HTML, CSS, JavaScript, TypeScript, React, Angular, or similar technologies.
• Knowledge of back-end languages and frameworks such as Python, Java, C#, Node.js, FastAPI, Flask, Django, or similar technologies.
• Familiarity with databases such as MySQL, PostgreSQL, MongoDB, or other structured and unstructured data stores.
• Familiarity with evaluation frameworks, experiment tracking systems, data pipelines, or machine learning infrastructure is strongly preferred.
• Ability to analyze complex model outputs and translate qualitative failures into structured, measurable categories.
• Strong problem-solving and troubleshooting skills, especially for ambiguous technical issues involving models, data, metrics, and software systems.
• Effective communication and collaboration skills, with the ability to work across research, engineering, data, and product teams.
• Strong attention to detail and a high bar for evaluation quality, reliability, and interpretability.

Preferred Qualifications

• Experience with large language models, foundation models, multimodal models, or model evaluation systems.
• Experience designing or using error taxonomies, evaluation rubrics, benchmark datasets, human evaluation workflows, or automated grading systems.
• Experience with Python-based data analysis tools such as pandas, NumPy, Jupyter, or similar.
• Experience with visualization or dashboarding tools for model quality analysis.
• Experience with distributed systems, job queues, workflow orchestration, or large-scale data processing.
• Experience working in a research environment or with fast-moving AI product and model teams.

Full job record

Job ID	557498e2bf91fceb913006bbca47dcc4c68d8eb8
Org ID	bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0
Source ID	4d111a77-38db-4b88-84a8-24f761a495a9
Board ID	4d111a77-38db-4b88-84a8-24f761a495a9
Provider	lever
Provider Job Key	adf911e6-d3d0-41ed-a7bf-553e1c50e684
Title	Eval360 - Error Analysis Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	Sunnyvale, CA
Department	—
Team	Engineering
Employment Type	Full-time
Workplace Type	on_site
Remote Policy	—
Country	United States
Region	CA
City	Sunnyvale
Salary Raw	USD 150000-450000 per-year-salary
Salary Min	150,000
Salary Max	450,000
Salary Currency	USD
Salary Period	year
Source URL	https://jobs.lever.co/ifm-us/adf911e6-d3d0-41ed-a7bf-553e1c50e684
Apply URL	https://jobs.lever.co/ifm-us/adf911e6-d3d0-41ed-a7bf-553e1c50e684/apply
First Seen At	2026-06-20 07:56:03Z
Last Seen At	2026-06-20 07:56:03Z
Last Checked At	2026-06-20 07:56:03Z
Last Changed At	2026-06-20 07:56:03Z
Inactive At	—
Source Posted At	2026-06-19 21:05:13Z
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=lever/board=ifm-us/date=2026-06-20/2026-06-20T07-56-02-593Z-0ab7124e577c71437276926999ed4d9ae62c5beae65d219654cae179b652c2af.json

Event Fields

{
  "content_hash": "9ec5d12100d2f5420a393588bda3d71b15424dc36ec678776955b2672e819850",
  "source_hash": "e1de2e59d60b1b00cca6a3189ee3584037fafd4a455ce14404d589437c2422ef",
  "last_changed_at": "2026-06-20T07:56:03.212Z",
  "active_status": "active"
}

Parsed Structured

{
  "dedupe": null,
  "language": "en",
  "location": {
    "raw": "Sunnyvale, CA",
    "city": "Sunnyvale",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.9
  },
  "salary_max": 450000,
  "salary_min": 150000,
  "inferred_at": "2026-06-20T07:56:03.174Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "Sunnyvale, CA",
      "city": "Sunnyvale",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": "year",
  "workplace_type": "on_site",
  "salary_currency": "USD"
}

Extensions

{}

Native Structured

{
  "lists": [
    {
      "text": "The Role",
      "content": "<div>\n<p style=\"margin: 0in 0in 7pt; line-height: 107%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\">We are looking for an <strong>Eval360 - Error Analysis Engineer</strong> to help build, improve, and operate Eval360, an evaluation service that serves as a quality gate for AI models. This person will focus specifically on <strong>error analysis</strong>: understanding where models fail, why they fail, how those failures should be categorized, and how evaluation systems can better detect, measure, and prevent these issues before models are released.</p>\n<p style=\"margin: 0in 0in 7pt; line-height: 107%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\">You will collaborate with researchers, machine learning engineers, product managers, data scientists, and platform teams to develop AI evaluation applications and internal tools based on next-generation AI research. You will be part of a cross-functional team responsible for the full software development lifecycle, from requirements gathering and system design to implementation, deployment, monitoring, debugging, documentation, and continuous improvement.</p>\n<p style=\"margin: 0in 0in 7pt; line-height: 107%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\">The ideal candidate is comfortable working across the stack, including front-end interfaces for reviewing errors, back-end evaluation pipelines, data analysis workflows, model evaluation infrastructure, databases, dashboards, and APIs. This person should have strong software engineering skills, excellent analytical judgment, and the ability to turn ambiguous model failures into structured insights that improve evaluation quality.</p>\n</div>"
    },
    {
      "text": "Key Responsibilities",
      "content": "<div>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Collaborate with researchers, machine learning engineers, data scientists, product managers, and internal stakeholders to implement innovative software solutions for Eval360 and related model evaluation workflows.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Build and improve Eval360 as an evaluation service that acts as a quality gate for model development, model comparison, and model release decisions.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Perform deep error analysis on model outputs, including identifying failure patterns, categorizing issues, tracing root causes, and proposing improvements to evaluation methodology.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Develop tools, workflows, and dashboards that make it easier for researchers and engineers to inspect model failures, compare model behavior, and understand quality regressions.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: 
#1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Design and implement client-side and server-side architecture for evaluation review systems, error analysis interfaces, reporting tools, and internal evaluation applications.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Develop responsive, usable interfaces that support error triage, annotation review, evaluation debugging, and model quality investigation.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Build and maintain back-end services, APIs, data pipelines, and integrations that support evaluation execution, results storage, analysis, and reporting.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Test software to ensure responsiveness, correctness, reliability, and efficiency across evaluation workflows.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Troubleshoot, debug, and upgrade evaluation systems, including identifying issues in data processing, evaluation metrics, model output handling, job orchestration, and user-facing analysis tools.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; 
line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Create and maintain security, access control, and data protection settings for evaluation data, model outputs, annotations, and internal tooling.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Write clear technical documentation for Eval360 systems, error taxonomies, evaluation workflows, debugging procedures, and user-facing tools.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Work with researchers, data scientists, analysts, and machine learning engineers to improve evaluation quality, model diagnostics, and failure-mode visibility.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Keep track of new development tools, evaluation frameworks, model analysis methods, data quality techniques, and architectures relevant to AI evaluation systems.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Contribute to the design of error taxonomies, evaluation rubrics, quality thresholds, regression detection methods, and model readiness 
criteria.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Help ensure Eval360 produces reliable, interpretable, and actionable signals for model quality gates.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Contribute to research publications, technical reports, internal knowledge sharing, and external presentations where appropriate.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Contribute to intellectual property and thought leadership in AI evaluation, error analysis, model quality measurement, and evaluation infrastructure.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Perform all other duties as reasonably directed by the line manager that are aligned with these functional objectives.</span></p>\n</div>"
    },
    {
      "text": "Academic Qualifications",
      "content": "<div>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Bachelor's degree in Computer Science, Machine Learning, Data Science, Software Engineering, Statistics, or a related technical field required.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Master's or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, Data Science, or a related field preferred.</span></p>\n</div>"
    },
    {
      "text": "Professional Experience",
      "content": "<div>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Proven experience as a Software Engineer, Full Stack Developer, Machine Learning Evaluation Engineer, Data Scientist, AI Engineer, or similar role.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience building software systems for AI, machine learning, data analysis, evaluation, annotation, experimentation, or model monitoring.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience working with AI algorithms and the ability to develop systems that accommodate AI-related requirements.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience performing error analysis, model evaluation, data quality analysis, or failure-mode investigation for machine learning or language model systems.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience developing internal applications, dashboards, review tools, or web-based workflows 
for technical users.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Familiarity with common software stacks, including front-end frameworks, back-end services, databases, APIs, and cloud or internal infrastructure.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Familiarity with GitHub, Git, CI/CD workflows, and collaborative software development practices.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Knowledge of front-end languages and libraries such as HTML, CSS, JavaScript, TypeScript, React, Angular, or similar technologies.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Knowledge of back-end languages and frameworks such as Python, Java, C#, Node.js, FastAPI, Flask, Django, or similar technologies.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Familiarity with databases such as MySQL, PostgreSQL, MongoDB, or other structured and unstructured data stores.</span></p>\n<p 
style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Familiarity with evaluation frameworks, experiment tracking systems, data pipelines, or machine learning infrastructure is strongly preferred.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Ability to analyze complex model outputs and translate qualitative failures into structured, measurable categories.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Strong problem-solving and troubleshooting skills, especially for ambiguous technical issues involving models, data, metrics, and software systems.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Effective communication and collaboration skills, with the ability to work across research, engineering, data, and product teams.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Strong attention to detail and a high bar for evaluation quality, reliability, and interpretability.</span></p>\n</div>"
    },
    {
      "text": "Preferred Qualifications",
      "content": "<div>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience with large language models, foundation models, multimodal models, or model evaluation systems.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience designing or using error taxonomies, evaluation rubrics, benchmark datasets, human evaluation workflows, or automated grading systems.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience with Python-based data analysis tools such as pandas, NumPy, Jupyter, or similar.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience with visualization or dashboarding tools for model quality analysis.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience with distributed systems, job queues, workflow orchestration, or large-scale data processing.</span></p>\n<p style=\"margin: 0in 0in 3.2pt 20.15pt; text-indent: -12.95pt; line-height: 104%; font-size: 10.5pt; 
font-family: Arial, sans-serif; color: #222222;\"><span style=\"color: #1f4e79;\">•</span>&nbsp; <span style=\"font-size: 10.0pt; line-height: 104%;\">Experience working in a research environment or with fast-moving AI product and model teams.</span></p>\n</div>"
    }
  ],
  "country": "US",
  "createdAt": 1781903113703,
  "updatedAt": null,
  "categories": {
    "team": "Engineering",
    "location": "Sunnyvale, CA",
    "commitment": "Full-time",
    "allLocations": [
      "Sunnyvale, CA"
    ]
  },
  "salaryRange": {
    "max": 450000,
    "min": 150000,
    "currency": "USD",
    "interval": "per-year-salary"
  },
  "workplaceType": "onsite"
}
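
The createdAt value in the native payload looks like a Unix timestamp in milliseconds. A minimal sketch, assuming that interpretation, converts it to UTC so it can be compared with the Source Posted At field in the full job record. The helper name epoch_ms_to_utc is illustrative and not part of any API shown on this page.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    # Split into whole seconds and leftover milliseconds to avoid float rounding.
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


# createdAt as it appears in the Native Structured payload.
created_at = epoch_ms_to_utc(1781903113703)

# Formatted the same way as the "Source Posted At" row.
print(created_at.strftime("%Y-%m-%d %H:%M:%SZ"))  # 2026-06-19 21:05:13Z
```

Under this reading, createdAt and Source Posted At describe the same instant, which suggests the record's posted time is taken from the provider's creation timestamp.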

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/557498e2bf91fceb913006bbca47dcc4c68d8eb8?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0

GET https://api.bluedoor.sh/job-postings/v1/sources/4d111a77-38db-4b88-84a8-24f761a495a9

GET https://api.bluedoor.sh/job-postings/v1/jobs/557498e2bf91fceb913006bbca47dcc4c68d8eb8/events
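
The four requests above follow a regular pattern, so they can be assembled from the IDs in the full job record. The sketch below only builds the URLs and sends nothing, because this page does not say how the API key is passed. The function names are illustrative, and the include=description value assumes any trailing "JSON" shown next to a URL is a response-format label rather than part of the address.

```python
from typing import Optional
from urllib.parse import urlencode

# Base path taken from the GET lines on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"

# IDs copied from the "Full job record" table.
JOB_ID = "557498e2bf91fceb913006bbca47dcc4c68d8eb8"
ORG_ID = "bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0"
SOURCE_ID = "4d111a77-38db-4b88-84a8-24f761a495a9"


def job_url(job_id: str, include: Optional[str] = None) -> str:
    """URL for a single job record, optionally with an include parameter."""
    url = f"{BASE_URL}/jobs/{job_id}"
    if include:
        url += "?" + urlencode({"include": include})
    return url


def org_url(org_id: str) -> str:
    """URL for the organization record."""
    return f"{BASE_URL}/orgs/{org_id}"


def source_url(source_id: str) -> str:
    """URL for the source (board) record."""
    return f"{BASE_URL}/sources/{source_id}"


def job_events_url(job_id: str) -> str:
    """URL for the lifecycle events of a job."""
    return f"{BASE_URL}/jobs/{job_id}/events"


if __name__ == "__main__":
    for url in (
        job_url(JOB_ID, include="description"),
        org_url(ORG_ID),
        source_url(SOURCE_ID),
        job_events_url(JOB_ID),
    ):
        print("GET", url)
```

Authentication is deliberately left out; consult the API documentation for the expected header or parameter before sending these requests.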
