Eval360 - Error Analysis Engineer
Ifm Us · Sunnyvale, CA · On Site · Active · $150,000–$450,000 / year · Lever
Job facts
| Field | Value |
|---|---|
| Company | Ifm Us |
| Title | Eval360 - Error Analysis Engineer |
| Normalized title | - |
| Department / team | Engineering |
| Location | Sunnyvale, CA, United States |
| Work model | On Site |
| Employment type | Full Time |
| Salary | $150,000–$450,000 / year |
| Status | active |
| ATS provider | Lever |
| Posted / first seen | 2026-06-19 / 2026-06-20 |
| Changed / last seen | 2026-06-20 / 2026-06-20 |
Linked records
| Field | Value |
|---|---|
| Company | Ifm Us |
| Source | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| ATS provider | Lever |
Description
About the Institute of Foundation Models
The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research, support the next generation of AI builders, and develop impactful systems that improve how frontier models are trained, evaluated, deployed, and governed.
As part of our team, you will work closely with researchers, machine learning engineers, data scientists, software engineers, and product teams on some of the most important challenges in AI development. You will contribute to systems that help measure model quality, identify failure modes, and improve the reliability, safety, and readiness of model releases.
Visa Sponsorship
This position is eligible for visa sponsorship.
Benefits Include
• Comprehensive medical, dental, and vision benefits
• Bonus
• 401K plan
• Generous paid time off, sick leave, and holidays
• Paid parental leave
• Employee assistance program
• Life insurance and disability insurance
The Role
We are looking for an Eval360 - Error Analysis Engineer to help build, improve, and operate Eval360, an evaluation service that serves as a quality gate for AI models. This person will focus specifically on error analysis: understanding where models fail, why they fail, how those failures should be categorized, and how evaluation systems can better detect, measure, and prevent these issues before models are released.
You will collaborate with researchers, machine learning engineers, product managers, data scientists, and platform teams to develop AI evaluation applications and internal tools based on next-generation AI research. You will be part of a cross-functional team responsible for the full software development lifecycle, from requirements gathering and system design to implementation, deployment, monitoring, debugging, documentation, and continuous improvement.
The ideal candidate is comfortable working across the stack, including front-end interfaces for reviewing errors, back-end evaluation pipelines, data analysis workflows, model evaluation infrastructure, databases, dashboards, and APIs. This person should have strong software engineering skills, excellent analytical judgment, and the ability to turn ambiguous model failures into structured insights that improve evaluation quality.
Key Responsibilities
• Collaborate with researchers, machine learning engineers, data scientists, product managers, and internal stakeholders to implement innovative software solutions for Eval360 and related model evaluation workflows.
• Build and improve Eval360 as an evaluation service that acts as a quality gate for model development, model comparison, and model release decisions.
• Perform deep error analysis on model outputs, including identifying failure patterns, categorizing issues, tracing root causes, and proposing improvements to evaluation methodology.
• Develop tools, workflows, and dashboards that make it easier for researchers and engineers to inspect model failures, compare model behavior, and understand quality regressions.
• Design and implement client-side and server-side architecture for evaluation review systems, error analysis interfaces, reporting tools, and internal evaluation applications.
• Develop responsive, usable interfaces that support error triage, annotation review, evaluation debugging, and model quality investigation.
• Build and maintain back-end services, APIs, data pipelines, and integrations that support evaluation execution, results storage, analysis, and reporting.
• Test software to ensure responsiveness, correctness, reliability, and efficiency across evaluation workflows.
• Troubleshoot, debug, and upgrade evaluation systems, including identifying issues in data processing, evaluation metrics, model output handling, job orchestration, and user-facing analysis tools.
• Create and maintain security, access control, and data protection settings for evaluation data, model outputs, annotations, and internal tooling.
• Write clear technical documentation for Eval360 systems, error taxonomies, evaluation workflows, debugging procedures, and user-facing tools.
• Work with researchers, data scientists, analysts, and machine learning engineers to improve evaluation quality, model diagnostics, and failure-mode visibility.
• Keep track of new development tools, evaluation frameworks, model analysis methods, data quality techniques, and architectures relevant to AI evaluation systems.
• Contribute to the design of error taxonomies, evaluation rubrics, quality thresholds, regression detection methods, and model readiness criteria.
• Help ensure Eval360 produces reliable, interpretable, and actionable signals for model quality gates.
• Contribute to research publications, technical reports, internal knowledge sharing, and external presentations where appropriate.
• Contribute to intellectual property and thought leadership in AI evaluation, error analysis, model quality measurement, and evaluation infrastructure.
• Perform all other duties as reasonably directed by the line manager that are aligned with these functional objectives.
Academic Qualifications
• Bachelor's degree in Computer Science, Machine Learning, Data Science, Software Engineering, Statistics, or a related technical field required.
• Master's or Ph.D. in Computer Science, Machine Learning, Artificial Intelligence, Data Science, or a related field preferred.
Professional Experience
• Proven experience as a Software Engineer, Full Stack Developer, Machine Learning Evaluation Engineer, Data Scientist, AI Engineer, or similar role.
• Experience building software systems for AI, machine learning, data analysis, evaluation, annotation, experimentation, or model monitoring.
• Experience working with AI algorithms and the ability to develop systems that accommodate AI-related requirements.
• Experience performing error analysis, model evaluation, data quality analysis, or failure-mode investigation for machine learning or language model systems.
• Experience developing internal applications, dashboards, review tools, or web-based workflows for technical users.
• Familiarity with common software stacks, including front-end frameworks, back-end services, databases, APIs, and cloud or internal infrastructure.
• Familiarity with GitHub, Git, CI/CD workflows, and collaborative software development practices.
• Knowledge of front-end languages and libraries such as HTML, CSS, JavaScript, TypeScript, React, Angular, or similar technologies.
• Knowledge of back-end languages and frameworks such as Python, Java, C#, Node.js, FastAPI, Flask, Django, or similar technologies.
• Familiarity with databases such as MySQL, PostgreSQL, MongoDB, or other structured and unstructured data stores.
• Familiarity with evaluation frameworks, experiment tracking systems, data pipelines, or machine learning infrastructure is strongly preferred.
• Ability to analyze complex model outputs and translate qualitative failures into structured, measurable categories.
• Strong problem-solving and troubleshooting skills, especially for ambiguous technical issues involving models, data, metrics, and software systems.
• Effective communication and collaboration skills, with the ability to work across research, engineering, data, and product teams.
• Strong attention to detail and a high bar for evaluation quality, reliability, and interpretability.
Preferred Qualifications
• Experience with large language models, foundation models, multimodal models, or model evaluation systems.
• Experience designing or using error taxonomies, evaluation rubrics, benchmark datasets, human evaluation workflows, or automated grading systems.
• Experience with Python-based data analysis tools such as pandas, NumPy, Jupyter, or similar.
• Experience with visualization or dashboarding tools for model quality analysis.
• Experience with distributed systems, job queues, workflow orchestration, or large-scale data processing.
• Experience working in a research environment or with fast-moving AI product and model teams.
Full job record
| Field | Value |
|---|---|
| Job ID | 557498e2bf91fceb913006bbca47dcc4c68d8eb8 |
| Org ID | bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0 |
| Source ID | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| Board ID | 4d111a77-38db-4b88-84a8-24f761a495a9 |
| Provider | lever |
| Provider Job Key | adf911e6-d3d0-41ed-a7bf-553e1c50e684 |
| Title | Eval360 - Error Analysis Engineer |
| Normalized Title | — |
| Status | active |
| Active | yes |
| Location Text | Sunnyvale, CA |
| Department | — |
| Team | Engineering |
| Employment Type | Full-time |
| Workplace Type | on_site |
| Remote Policy | — |
| Country | United States |
| Region | CA |
| City | Sunnyvale |
| Salary Raw | USD 150000-450000 per-year-salary |
| Salary Min | 150,000 |
| Salary Max | 450,000 |
| Salary Currency | USD |
| Salary Period | year |
| Source URL | https://jobs.lever.co/ifm-us/adf911e6-d3d0-41ed-a7bf-553e1c50e684 |
| Apply URL | https://jobs.lever.co/ifm-us/adf911e6-d3d0-41ed-a7bf-553e1c50e684/apply |
| First Seen At | 2026-06-20 07:56:03Z |
| Last Seen At | 2026-06-20 07:56:03Z |
| Last Checked At | 2026-06-20 07:56:03Z |
| Last Changed At | 2026-06-20 07:56:03Z |
| Inactive At | — |
| Source Posted At | 2026-06-19 21:05:13Z |
| Source Updated At | — |
| Raw Payload Uri | s3://job-postings-prod-raw-590183727216/raw/provider=lever/board=ifm-us/date=2026-06-20/2026-06-20T07-56-02-593Z-0ab7124e577c71437276926999ed4d9ae62c5beae65d219654cae179b652c2af.json |
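The record above carries both a raw salary string and its normalized fields (minimum, maximum, currency, period). As a minimal sketch of how one could map between them, the snippet below parses the raw string with a regular expression. The pattern is inferred from the single `Salary Raw` value in this record, and the names `Salary` and `parse_salary_raw` are hypothetical; nothing here is confirmed to be how the listing service normalizes salaries.

```python
import re
from typing import NamedTuple, Optional


class Salary(NamedTuple):
    """Normalized salary fields, mirroring the table rows above."""
    currency: str
    minimum: int
    maximum: int
    period: str


# Assumption: the raw format is "<CUR> <min>-<max> per-<period>-salary",
# inferred only from the one value shown in this record.
_SALARY_RE = re.compile(
    r"^(?P<currency>[A-Z]{3}) (?P<min>\d+)-(?P<max>\d+) per-(?P<period>[a-z]+)-salary$"
)


def parse_salary_raw(raw: str) -> Optional[Salary]:
    """Return normalized fields, or None when the string does not match."""
    match = _SALARY_RE.match(raw.strip())
    if match is None:
        return None
    return Salary(
        currency=match["currency"],
        minimum=int(match["min"]),
        maximum=int(match["max"]),
        period=match["period"],
    )


if __name__ == "__main__":
    print(parse_salary_raw("USD 150000-450000 per-year-salary"))
```

Returning `None` for unmatched input, rather than raising, keeps the sketch safe to run over records whose raw salary text follows some other shape.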
Event Fields
{
"content_hash": "9ec5d12100d2f5420a393588bda3d71b15424dc36ec678776955b2672e819850",
"source_hash": "e1de2e59d60b1b00cca6a3189ee3584037fafd4a455ce14404d589437c2422ef",
"last_changed_at": "2026-06-20T07:56:03.212Z",
"active_status": "active"
}
Parsed Structured
{
"dedupe": null,
"language": "en",
"location": {
"raw": "Sunnyvale, CA",
"city": "Sunnyvale",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.9
},
"salary_max": 450000,
"salary_min": 150000,
"inferred_at": "2026-06-20T07:56:03.174Z",
"launch_scope": {
"reason": "english_us_canada",
"included": true,
"language": "en",
"location": {
"raw": "Sunnyvale, CA",
"city": "Sunnyvale",
"region": "CA",
"country": "United States",
"is_remote": false,
"confidence": 0.9
},
"countries": [
"United States"
]
},
"remote_policy": null,
"salary_period": "year",
"workplace_type": "on_site",
"salary_currency": "USD"
}
Extensions
{}
Native Structured
{
"country": "US",
"createdAt": 1781903113703,
"updatedAt": null,
"categories": {
"team": "Engineering",
"location": "Sunnyvale, CA",
"commitment": "Full-time",
"allLocations": [
"Sunnyvale, CA"
]
},
"salaryRange": {
"max": 450000,
"min": 150000,
"currency": "USD",
"interval": "per-year-salary"
},
"workplaceType": "onsite"
}
Get this page with API
Rendered from the bluedoor Job Postings API. Reproduce it:
GET https://api.bluedoor.sh/job-postings/v1/jobs/557498e2bf91fceb913006bbca47dcc4c68d8eb8?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0
GET https://api.bluedoor.sh/job-postings/v1/sources/4d111a77-38db-4b88-84a8-24f761a495a9
GET https://api.bluedoor.sh/job-postings/v1/jobs/557498e2bf91fceb913006bbca47dcc4c68d8eb8/events
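The four GET requests above could be scripted roughly as follows. This is a sketch using only the Python standard library. The URLs and IDs are taken from this page, while the helper names `page_urls` and `fetch_json` are hypothetical. It also assumes the endpoints return JSON and need no authentication, which this page does not confirm. If the API requires a key, the request headers would need to be extended accordingly.

```python
import json
import urllib.request

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"
JOB_ID = "557498e2bf91fceb913006bbca47dcc4c68d8eb8"
ORG_ID = "bb7fb7ce-62b9-4ed3-9327-02a3c7b7e5d0"
SOURCE_ID = "4d111a77-38db-4b88-84a8-24f761a495a9"


def page_urls(job_id, org_id, source_id, base_url=BASE_URL):
    """Build the four request URLs listed on this page, keyed by label."""
    return {
        "job": f"{base_url}/jobs/{job_id}?include=description",
        "org": f"{base_url}/orgs/{org_id}",
        "source": f"{base_url}/sources/{source_id}",
        "events": f"{base_url}/jobs/{job_id}/events",
    }


def fetch_json(url, timeout=10.0):
    """GET a URL and decode the body as JSON.

    Assumption: the endpoint answers with a JSON body and does not
    require credentials; neither is confirmed by the page.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URLs; call fetch_json(url) to perform the requests.
    for label, url in page_urls(JOB_ID, ORG_ID, SOURCE_ID).items():
        print(label, url)
```

Keeping URL construction separate from the network call makes it possible to check the request list without touching the API.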