Red Teaming Expert

Mpathic2 · Active · $30 / hour · BambooHR

Job facts

Field	Value
Company	Mpathic2
Title	Red Teaming Expert
Normalized title	-
Department / team	Experts
Location	Seattle, WA, United States
Work model	-
Employment type	Contract
Salary	$30 / hour
Status	active
ATS provider	BambooHR
Posted / first seen	2026-04-29 / 2026-05-30
Changed / last seen	2026-05-30 / 2026-07-16

Linked records

Company	Mpathic2
Source	b1af6ab1-26b4-4778-a1f4-d8ae41a6f240
ATS provider	BambooHR

Description

About mpathic.ai

Keeping the human in AI. mpathic is a trusted leader in advancing quality and safety in AI systems through expert-led evaluation and human data. We partner with leading technology companies to support red teaming, trust & safety, expert annotation, and model evaluation across high-stakes domains.

About the Role

mpathic is seeking part-time, project-based Red Teaming Experts to support a red-teaming and evaluation campaign focused on AI safety and model behavior in sensitive, real-world interactions.

In this role, you will design, simulate, and evaluate conversations with AI systems to assess safety, risk, and behavioral performance. You will identify failure modes, edge cases, and policy gaps, particularly in scenarios involving distress, ambiguity, or escalation.

This role involves roleplaying and reviewing clinical scenarios with AI agents. As such, we are ideally seeking candidates who bring creative or performance-driven strengths, as these competencies enhance the realism, nuance, and emotional depth needed for AI safety testing. Examples of these can include, but are not limited to:

- Theatre degrees or studies
- Acting, theatre, improv, or voice-over experience
- Strong writing skills, especially dialogue or scenario writing
- Experience creating or inhabiting characters (e.g., performers, TTRPG roleplay, narrative designers)
- Conversational design, interaction writing, or scripted roleplay experience
- Participation in gaming, interactive storytelling, or digital communities where roleplay is common

What You’ll Be Working On

You will help identify, prevent, and characterize risks that emerge when users interact with AI systems.

Responsibilities may include:

- Designing and executing red-teaming scenarios across diverse user behaviors
- Reviewing AI-generated responses for safety, accuracy, and policy compliance
- Identifying failure modes, edge cases, and behavioral risks
- Assessing whether AI appropriately recognizes and responds to distress or escalation
- Evaluating tone, boundaries, and appropriateness in sensitive interactions
- Detecting misleading, overconfident, or unsafe responses
- Evaluating multi-turn conversations for consistency and risk handling
- Identifying gaps in responses, including missed signals or incomplete handling
- Conducting qualitative analysis to identify behavioral patterns and system weaknesses
- Documenting edge cases, failure patterns, and safety risks
- Applying or contributing to evaluation rubrics, taxonomies, and frameworks
- Supporting quality assurance (QA) to ensure consistency across evaluations
- Collaborating with internal teams on AI safety and evaluation improvements
- Participating in red teaming exercises to surface system vulnerabilities
- Maintaining strict confidentiality and quality standards

What We’re Looking For

Successful candidates are detail-oriented, analytically strong, and experienced in evaluating or stress-testing AI systems in complex or high-risk scenarios.

Professional experience in one or more of the following:

- LLM red teaming or AI safety evaluation
- Trust & safety, content moderation, or policy enforcement
- AI/ML evaluation, annotation, or QA workflows
- Conversational analysis or behavioral risk assessment
- Work involving sensitive or high-stakes user interactions

Strong understanding of:

- AI safety principles and common failure modes
- Behavioral risk, escalation patterns, and edge-case handling
- Mental health sensitivity, boundaries, and responsible AI behavior
- How users express distress, confusion, or harmful intent in conversation

Ability to identify:

- Safety violations and policy gaps
- Missed or mishandled risk signals
- Unsafe, misleading, or overconfident responses
- Inappropriate tone or boundary-setting
- Failures in escalation, de-escalation, or resolution
- Inconsistencies across multi-turn interactions

Experience with or Interest in:

- Red teaming methodologies and adversarial testing
- Evaluating conversational AI systems or chatbots
- Developing or applying evaluation frameworks and rubrics
- Understanding how AI systems perform under real user behavior

Comfort with:

- Tech tools and platforms (Slack, spreadsheets, dashboards)
- Evaluating AI-generated responses (no coding required, but must be tech-comfortable)
- Ambiguity, iteration, and feedback-driven workflows

Willingness to:

- Sign NDAs and work with sensitive or high-impact content

Nice to Have (Not Required)

- Background in mental health, behavioral science, or psychology
- Experience in QA, annotation, or qualitative analysis
- Experience with AI systems in sensitive domains (e.g., healthcare, safety)
- Familiarity with evaluation metrics or safety frameworks

Compensation

$30-60/hour, depending on experience and specific project tasks/difficulty

Full job record

Job ID	2e73108d54307d25a65ea85208ee08ea7e7850b9
Org ID	a49c47a0-5ae6-4084-89b9-187ae791ed8b
Source ID	b1af6ab1-26b4-4778-a1f4-d8ae41a6f240
Board ID	b1af6ab1-26b4-4778-a1f4-d8ae41a6f240
Provider	bamboohr
Provider Job Key	74
Title	Red Teaming Expert
Normalized Title	—
Status	active
Active	yes
Location Text	—
Department	Experts
Team	—
Employment Type	contract
Workplace Type	—
Remote Policy	—
Country	United States
Region	WA
City	Seattle
Salary Raw	Compensation $30-60/hour, depending on experience and specific project tasks/difficulty
Salary Min	30
Salary Max	—
Salary Currency	USD
Salary Period	hour
Source URL	https://mpathic2.bamboohr.com/careers/74
Apply URL	https://mpathic2.bamboohr.com/careers/74
First Seen At	2026-05-30 06:04:22Z
Last Seen At	2026-07-16 08:53:41Z
Last Checked At	2026-07-16 08:53:41Z
Last Changed At	2026-05-30 06:04:22Z
Inactive At	—
Source Posted At	2026-04-29 00:00:00Z
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=mpathic2/date=2026-07-16/2026-07-16T08-53-39-719Z-f60f6fd6f51e843645a715be0a107058db281c4d4bb89b2708751910b0e14413.json

Event Fields

{
  "content_hash": "0b84b4e91bd95783f352f29e3dbfd60713b0ac198cbd51af77990acc5497635b",
  "source_hash": "4bd6e15f43b088dee20262dc65c8e85b206a1c1b8c1f09a48969cdd05978bbec",
  "last_changed_at": "2026-05-30T06:04:22.595Z",
  "active_status": "active"
}

Parsed Structured

{
  "dedupe": null,
  "language": "en",
  "location": {
    "raw": "Seattle, Washington, United States",
    "city": "Seattle",
    "region": "WA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": 30,
  "inferred_at": "2026-07-16T08:53:41.350Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Seattle, Washington, United States",
      "city": "Seattle",
      "region": "WA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": "hour",
  "workplace_type": null,
  "salary_currency": "USD"
}
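
The parsed record above carries salary_min 30 and salary_max null, while the raw salary string reads "$30-60/hour". The following is a minimal sketch of how both bounds could be pulled from such a string. It is not the parser that produced this record; the function name `parse_salary`, the returned keys, and the regular expression are all illustrative assumptions.

```python
import re
from typing import Optional, TypedDict


class Salary(TypedDict):
    minimum: float
    maximum: Optional[float]
    period: str


# Hypothetical pattern: "$30-60/hour", "$30–$40/hour", or "$30 / hour".
# The upper bound is optional; \u2013 is an en dash.
_SALARY_RE = re.compile(
    r"\$\s*(\d+(?:\.\d+)?)"                      # lower bound after "$"
    r"\s*(?:[-\u2013]\s*\$?\s*(\d+(?:\.\d+)?))?"  # optional "-60" or "–$40"
    r"\s*/\s*(hour|month|year)",                 # pay period after "/"
    re.IGNORECASE,
)


def parse_salary(raw: str) -> Optional[Salary]:
    """Return the first salary range found in raw, or None if there is none."""
    match = _SALARY_RE.search(raw)
    if match is None:
        return None
    low, high, period = match.groups()
    return {
        "minimum": float(low),
        "maximum": float(high) if high else None,
        "period": period.lower(),
    }


if __name__ == "__main__":
    raw = ("Compensation $30-60/hour, depending on experience "
           "and specific project tasks/difficulty")
    print(parse_salary(raw))
```

A single-value string such as "$30 / hour" yields a null maximum under this sketch, which matches the shape of the parsed record.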

Extensions

{}

Native Structured

{
  "list_job": {
    "id": "74",
    "isRemote": null,
    "location": {
      "city": null,
      "state": null
    },
    "atsLocation": {
      "city": "Seattle",
      "state": "Washington",
      "country": "United States",
      "province": null
    },
    "departmentId": "18634",
    "locationType": "1",
    "jobOpeningName": "Red Teaming Expert",
    "departmentLabel": "Experts",
    "employmentStatusLabel": "Contractor"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": null,
      "state": null,
      "postalCode": null,
      "addressCountry": null
    },
    "datePosted": "2026-04-29",
    "atsLocation": {
      "city": "Seattle",
      "state": "Washington",
      "country": "United States",
      "countryId": "1"
    },
    "description": "<ul></ul>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">About mpathic.ai</span></p>\n<p><span style=\"font-size: 10pt\">Keeping the human in AI. mpathic is a trusted leader in advancing quality and safety in AI systems through expert-led evaluation and human data. We partner with leading technology companies to support red teaming, trust &amp; safety, expert annotation, and model evaluation across high-stakes domains.</span></p>\n<p><br></p>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">About the Role</span></p>\n<p><span style=\"font-size: 10pt\">mpathic is seeking <span style=\"font-weight: bold\">part-time, project-based Red Teaming Experts</span> to support a red-teaming and evaluation campaign focused on AI safety and model behavior in sensitive, real-world interactions.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\">In this role, you will design, simulate, and evaluate conversations with AI systems to assess safety, risk, and behavioral performance. You will identify failure modes, edge cases, and policy gaps—particularly in scenarios involving distress, ambiguity, or escalation.</span><br></p>\n<p><br><br></p>\n<p><span style=\"font-family: Arial, sans-serif; font-size: 10pt\"><span style=\"font-weight: bold\">This role involves roleplaying and reviewing clinical scenarios with AI agents. </span>As such, we are ideally seeking candidates who bring <span style=\"font-weight: bold\">creative or performance-driven strengths</span>, as these competencies enhance the realism, nuance, and emotional depth needed for AI safety testing. 
Examples of these can include, but are not limited to: </span></p>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Theatre degrees or studies</span></li>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Acting, theatre, improv, or voice-over experience </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Strong writing skills, especially dialogue or scenario writing </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Experience creating or inhabiting characters (e.g., performers, TTRPG roleplay, narrative designers) </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Conversational design, interaction writing, or scripted roleplay experience </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Participation in gaming, interactive storytelling, or digital communities where roleplay is common </span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 10pt\"><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">What You’ll Be Working On </span></span></p>\n<p><span style=\"font-size: 10pt\">You will help identify, prevent, and characterize risks that emerge when users interact with AI systems.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\">Responsibilities may include:</span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Designing and executing red-teaming scenarios across diverse user behaviors</span></li>\n<li><span style=\"font-size: 10pt\">Reviewing AI-generated responses for safety, accuracy, and policy compliance</span></li>\n<li><span style=\"font-size: 10pt\">Identifying failure modes, edge cases, and behavioral risks</span></li>\n<li><span style=\"font-size: 10pt\">Assessing whether AI appropriately recognizes and responds to distress or 
escalation</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating tone, boundaries, and appropriateness in sensitive interactions</span></li>\n<li><span style=\"font-size: 10pt\">Detecting misleading, overconfident, or unsafe responses</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating multi-turn conversations for consistency and risk handling</span></li>\n<li><span style=\"font-size: 10pt\">Identifying gaps in responses, including missed signals or incomplete handling</span></li>\n<li><span style=\"font-size: 10pt\">Conducting qualitative analysis to identify behavioral patterns and system weaknesses</span></li>\n<li><span style=\"font-size: 10pt\">Documenting edge cases, failure patterns, and safety risks</span></li>\n<li><span style=\"font-size: 10pt\">Applying or contributing to evaluation rubrics, taxonomies, and frameworks</span></li>\n<li><span style=\"font-size: 10pt\">Supporting quality assurance (QA) to ensure consistency across evaluations</span></li>\n<li><span style=\"font-size: 10pt\">Collaborating with internal teams on AI safety and evaluation improvements</span></li>\n<li><span style=\"font-size: 10pt\">Participating in red teaming exercises to surface system vulnerabilities</span></li>\n<li><span style=\"font-size: 10pt\">Maintaining strict confidentiality and quality standards</span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-weight: bold\">What We’re Looking For</span></p>\n<p><span style=\"font-size: 10pt\">Successful candidates are detail-oriented, analytically strong, and experienced in evaluating or stress-testing AI systems in complex or high-risk scenarios.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Professional experience in one or more of the following:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">LLM red teaming or AI safety evaluation</span></li>\n<li><span style=\"font-size: 10pt\">Trust &amp; safety, content moderation, or policy 
enforcement</span></li>\n<li><span style=\"font-size: 10pt\">AI/ML evaluation, annotation, or QA workflows</span></li>\n<li><span style=\"font-size: 10pt\">Conversational analysis or behavioral risk assessment</span></li>\n<li><span style=\"font-size: 10pt\">Work involving sensitive or high-stakes user interactions</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Strong understanding of:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">AI safety principles and common failure modes</span></li>\n<li><span style=\"font-size: 10pt\">Behavioral risk, escalation patterns, and edge-case handling</span></li>\n<li><span style=\"font-size: 10pt\">Mental health sensitivity, boundaries, and responsible AI behavior</span></li>\n<li><span style=\"font-size: 10pt\">How users express distress, confusion, or harmful intent in conversation</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Ability to identify:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Safety violations and policy gaps</span></li>\n<li><span style=\"font-size: 10pt\">Missed or mishandled risk signals</span></li>\n<li><span style=\"font-size: 10pt\">Unsafe, misleading, or overconfident responses</span></li>\n<li><span style=\"font-size: 10pt\">Inappropriate tone or boundary-setting</span></li>\n<li><span style=\"font-size: 10pt\">Failures in escalation, de-escalation, or resolution</span></li>\n<li><span style=\"font-size: 10pt\">Inconsistencies across multi-turn interactions</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Experience with or Interest in:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Red teaming methodologies and adversarial testing</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating conversational AI systems or chatbots</span></li>\n<li><span style=\"font-size: 10pt\">Developing or applying evaluation frameworks and 
rubrics</span></li>\n<li><span style=\"font-size: 10pt\">Understanding how AI systems perform under real user behavior</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Comfort with:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Tech tools and platforms (Slack, spreadsheets, dashboards)</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating AI-generated responses (no coding required, but must be tech-comfortable)</span></li>\n<li><span style=\"font-size: 10pt\">Ambiguity, iteration, and feedback-driven workflows</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Willingness to:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Sign NDAs and work with sensitive or high-impact content</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Nice to Have (Not Required)</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Background in mental health, behavioral science, or psychology</span></li>\n<li><span style=\"font-size: 10pt\">Experience in QA, annotation, or qualitative analysis</span></li>\n<li><span style=\"font-size: 10pt\">Experience with AI systems in sensitive domains (e.g., healthcare, safety)</span></li>\n<li><span style=\"font-size: 10pt\">Familiarity with evaluation metrics or safety frameworks</span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-size: 10pt; font-weight: bold\">Compensation</span></p>\n<p><span style=\"font-size: 10pt\">$30-60/hour, depending on experience and specific project tasks/difficulty</span></p>",
    "compensation": "$30–$40/hour depending on project difficulty",
    "departmentId": "18634",
    "locationType": "1",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "Red Teaming Expert",
    "departmentLabel": "Experts",
    "jobOpeningStatus": "Open",
    "minimumExperience": "Mid-level",
    "jobOpeningShareUrl": "https://mpathic2.bamboohr.com/careers/74",
    "employmentStatusLabel": "Contractor"
  }
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/2e73108d54307d25a65ea85208ee08ea7e7850b9?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/a49c47a0-5ae6-4084-89b9-187ae791ed8b

GET https://api.bluedoor.sh/job-postings/v1/sources/b1af6ab1-26b4-4778-a1f4-d8ae41a6f240

GET https://api.bluedoor.sh/job-postings/v1/jobs/2e73108d54307d25a65ea85208ee08ea7e7850b9/events
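
The four requests above could be issued from a script along these lines. This is a sketch using only the Python standard library: the helper names `record_urls` and `fetch_json` are hypothetical, and the bearer-token `Authorization` header is an assumption, since this page does not say how the API key is sent. Check the API docs before relying on it.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def record_urls(job_id: str, org_id: str, source_id: str) -> dict[str, str]:
    """Build the four endpoint URLs listed on this page for one posting."""
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?include=description",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url: str, api_key: Optional[str] = None) -> Any:
    """GET a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if api_key:
        # ASSUMPTION: bearer-token auth; the real scheme is not stated here.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    urls = record_urls(
        job_id="2e73108d54307d25a65ea85208ee08ea7e7850b9",
        org_id="a49c47a0-5ae6-4084-89b9-187ae791ed8b",
        source_id="b1af6ab1-26b4-4778-a1f4-d8ae41a6f240",
    )
    for name, url in urls.items():
        print(name, url)
    # Uncomment once an API key and the auth scheme are confirmed:
    # job = fetch_json(urls["job"], api_key="YOUR_KEY")
```

Keeping URL construction separate from the network call makes the identifiers easy to check against the Full job record before any request is sent.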
