Red Teaming Expert
Mpathic2 · Active · $30 / hour · BambooHR
Job facts
| Field | Value |
|---|---|
| Company | Mpathic2 |
| Title | Red Teaming Expert |
| Normalized title | - |
| Department / team | Experts |
| Location | Seattle, WA, United States |
| Work model | - |
| Employment type | Contract |
| Salary | $30 / hour |
| Status | active |
| ATS provider | BambooHR |
| Posted / first seen | 2026-04-29 / 2026-05-30 |
| Changed / last seen | 2026-05-30 / 2026-06-06 |
Linked records
| Field | Value |
|---|---|
| Company | Mpathic2 |
| Source | b1af6ab1-26b4-4778-a1f4-d8ae41a6f240 |
| ATS provider | BambooHR |
Description
About mpathic.ai
Keeping the human in AI. mpathic is a trusted leader in advancing quality and safety in AI systems through expert-led evaluation and human data. We partner with leading technology companies to support red teaming, trust & safety, expert annotation, and model evaluation across high-stakes domains.
About the Role
mpathic is seeking part-time, project-based Red Teaming Experts to support a red-teaming and evaluation campaign focused on AI safety and model behavior in sensitive, real-world interactions.
In this role, you will design, simulate, and evaluate conversations with AI systems to assess safety, risk, and behavioral performance. You will identify failure modes, edge cases, and policy gaps—particularly in scenarios involving distress, ambiguity, or escalation.
This role involves roleplaying and reviewing clinical scenarios with AI agents. As such, we are ideally seeking candidates who bring creative or performance-driven strengths, as these competencies enhance the realism, nuance, and emotional depth needed for AI safety testing. Examples include, but are not limited to:
- Theatre degrees or studies
- Acting, theatre, improv, or voice-over experience
- Strong writing skills, especially dialogue or scenario writing
- Experience creating or inhabiting characters (e.g., performers, TTRPG roleplay, narrative designers)
- Conversational design, interaction writing, or scripted roleplay experience
- Participation in gaming, interactive storytelling, or digital communities where roleplay is common
What You’ll Be Working On
You will help identify, prevent, and characterize risks that emerge when users interact with AI systems.
Responsibilities may include:
- Designing and executing red-teaming scenarios across diverse user behaviors
- Reviewing AI-generated responses for safety, accuracy, and policy compliance
- Identifying failure modes, edge cases, and behavioral risks
- Assessing whether AI appropriately recognizes and responds to distress or escalation
- Evaluating tone, boundaries, and appropriateness in sensitive interactions
- Detecting misleading, overconfident, or unsafe responses
- Evaluating multi-turn conversations for consistency and risk handling
- Identifying gaps in responses, including missed signals or incomplete handling
- Conducting qualitative analysis to identify behavioral patterns and system weaknesses
- Documenting edge cases, failure patterns, and safety risks
- Applying or contributing to evaluation rubrics, taxonomies, and frameworks
- Supporting quality assurance (QA) to ensure consistency across evaluations
- Collaborating with internal teams on AI safety and evaluation improvements
- Participating in red-teaming exercises to surface system vulnerabilities
- Maintaining strict confidentiality and quality standards
What We’re Looking For
Successful candidates are detail-oriented, analytically strong, and experienced in evaluating or stress-testing AI systems in complex or high-risk scenarios.
Professional experience in one or more of the following:
- LLM red teaming or AI safety evaluation
- Trust & safety, content moderation, or policy enforcement
- AI/ML evaluation, annotation, or QA workflows
- Conversational analysis or behavioral risk assessment
- Work involving sensitive or high-stakes user interactions
Strong understanding of:
- AI safety principles and common failure modes
- Behavioral risk, escalation patterns, and edge-case handling
- Mental health sensitivity, boundaries, and responsible AI behavior
- How users express distress, confusion, or harmful intent in conversation
Ability to identify:
- Safety violations and policy gaps
- Missed or mishandled risk signals
- Unsafe, misleading, or overconfident responses
- Inappropriate tone or boundary-setting
- Failures in escalation, de-escalation, or resolution
- Inconsistencies across multi-turn interactions
Experience with or interest in:
- Red-teaming methodologies and adversarial testing
- Evaluating conversational AI systems or chatbots
- Developing or applying evaluation frameworks and rubrics
- Understanding how AI systems perform under real user behavior
Comfort with:
- Tech tools and platforms (Slack, spreadsheets, dashboards)
- Evaluating AI-generated responses (no coding required, but must be tech-comfortable)
- Ambiguity, iteration, and feedback-driven workflows
Willingness to:
- Sign NDAs and work with sensitive or high-impact content
Nice to Have (Not Required)
- Background in mental health, behavioral science, or psychology
- Experience in QA, annotation, or qualitative analysis
- Experience with AI systems in sensitive domains (e.g., healthcare, safety)
- Familiarity with evaluation metrics or safety frameworks
Compensation
$30-60/hour, depending on experience and specific project tasks/difficulty
Full job record
| Field | Value |
|---|---|
| Job ID | 2e73108d54307d25a65ea85208ee08ea7e7850b9 |
| Org ID | a49c47a0-5ae6-4084-89b9-187ae791ed8b |
| Source ID | b1af6ab1-26b4-4778-a1f4-d8ae41a6f240 |
| Board ID | b1af6ab1-26b4-4778-a1f4-d8ae41a6f240 |
| Provider | bamboohr |
| Provider Job Key | 74 |
| Title | Red Teaming Expert |
| Normalized Title | — |
| Status | active |
| Active | yes |
| Location Text | — |
| Department | Experts |
| Team | — |
| Employment Type | contract |
| Workplace Type | — |
| Remote Policy | — |
| Country | United States |
| Region | WA |
| City | Seattle |
| Salary Raw | Compensation $30-60/hour, depending on experience and specific project tasks/difficulty |
| Salary Min | 30 |
| Salary Max | — |
| Salary Currency | USD |
| Salary Period | hour |
| Source URL | https://mpathic2.bamboohr.com/careers/74 |
| Apply URL | https://mpathic2.bamboohr.com/careers/74 |
| First Seen At | 2026-05-30 06:04:22Z |
| Last Seen At | 2026-06-06 09:39:29Z |
| Last Checked At | 2026-06-06 09:39:29Z |
| Last Changed At | 2026-05-30 06:04:22Z |
| Inactive At | — |
| Source Posted At | 2026-04-29 00:00:00Z |
| Source Updated At | — |
| Raw Payload Uri | s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=mpathic2/date=2026-06-06/2026-06-06T09-39-28-431Z-48285aeccaa3acf5233909cb80ca672f5c387545ccdedcf0178dd8e5287f8d09.json |
Event Fields
{
"content_hash": "0b84b4e91bd95783f352f29e3dbfd60713b0ac198cbd51af77990acc5497635b",
"source_hash": "4bd6e15f43b088dee20262dc65c8e85b206a1c1b8c1f09a48969cdd05978bbec",
"last_changed_at": "2026-05-30T06:04:22.595Z",
"active_status": "active"
}
Parsed Structured
{
"language": "en",
"location": {
"raw": "Seattle, Washington, United States",
"city": "Seattle",
"region": "WA",
"country": "United States",
"is_remote": false,
"confidence": 0.8
},
"salary_max": null,
"salary_min": 30,
"inferred_at": "2026-06-06T09:39:29.915Z",
"launch_scope": {
"reason": "bamboohr_production_catalog",
"included": true,
"location": {
"raw": "Seattle, Washington, United States",
"city": "Seattle",
"region": "WA",
"country": "United States",
"is_remote": false,
"confidence": 0.8
},
"countries": [
"United States"
]
},
"remote_policy": null,
"salary_period": "hour",
"workplace_type": null,
"salary_currency": "USD"
}
Extensions
{}
Native Structured
{
"list_job": {
"id": "74",
"isRemote": null,
"location": {
"city": null,
"state": null
},
"atsLocation": {
"city": "Seattle",
"state": "Washington",
"country": "United States",
"province": null
},
"departmentId": "18634",
"locationType": "1",
"jobOpeningName": "Red Teaming Expert",
"departmentLabel": "Experts",
"employmentStatusLabel": "Contractor"
},
"detail_errors": [],
"detail_job_opening": {
"location": {
"city": null,
"state": null,
"postalCode": null,
"addressCountry": null
},
"datePosted": "2026-04-29",
"atsLocation": {
"city": "Seattle",
"state": "Washington",
"country": "United States",
"countryId": "1"
},
"description": "<ul></ul>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">About mpathic.ai</span></p>\n<p><span style=\"font-size: 10pt\">Keeping the human in AI. mpathic is a trusted leader in advancing quality and safety in AI systems through expert-led evaluation and human data. We partner with leading technology companies to support red teaming, trust & safety, expert annotation, and model evaluation across high-stakes domains.</span></p>\n<p><br></p>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">About the Role</span></p>\n<p><span style=\"font-size: 10pt\">mpathic is seeking <span style=\"font-weight: bold\">part-time, project-based Red Teaming Experts</span> to support a red-teaming and evaluation campaign focused on AI safety and model behavior in sensitive, real-world interactions.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\">In this role, you will design, simulate, and evaluate conversations with AI systems to assess safety, risk, and behavioral performance. You will identify failure modes, edge cases, and policy gaps—particularly in scenarios involving distress, ambiguity, or escalation.</span><br></p>\n<p><br><br></p>\n<p><span style=\"font-family: Arial, sans-serif; font-size: 10pt\"><span style=\"font-weight: bold\">This role involves roleplaying and reviewing clinical scenarios with AI agents. </span>As such, we are ideally seeking candidates who bring <span style=\"font-weight: bold\">creative or performance-driven strengths</span>, as these competencies enhance the realism, nuance, and emotional depth needed for AI safety testing. 
Examples of these can include, but are not limited to: </span></p>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Theatre degrees or studies</span></li>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Acting, theatre, improv, or voice-over experience </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Strong writing skills, especially dialogue or scenario writing </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Experience creating or inhabiting characters (e.g., performers, TTRPG roleplay, narrative designers) </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Conversational design, interaction writing, or scripted roleplay experience </span></li>\n</ul>\n<ul>\n<li><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">Participation in gaming, interactive storytelling, or digital communities where roleplay is common </span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-family: Inter, sans-serif; font-size: 10pt\"><span style=\"font-family: Inter, sans-serif; font-size: 12pt; font-weight: bold\">What You’ll Be Working On </span></span></p>\n<p><span style=\"font-size: 10pt\">You will help identify, prevent, and characterize risks that emerge when users interact with AI systems.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\">Responsibilities may include:</span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Designing and executing red-teaming scenarios across diverse user behaviors</span></li>\n<li><span style=\"font-size: 10pt\">Reviewing AI-generated responses for safety, accuracy, and policy compliance</span></li>\n<li><span style=\"font-size: 10pt\">Identifying failure modes, edge cases, and behavioral risks</span></li>\n<li><span style=\"font-size: 10pt\">Assessing whether AI appropriately recognizes and responds to distress or 
escalation</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating tone, boundaries, and appropriateness in sensitive interactions</span></li>\n<li><span style=\"font-size: 10pt\">Detecting misleading, overconfident, or unsafe responses</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating multi-turn conversations for consistency and risk handling</span></li>\n<li><span style=\"font-size: 10pt\">Identifying gaps in responses, including missed signals or incomplete handling</span></li>\n<li><span style=\"font-size: 10pt\">Conducting qualitative analysis to identify behavioral patterns and system weaknesses</span></li>\n<li><span style=\"font-size: 10pt\">Documenting edge cases, failure patterns, and safety risks</span></li>\n<li><span style=\"font-size: 10pt\">Applying or contributing to evaluation rubrics, taxonomies, and frameworks</span></li>\n<li><span style=\"font-size: 10pt\">Supporting quality assurance (QA) to ensure consistency across evaluations</span></li>\n<li><span style=\"font-size: 10pt\">Collaborating with internal teams on AI safety and evaluation improvements</span></li>\n<li><span style=\"font-size: 10pt\">Participating in red teaming exercises to surface system vulnerabilities</span></li>\n<li><span style=\"font-size: 10pt\">Maintaining strict confidentiality and quality standards</span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-weight: bold\">What We’re Looking For</span></p>\n<p><span style=\"font-size: 10pt\">Successful candidates are detail-oriented, analytically strong, and experienced in evaluating or stress-testing AI systems in complex or high-risk scenarios.</span></p>\n<p><br><br></p>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Professional experience in one or more of the following:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">LLM red teaming or AI safety evaluation</span></li>\n<li><span style=\"font-size: 10pt\">Trust & safety, content moderation, or policy 
enforcement</span></li>\n<li><span style=\"font-size: 10pt\">AI/ML evaluation, annotation, or QA workflows</span></li>\n<li><span style=\"font-size: 10pt\">Conversational analysis or behavioral risk assessment</span></li>\n<li><span style=\"font-size: 10pt\">Work involving sensitive or high-stakes user interactions</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Strong understanding of:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">AI safety principles and common failure modes</span></li>\n<li><span style=\"font-size: 10pt\">Behavioral risk, escalation patterns, and edge-case handling</span></li>\n<li><span style=\"font-size: 10pt\">Mental health sensitivity, boundaries, and responsible AI behavior</span></li>\n<li><span style=\"font-size: 10pt\">How users express distress, confusion, or harmful intent in conversation</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Ability to identify:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Safety violations and policy gaps</span></li>\n<li><span style=\"font-size: 10pt\">Missed or mishandled risk signals</span></li>\n<li><span style=\"font-size: 10pt\">Unsafe, misleading, or overconfident responses</span></li>\n<li><span style=\"font-size: 10pt\">Inappropriate tone or boundary-setting</span></li>\n<li><span style=\"font-size: 10pt\">Failures in escalation, de-escalation, or resolution</span></li>\n<li><span style=\"font-size: 10pt\">Inconsistencies across multi-turn interactions</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Experience with or Interest in:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Red teaming methodologies and adversarial testing</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating conversational AI systems or chatbots</span></li>\n<li><span style=\"font-size: 10pt\">Developing or applying evaluation frameworks and 
rubrics</span></li>\n<li><span style=\"font-size: 10pt\">Understanding how AI systems perform under real user behavior</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Comfort with:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Tech tools and platforms (Slack, spreadsheets, dashboards)</span></li>\n<li><span style=\"font-size: 10pt\">Evaluating AI-generated responses (no coding required, but must be tech-comfortable)</span></li>\n<li><span style=\"font-size: 10pt\">Ambiguity, iteration, and feedback-driven workflows</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Willingness to:</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Sign NDAs and work with sensitive or high-impact content</span></li>\n</ul>\n<p><span style=\"font-size: 10pt\"><span style=\"font-weight: bold\">Nice to Have (Not Required)</span></span></p>\n<ul>\n<li><span style=\"font-size: 10pt\">Background in mental health, behavioral science, or psychology</span></li>\n<li><span style=\"font-size: 10pt\">Experience in QA, annotation, or qualitative analysis</span></li>\n<li><span style=\"font-size: 10pt\">Experience with AI systems in sensitive domains (e.g., healthcare, safety)</span></li>\n<li><span style=\"font-size: 10pt\">Familiarity with evaluation metrics or safety frameworks</span></li>\n</ul>\n<p><br></p>\n<p><span style=\"font-size: 10pt; font-weight: bold\">Compensation</span></p>\n<p><span style=\"font-size: 10pt\">$30-60/hour, depending on experience and specific project tasks/difficulty</span></p>",
"compensation": "$30–$40/hour depending on project difficulty",
"departmentId": "18634",
"locationType": "1",
"seekPromoted": false,
"jobCategoryId": null,
"jobOpeningName": "Red Teaming Expert",
"departmentLabel": "Experts",
"jobOpeningStatus": "Open",
"minimumExperience": "Mid-level",
"jobOpeningShareUrl": "https://mpathic2.bamboohr.com/careers/74",
"employmentStatusLabel": "Contractor"
}
}
Get this page with API
Rendered from the bluedoor Job Postings API. Reproduce it:
GET https://api.bluedoor.sh/job-postings/v1/jobs/2e73108d54307d25a65ea85208ee08ea7e7850b9?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/a49c47a0-5ae6-4084-89b9-187ae791ed8b
GET https://api.bluedoor.sh/job-postings/v1/sources/b1af6ab1-26b4-4778-a1f4-d8ae41a6f240
GET https://api.bluedoor.sh/job-postings/v1/jobs/2e73108d54307d25a65ea85208ee08ea7e7850b9/events
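The four Job Postings API requests for this page can be scripted. What follows is a minimal Python sketch that uses only the standard library. It assumes the endpoints accept an unauthenticated GET and return JSON, which this page does not say. The helper names `page_urls` and `fetch_json` are hypothetical. The IDs are the Job ID, Org ID, and Source ID from the job record.

```python
import json
import urllib.request

# Base path shared by all four GET requests listed on this page.
BASE = "https://api.bluedoor.sh/job-postings/v1"

# IDs copied from the "Full job record" table.
JOB_ID = "2e73108d54307d25a65ea85208ee08ea7e7850b9"
ORG_ID = "a49c47a0-5ae6-4084-89b9-187ae791ed8b"
SOURCE_ID = "b1af6ab1-26b4-4778-a1f4-d8ae41a6f240"


def page_urls(job_id: str, org_id: str, source_id: str) -> list[str]:
    """Build the four request URLs for one job page (hypothetical helper)."""
    return [
        f"{BASE}/jobs/{job_id}?include=description",  # job record with description
        f"{BASE}/orgs/{org_id}",                      # owning organization
        f"{BASE}/sources/{source_id}",                # ATS source (BambooHR board)
        f"{BASE}/jobs/{job_id}/events",               # lifecycle events
    ]


def fetch_json(url: str, timeout: float = 10.0) -> object:
    """GET a URL and decode the response body as JSON (hypothetical helper).

    Assumption: no API key or auth header is required; the page does not
    document authentication or rate limits.
    """
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json(page_urls(JOB_ID, ORG_ID, SOURCE_ID)[0])` would request the job record with its description. URL construction is kept separate from the network call so that the list of endpoints can be checked without making any requests. If the API does require a key, it would go in the `headers` passed to `Request`.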