Observability & Operations Engineer

Job facts

Field	Value
Company	Fullbay
Title	Observability & Operations Engineer
Normalized title	-
Department / team	Software Engineering
Location	Phoenix, AZ, United States
Work model	-
Employment type	Full Time
Salary	-
Status	active
ATS provider	BambooHR
Posted / first seen	2026-03-09 / 2026-05-30
Changed / last seen	2026-05-30 / 2026-06-06

Linked records

Company	Fullbay
Source	68fd8217-bb26-4567-862f-6d14d487322a
ATS provider	BambooHR

Description

About Us:

At Fullbay, our mission is simple: to create safer roads for our families and yours. As leaders in the heavy-duty repair industry, we power shops with technology that helps them run smarter and more efficiently. As an AI-First company, we invite artificial intelligence to eliminate friction, spark innovation, and drive efficiencies in every conversation, for our teams and our customers.

Position Overview:

The Observability & Operations Engineer is a key technical contributor who brings an AI-first mindset to maintaining, monitoring, and operating our AWS cloud environment and internal Developer Platform. In this role, you won’t just react to incidents; you’ll leverage AI-powered tooling, intelligent alerting, and automation to get ahead of problems before they impact users. You’ll work deeply across AWS and its PaaS ecosystem, building repeatable, code-first pipelines that treat infrastructure and observability configuration as first-class software. From using AI coding assistants to accelerate runbook development, to applying ML-based anomaly detection across logs and metrics, you’ll be expected to ask “how can AI help here?” as a first instinct. Working within a dedicated platform team, you’ll build the observability foundations that keep our systems fast, reliable, and self-healing.

Primary Duties & Responsibilities:

Design and implement a comprehensive observability strategy (logging, metrics, tracing, alerting) across all AWS environments, leveraging AI-powered tools to detect anomalies and surface insights automatically
Build and manage monitoring platforms such as Datadog, Grafana, Prometheus, and AWS CloudWatch, actively exploring AI-native features within these tools to reduce alert fatigue and improve signal quality
Use AI coding assistants (e.g. GitHub Copilot, Claude) to accelerate development of dashboards, runbooks, and automation scripts
Own the incident management lifecycle (on-call rotations, post-mortems, root cause analysis) and apply AI-assisted log analysis to speed up diagnosis and resolution
Instrument Java, Kotlin, and Node.js-based cloud-native applications to emit structured logs, distributed traces, and metrics; identify opportunities to use ML-based anomaly detection in place of static thresholds
Build repeatable, code-first observability pipelines that treat dashboards, alerts, and runbooks as first-class software: versioned, tested, and deployed through Harness
Leverage AWS PaaS services (Lambda, API Gateway, ECS, RDS, SQS, SNS, and others) to build scalable, automated operational tooling
Collaborate with development teams to embed observability and AI-assisted quality checks into CI/CD pipelines via Harness
Own the FinOps function for our AWS environment: tracking cloud spend, building cost dashboards, identifying waste, and using AI-powered cost analysis tools to surface optimization opportunities and drive accountability across engineering teams
Monitor AWS infrastructure for performance, availability, and cost, partnering with finance and engineering to enforce spend governance
Develop and maintain Infrastructure as Code using Terraform, using AI pair programming to improve quality and consistency
Contribute to architectural decisions with a focus on resilience, automation, and reducing toil through intelligent systems
Adhere to all confidentiality and compliance regulations
Perform other duties as assigned

Minimum Education & Work Experience:

7–10 years of experience in Software Engineering, Cloud Operations, or Site Reliability Engineering
5+ years of hands-on experience with AWS infrastructure and AWS PaaS services; certifications are a plus
Demonstrated experience building repeatable, code-first pipelines and treating operational configuration as first-class software
Experience working with polyglot environments including Java, Kotlin, and Node.js
Demonstrated experience using AI tools (coding assistants, AI-powered observability platforms, or similar) in a professional setting; we’re an AI-first company and expect this to be part of how you work, not something you’re just exploring

Key Skills and Qualifications:

Deep experience with enterprise observability platforms, including AWS-native tooling such as CloudWatch, X-Ray, and OpenTelemetry, or comparable platforms such as Datadog, Grafana, or Prometheus
Proficiency with distributed tracing frameworks and log management platforms (e.g. ELK Stack, Splunk, Fluent Bit); experience mapping these patterns to AWS-native tooling is a strong plus
Strong understanding of SRE principles including SLOs, SLAs, error budgets, and chaos engineering
Hands-on FinOps experience: cloud cost allocation, chargeback modeling, rightsizing, and savings plans optimization across AWS
Strong working knowledge of AWS PaaS services including Lambda, API Gateway, ECS, RDS, SQS, SNS, and IAM, and how to leverage them to build scalable operational tooling
Experience instrumenting polyglot applications (Java, Kotlin, Node.js) and cloud-native microservices for observability
Proven ability to build repeatable, code-first pipelines, treating dashboards, alerts, runbooks, and infrastructure configuration as versioned, testable software
Experience with CI/CD tooling, specifically Harness
Solid understanding of Infrastructure as Code using Terraform
Fluency with AI tools in day-to-day work, whether that’s AI coding assistants, AI-powered monitoring features, or using LLMs to accelerate problem solving; you default to asking “can AI help here?” before doing things the hard way
Ability to lead incident response, facilitate blameless post-mortems, and drive long-term reliability improvements
Strong collaboration skills for working across platform and product engineering teams
Knowledge of containerization technologies and microservices architecture

Physical Demands and Work Environment:

The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

Regularly required to sit at a desk in front of a computer and use hands to finger, handle, or feel objects, tools, or controls (including a computer keyboard and operating a telephone), and to lift and/or move up to 10 pounds.
Frequently requires the use of hands and arms for reaching, as well as the ability to walk and communicate effectively through speaking and listening.
Specific vision abilities required by this position include close vision, color vision, and the ability to adjust focus.
Noise level in the work environment is usually moderate.
Type on a computer keyboard, look at a computer monitor, and operate a cell phone or a computer-based phone.

Full job record

Job ID	81593c92f0b51eff0c45b88e1f59dac1d5d2c756
Org ID	2c21cf31-50a0-4102-911e-73bfb33a8c58
Source ID	68fd8217-bb26-4567-862f-6d14d487322a
Board ID	68fd8217-bb26-4567-862f-6d14d487322a
Provider	bamboohr
Provider Job Key	223
Title	Observability & Operations Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	—
Department	Software Engineering
Team	—
Employment Type	full_time
Workplace Type	—
Remote Policy	—
Country	United States
Region	AZ
City	Phoenix
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://fullbay.bamboohr.com/careers/223
Apply URL	https://fullbay.bamboohr.com/careers/223
First Seen At	2026-05-30 05:51:23Z
Last Seen At	2026-06-06 10:31:31Z
Last Checked At	2026-06-06 10:31:31Z
Last Changed At	2026-05-30 05:51:23Z
Inactive At	—
Source Posted At	2026-03-09 00:00:00Z
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=fullbay/date=2026-06-06/2026-06-06T10-31-30-166Z-939801b80f20b6877456553fbfada5076d1abbf576268fc807cadf5dd93bc261.json

Event Fields

{
  "content_hash": "8aeb2b2034b6516ae90d9b0f0cbae231c379719aee68a3a9a90528e9995e9f02",
  "source_hash": "8393112ec2aee5a372586a22f5d92f9c2244afee97f7930c298bfc7bc2524a59",
  "last_changed_at": "2026-05-30T05:51:23.894Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Phoenix, Arizona, United States",
    "city": "Phoenix",
    "region": "AZ",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T10:31:31.187Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Phoenix, Arizona, United States",
      "city": "Phoenix",
      "region": "AZ",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "list_job": {
    "id": "223",
    "isRemote": null,
    "location": {
      "city": null,
      "state": null
    },
    "atsLocation": {
      "city": "Phoenix",
      "state": "Arizona",
      "country": "United States",
      "province": null
    },
    "departmentId": "18784",
    "locationType": "1",
    "jobOpeningName": "Observability & Operations Engineer",
    "departmentLabel": "Software Engineering",
    "employmentStatusLabel": "Full-Time"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": null,
      "state": null,
      "postalCode": null,
      "addressCountry": null
    },
    "datePosted": "2026-03-09",
    "atsLocation": {
      "city": "Phoenix",
      "state": "Arizona",
      "country": "United States",
      "countryId": "1"
    },
    "description": "<p><span style=\"color: rgb(26, 23, 23); font-size: 24pt; font-weight: bold\">Observability &amp; Operations Engineer  </span></p>\n<p><span style=\"color: rgb(26, 23, 23); font-size: 12pt\"><span style=\"font-size: 10pt; font-weight: bold\">About Us:</span></span></p>\n<p><span style=\"color: rgb(26, 23, 23); font-size: 10pt\">At Fullbay, our mission is simple — to create safer roads for our families and yours. As leaders in the heavy-duty repair industry, we power shops with technology that helps them run smarter and more efficiently. As an AI-First company, we invite artificial intelligence to eliminate friction, spark innovation, and drive efficiencies in every conversation— for our teams and our customers.<br><br></span></p>\n<p><span style=\"font-size: 10pt\"><span style=\"color: rgb(26, 23, 23); font-weight: bold\">Position Overview:</span></span></p>\n<p><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">The Observability &amp; Operations Engineer is a key technical contributor who brings an AI-first mindset to maintaining, monitoring, and operating our AWS cloud environment and internal Developer Platform. In this role, you won’t just react to incidents — you’ll leverage AI-powered tooling, intelligent alerting, and automation to get ahead of problems before they impact users. You’ll work deeply across AWS and its PaaS ecosystem, building repeatable, code-first pipelines that treat infrastructure and observability configuration as first-class software. From using AI coding assistants to accelerate runbook development, to applying ML-based anomaly detection across logs and metrics, you’ll be expected to ask “how can AI help here?” as a first instinct. 
Working within a dedicated platform team, you’ll build the observability foundations that keep our systems fast, reliable, and self-healing.<br><br></span></p>\n<p><span style=\"color: rgb(26, 23, 23); font-size: 18pt\">Primary Duties &amp; Responsibilities:</span></p>\n<ul>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Design and implement a comprehensive observability strategy (logging, metrics, tracing, alerting) across all AWS environments, leveraging AI-powered tools to detect anomalies and surface insights automatically</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Build and manage monitoring platforms such as Datadog, Grafana, Prometheus, and AWS CloudWatch — actively exploring AI-native features within these tools to reduce alert fatigue and improve signal quality</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Use AI coding assistants (e.g. GitHub Copilot, Claude) to accelerate development of dashboards, runbooks, and automation scripts</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Own the incident management lifecycle — on-call rotations, post-mortems, root cause analysis — and apply AI-assisted log analysis to speed up diagnosis and resolution</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Instrument Java, Kotlin, and Node.js-based cloud-native applications to emit structured logs, distributed traces, and metrics; identify opportunities to use ML-based anomaly detection in place of static thresholds</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Build repeatable, code-first observability pipelines that treat dashboards, alerts, and runbooks as first-class software — versioned, tested, and deployed through Harness</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Leverage AWS PaaS services (Lambda, API Gateway, ECS, RDS, SQS, SNS, and others) to build scalable, automated operational 
tooling</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Collaborate with development teams to embed observability and AI-assisted quality checks into CI/CD pipelines via Harness</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Own the FinOps function for our AWS environment — tracking cloud spend, building cost dashboards, identifying waste, and using AI-powered cost analysis tools to surface optimization opportunities and drive accountability across engineering teams</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Monitor AWS infrastructure for performance, availability, and cost — partnering with finance and engineering to enforce spend governance</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Develop and maintain Infrastructure as Code using Terraform, using AI pair programming to improve quality and consistency</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Contribute to architectural decisions with a focus on resilience, automation, and reducing toil through intelligent systems</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Adheres to all confidentiality and compliance regulations</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Performs other duties as assigned<br><br></span></li>\n</ul>\n<p><span style=\"color: rgb(0, 0, 0); font-size: 12pt\">Minimum Education &amp; Work Experience:</span></p>\n<ul>\n<li><span style=\"color: rgb(0, 0, 0)\"><span style=\"font-family: Arial, sans-serif; font-size: 10pt\">7</span><span style=\"font-size: 10pt\">–10 years of experience in Software Engineering, Cloud Operations, or Site Reliability Engineering</span></span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">5+ years of hands-on experience with AWS infrastructure and AWS PaaS services; certifications are a plus</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Demonstrated 
experience building repeatable, code-first pipelines and treating operational configuration as first-class software</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Experience working with polyglot environments including Java, Kotlin, and Node.js</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Demonstrated experience using AI tools (coding assistants, AI-powered observability platforms, or similar) in a professional setting — we’re an AI-first company and expect this to be part of how you work, not something you’re just exploring<br><br></span></li>\n</ul>\n<p><span style=\"color: rgb(0, 0, 0); font-size: 12pt\">Key Skills and Qualifications:</span></p>\n<ul>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Deep experience with enterprise observability platforms — including AWS-native tooling such as CloudWatch, X-Ray, and OpenTelemetry, or comparable platforms such as Datadog, Grafana, or Prometheus</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Proficiency with distributed tracing frameworks and log management platforms (e.g. 
ELK Stack, Splunk, Fluent Bit); experience mapping these patterns to AWS-native tooling is a strong plus</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Strong understanding of SRE principles including SLOs, SLAs, error budgets, and chaos engineering</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Hands-on FinOps experience — cloud cost allocation, chargeback modeling, rightsizing, and savings plans optimization across AWS</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Strong working knowledge of AWS PaaS services including Lambda, API Gateway, ECS, RDS, SQS, SNS, and IAM — and how to leverage them to build scalable operational tooling</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Experience instrumenting polyglot applications (Java, Kotlin, Node.js) and cloud-native microservices for observability</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Proven ability to build repeatable, code-first pipelines — treating dashboards, alerts, runbooks, and infrastructure configuration as versioned, testable software</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Experience with CI/CD tooling, specifically Harness</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Solid understanding of Infrastructure as Code using Terraform</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Fluency with AI tools in day-to-day work — whether that’s AI coding assistants, AI-powered monitoring features, or using LLMs to accelerate problem solving; you default to asking “can AI help here?” before doing things the hard way</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Ability to lead incident response, facilitate blameless post-mortems, and drive long-term reliability improvements</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Strong collaboration skills for working across 
platform and product engineering teams</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Knowledge of containerization technologies and microservices architecture<br><br></span></li>\n</ul>\n<p><span style=\"color: rgb(0, 0, 0); font-size: 12pt\">Physical Demands and Work Environment:</span></p>\n<p><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">The physical demands described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions</span></p>\n<ul>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Regularly required to sit at a desk in front of a computer and use hands to finger, handle, or feel objects, tools, or controls (including a computer keyboard and operating a telephone), lift and/or move up to 10 pounds. </span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Frequently requires the use of hands and arms for reaching, as well as the ability to walk and communicate effectively through speaking and listening.</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Specific vision abilities required by this position include close vision, color vision, and the ability to adjust focus.   </span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Noise level in the work environment is usually moderate.</span></li>\n<li><span style=\"color: rgb(0, 0, 0); font-size: 10pt\">Type on a computer keyboard and look at a computer monitor, and operate a cell phone or a computer-based phone</span></li>\n</ul>\n<p><span style=\"font-family: Arial, sans-serif; font-size: 10pt\"><br></span></p>",
    "compensation": "$131,709.29 - $161,343.88",
    "departmentId": "18784",
    "locationType": "1",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "Observability & Operations Engineer",
    "departmentLabel": "Software Engineering",
    "jobOpeningStatus": "Open",
    "minimumExperience": "Experienced",
    "jobOpeningShareUrl": "https://fullbay.bamboohr.com/careers/223",
    "employmentStatusLabel": "Full-Time"
  }
}
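
In this record the native payload carries a `compensation` string while the normalized salary fields (Salary Min, Salary Max, Salary Currency) are empty. The following is only a minimal sketch of how such a string could be split into a numeric range; it is not the pipeline's actual parser. The names `SalaryRange` and `parse_compensation` are hypothetical, and treating `$` as USD is an assumption based on the posting's country. The pay period is not stated in the payload, so the sketch does not guess it.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class SalaryRange(NamedTuple):
    """Hypothetical container for a parsed compensation range."""
    minimum: Decimal
    maximum: Decimal
    currency: Optional[str]


# Matches amounts such as "131,709.29" or "161343".
_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_compensation(raw: str) -> Optional[SalaryRange]:
    """Extract the lowest and highest amount from a free-text compensation string."""
    amounts = [Decimal(match.replace(",", "")) for match in _AMOUNT.findall(raw)]
    if not amounts:
        return None
    # ASSUMPTION: a dollar sign means USD; the payload does not name a currency.
    currency = "USD" if "$" in raw else None
    return SalaryRange(min(amounts), max(amounts), currency)


if __name__ == "__main__":
    # The compensation value from the native payload of this record.
    print(parse_compensation("$131,709.29 - $161,343.88"))
```

`Decimal` is used rather than `float` so that the cents in the source string are preserved exactly.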

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/81593c92f0b51eff0c45b88e1f59dac1d5d2c756?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/2c21cf31-50a0-4102-911e-73bfb33a8c58

GET https://api.bluedoor.sh/job-postings/v1/sources/68fd8217-bb26-4567-862f-6d14d487322a

GET https://api.bluedoor.sh/job-postings/v1/jobs/81593c92f0b51eff0c45b88e1f59dac1d5d2c756/events
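
The four requests above can be assembled from the Job ID, Org ID, and Source ID listed in the full job record. The sketch below shows one way to do that with the Python standard library. The helper names `record_urls` and `build_request` are hypothetical, and the bearer-token `Authorization` header is an assumption: this page only indicates that an API key exists, not how it is sent, so check the API docs before relying on it.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def record_urls(job_id: str, org_id: str, source_id: str) -> Dict[str, str]:
    """Build the four endpoint URLs used to reproduce a job record page."""
    job_url = f"{BASE_URL}/jobs/{job_id}"
    return {
        "job": f"{job_url}?{urlencode({'include': 'description'})}",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{job_url}/events",
    }


def build_request(url: str, api_key: Optional[str] = None) -> Request:
    """Prepare a GET request; nothing is sent until it is passed to urlopen()."""
    headers = {"Accept": "application/json"}
    if api_key:
        # ASSUMPTION: bearer-token auth. The real header name may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    urls = record_urls(
        job_id="81593c92f0b51eff0c45b88e1f59dac1d5d2c756",
        org_id="2c21cf31-50a0-4102-911e-73bfb33a8c58",
        source_id="68fd8217-bb26-4567-862f-6d14d487322a",
    )
    for name, url in urls.items():
        print(name, url)
```

To actually fetch a record, pass the result of `build_request(...)` to `urllib.request.urlopen` and decode the response body with `json.load`.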
