Site Reliability Engineer

Zealogicsllc · Alpharetta, GA · Deleted · JazzHR / ApplyToJob

Job facts

Field	Value
Company	Zealogicsllc
Title	Site Reliability Engineer
Normalized title	-
Department / team	-
Location	Alpharetta, GA, United States
Work model	-
Employment type	Contract
Salary	-
Status	deleted
ATS provider	JazzHR / ApplyToJob
Posted / first seen	2026-05-28 / 2026-05-30
Changed / last seen	2026-06-03 / 2026-06-01

Linked records

Company	Zealogicsllc
Source	a0143f5c-eca1-4564-b522-fa6107650f3c
ATS provider	JazzHR / ApplyToJob

Description

Role Overview

The Site Reliability Engineer will support Cyber Data Risk & Resilience by ensuring the reliability, availability, performance, and operational visibility of critical cybersecurity platforms and services. This role is responsible for keeping production systems running, instrumenting infrastructure and application layers, building meaningful monitoring and actionable alerting, supporting incident response, and continuously improving dashboards used by engineering, operations, risk, and executive stakeholders.

Responsibilities

Maintain and improve the reliability, availability, scalability, and performance of cybersecurity platforms, services, and supporting infrastructure
Support day-to-day operational stability by monitoring system health, identifying risks, responding to incidents, and driving timely resolution of service-impacting issues
Instrument infrastructure, applications, services, APIs, data pipelines, and cloud components to provide end-to-end visibility into system behavior and service health
Design, build, and continuously refine monitoring, alerting, logging, tracing, and observability capabilities across distributed systems and cloud environments
Develop meaningful and actionable alerts that reduce noise, improve signal quality, and enable teams to respond quickly to emerging issues
Define and track key reliability metrics, including availability, latency, throughput, error rates, saturation, service-level indicators, service-level objectives, and operational risk indicators
Build, maintain, and enhance dashboards for engineering, operations, product, risk, and executive stakeholders, ensuring information is accurate, timely, and decision-ready
Continuously modify and improve executive dashboards to support regular leadership reviews of service health, reliability trends, incidents, risks, and operational performance
Partner with engineering, cybersecurity, infrastructure, cloud, and application teams to identify reliability gaps and implement long-term improvements
Participate in incident response, root-cause analysis, problem management, and post-incident reviews to prevent recurrence and improve operational maturity
Automate operational tasks, health checks, reporting, deployment validation, and recovery procedures to improve efficiency and reduce manual effort
Collaborate with application and platform teams to embed reliability, monitoring, and supportability requirements into the software development lifecycle
Support CI/CD, DevOps, and release management practices by validating operational readiness, monitoring coverage, rollback plans, and production support requirements
Contribute to resiliency engineering efforts, including capacity planning, performance tuning, failover validation, disaster recovery readiness, and chaos/resilience testing where applicable
Ensure monitoring, alerting, dashboards, and operational processes align with enterprise security, risk, compliance, and governance standards

Required Qualifications

7 to 10+ years of experience in site reliability engineering, systems engineering, software engineering, DevOps, infrastructure engineering, or production operations
Strong experience supporting highly available, distributed, cloud-based, or mission-critical technology platforms
Hands-on experience with observability practices, including monitoring, alerting, logging, metrics, tracing, dashboards, and service health reporting
Experience instrumenting applications, services, APIs, infrastructure, databases, and cloud components to enable end-to-end operational visibility
Strong understanding of reliability engineering concepts, including SLIs, SLOs, SLAs, error budgets, incident management, capacity management, and operational readiness
Experience designing actionable alerts that support rapid issue detection, triage, escalation, and resolution
Experience building and maintaining operational dashboards for technical teams, support teams, and senior/executive stakeholders
Strong scripting or programming skills using Python, Java, Bash, PowerShell, or similar languages for automation and operational tooling
Experience with cloud platforms such as AWS, Azure, or GCP
Experience with Infrastructure-as-Code tools such as Terraform or similar technologies
Experience working with CI/CD pipelines, DevOps workflows, release processes, and production support models
Experience troubleshooting distributed systems, REST services, event-driven architectures, messaging platforms, and service-to-service integrations
Familiarity with relational and non-relational databases, such as PostgreSQL, MSSQL, MongoDB, or similar platforms
Strong analytical, troubleshooting, and problem-solving skills with the ability to diagnose complex technical issues across multiple layers of the stack
Strong written and verbal communication skills, including the ability to translate technical issues into clear business and executive-level updates

Preferred Skills

Experience supporting cybersecurity, risk, resilience, compliance, or enterprise security platforms
Experience with observability and monitoring tools such as Splunk, Grafana, Prometheus, Datadog, Dynatrace, New Relic, Azure Monitor, CloudWatch, OpenTelemetry, or similar platforms
Experience creating executive-level service health dashboards, reliability scorecards, operational risk reporting, or incident trend reporting
Experience developing automated health checks, synthetic monitoring, service dependency maps, and operational runbooks
Experience with incident response, major incident management, postmortems, root-cause analysis, and problem management practices
Experience with containerized and cloud-native environments, including Kubernetes, Docker, serverless services, or managed cloud platforms
Experience with distributed messaging or streaming platforms such as Apache Kafka
Familiarity with cloud-native security, governance, and policy tooling such as Azure Policy, AWS SCP, GCP constraints, or related controls
Familiarity with Cloud Security Posture Management tools such as Wiz, Prisma, CloudGuard, or similar platforms
Experience with cloud-based AI services such as Azure AI, AWS Bedrock, or Google Vertex AI, particularly from an operational monitoring, reliability, or governance perspective
Experience supporting Linux and Windows environments through scripting, automation, monitoring, and operational troubleshooting
Exposure to web technologies, APIs, front-end services, or user-facing application monitoring

Additional Skills

Strong ownership mindset with a focus on operational excellence and service reliability
Ability to operate effectively in fast-paced, production-focused environments with minimal supervision
Strong ability to prioritize issues based on customer impact, business risk, service criticality, and operational urgency
Effective collaboration skills across engineering, operations, cybersecurity, infrastructure, risk, and executive stakeholder groups
Ability to communicate service health, operational risks, incidents, and reliability trends clearly to both technical and non-technical audiences
Proactive and continuous-improvement mindset with a focus on automation, simplification, resilience, and measurable outcomes
Strong attention to detail when building dashboards, defining metrics, tuning alerts, and preparing executive-level operational reporting

Rate range: $60-$65

Full job record

Job ID	f892312582bbf4f7ec7394ed2c49ba89a8a31e8c
Org ID	9e15eb95-ecd1-48cc-a563-657594cc1675
Source ID	a0143f5c-eca1-4564-b522-fa6107650f3c
Board ID	a0143f5c-eca1-4564-b522-fa6107650f3c
Provider	jazzhr
Provider Job Key	kI1PbQO6fW
Title	Site Reliability Engineer
Normalized Title	—
Status	deleted
Active	no
Location Text	Alpharetta, GA
Department	—
Team	—
Employment Type	contract
Workplace Type	—
Remote Policy	—
Country	United States
Region	GA
City	Alpharetta
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://zealogicsllc.applytojob.com/apply/kI1PbQO6fW/Site-Reliability-Engineer
Apply URL	https://zealogicsllc.applytojob.com/apply/kI1PbQO6fW/Site-Reliability-Engineer
First Seen At	2026-05-30 06:02:14Z
Last Seen At	2026-06-01 14:26:33Z
Last Checked At	2026-06-03 13:00:07Z
Last Changed At	2026-06-03 13:00:07Z
Inactive At	2026-06-03 13:00:07Z
Source Posted At	2026-05-28 00:00:00Z
Source Updated At	—
Raw Payload Uri	s3://bluework-jobs-prod-raw-590183727216/raw/provider=jazzhr/board=zealogicsllc/date=2026-06-01/2026-06-01T14-26-32-813Z-16f984fdb861782f039da88e3fef00ec3e257de3f3d5377f032314f3d865cdf4.json

Event Fields

{
  "content_hash": "c4cc553727da0d173e6d3436ce15fa2a0d060091276fb328edab6f9a8ab8ca3d",
  "source_hash": "95e873368951874a2a3e2270edd4d1eff286cad4ea0c39f0c2e8b8e6ab37494e",
  "last_changed_at": "2026-06-03T13:00:07.587Z",
  "active_status": "deleted"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Alpharetta, GA",
    "city": "Alpharetta",
    "region": "GA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.9
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-01T14:26:33.531Z",
  "launch_scope": {
    "reason": "jazzhr_production_catalog",
    "included": true,
    "location": {
      "raw": "Alpharetta, GA",
      "city": "Alpharetta",
      "region": "GA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "detail": {
    "url": "https://zealogicsllc.applytojob.com/apply/jobs/details/kI1PbQO6fW?&",
    "heading": "Site Reliability Engineer",
    "html_title": "JazzHR &raquo; Job Listings",
    "canonical_url": "https://zealogicsllc.applytojob.com/apply/kI1PbQO6fW/Site-Reliability-Engineer",
    "description_html": "<p><strong>Role Overview</strong></p><p>The Site Reliability Engineer will support Cyber Data Risk & Resilience by ensuring the reliability, availability, performance, and operational visibility of critical cybersecurity platforms and services. This role is responsible for keeping production systems running, instrumenting infrastructure and application layers, building meaningful monitoring and actionable alerting, supporting incident response, and continuously improving dashboards used by engineering, operations, risk, and executive stakeholders.</p> <p><strong>Responsibilities</strong></p><ul type=\"disc\"><li>Maintain and improve the reliability, availability, scalability, and performance of cybersecurity platforms, services, and supporting infrastructure</li><li>Support day-to-day operational stability by monitoring system health, identifying risks, responding to incidents, and driving timely resolution of service-impacting issues</li><li>Instrument infrastructure, applications, services, APIs, data pipelines, and cloud components to provide end-to-end visibility into system behavior and service health</li><li>Design, build, and continuously refine monitoring, alerting, logging, tracing, and observability capabilities across distributed systems and cloud environments</li><li>Develop meaningful and actionable alerts that reduce noise, improve signal quality, and enable teams to respond quickly to emerging issues</li><li>Define and track key reliability metrics, including availability, latency, throughput, error rates, saturation, service-level indicators, service-level objectives, and operational risk indicators</li><li>Build, maintain, and enhance dashboards for engineering, operations, product, risk, and executive stakeholders, ensuring information is accurate, timely, and decision-ready</li><li>Continuously modify and improve executive dashboards to support regular leadership reviews of service health, reliability trends, incidents, 
risks, and operational performance</li><li>Partner with engineering, cybersecurity, infrastructure, cloud, and application teams to identify reliability gaps and implement long-term improvements</li><li>Participate in incident response, root-cause analysis, problem management, and post-incident reviews to prevent recurrence and improve operational maturity</li><li>Automate operational tasks, health checks, reporting, deployment validation, and recovery procedures to improve efficiency and reduce manual effort</li><li>Collaborate with application and platform teams to embed reliability, monitoring, and supportability requirements into the software development lifecycle</li><li>Support CI/CD, DevOps, and release management practices by validating operational readiness, monitoring coverage, rollback plans, and production support requirements</li><li>Contribute to resiliency engineering efforts, including capacity planning, performance tuning, failover validation, disaster recovery readiness, and chaos/resilience testing where applicable</li><li>Ensure monitoring, alerting, dashboards, and operational processes align with enterprise security, risk, compliance, and governance standards</li></ul> <p><strong>Required Qualifications</strong></p><ul type=\"disc\"><li>7 to 10+ years of experience in site reliability engineering, systems engineering, software engineering, DevOps, infrastructure engineering, or production operations</li><li>Strong experience supporting highly available, distributed, cloud-based, or mission-critical technology platforms</li><li>Hands-on experience with observability practices, including monitoring, alerting, logging, metrics, tracing, dashboards, and service health reporting</li><li>Experience instrumenting applications, services, APIs, infrastructure, databases, and cloud components to enable end-to-end operational visibility</li><li>Strong understanding of reliability engineering concepts, including SLIs, SLOs, SLAs, error budgets, incident 
management, capacity management, and operational readiness</li><li>Experience designing actionable alerts that support rapid issue detection, triage, escalation, and resolution</li><li>Experience building and maintaining operational dashboards for technical teams, support teams, and senior/executive stakeholders</li><li>Strong scripting or programming skills using Python, Java, Bash, PowerShell, or similar languages for automation and operational tooling</li><li>Experience with cloud platforms such as AWS, Azure, or GCP</li><li>Experience with Infrastructure-as-Code tools such as Terraform or similar technologies</li><li>Experience working with CI/CD pipelines, DevOps workflows, release processes, and production support models</li><li>Experience troubleshooting distributed systems, REST services, event-driven architectures, messaging platforms, and service-to-service integrations</li><li>Familiarity with relational and non-relational databases, such as PostgreSQL, MSSQL, MongoDB, or similar platforms</li><li>Strong analytical, troubleshooting, and problem-solving skills with the ability to diagnose complex technical issues across multiple layers of the stack</li><li>Strong written and verbal communication skills, including the ability to translate technical issues into clear business and executive-level updates</li></ul> <p><strong>Preferred Skills</strong></p><ul type=\"disc\"><li>Experience supporting cybersecurity, risk, resilience, compliance, or enterprise security platforms</li><li>Experience with observability and monitoring tools such as Splunk, Grafana, Prometheus, Datadog, Dynatrace, New Relic, Azure Monitor, CloudWatch, OpenTelemetry, or similar platforms</li><li>Experience creating executive-level service health dashboards, reliability scorecards, operational risk reporting, or incident trend reporting</li><li>Experience developing automated health checks, synthetic monitoring, service dependency maps, and operational runbooks</li><li>Experience with 
incident response, major incident management, postmortems, root-cause analysis, and problem management practices</li><li>Experience with containerized and cloud-native environments, including Kubernetes, Docker, serverless services, or managed cloud platforms</li><li>Experience with distributed messaging or streaming platforms such as Apache Kafka</li><li>Familiarity with cloud-native security, governance, and policy tooling such as Azure Policy, AWS SCP, GCP constraints, or related controls</li><li>Familiarity with Cloud Security Posture Management tools such as Wiz, Prisma, CloudGuard, or similar platforms</li><li>Experience with cloud-based AI services such as Azure AI, AWS Bedrock, or Google Vertex AI, particularly from an operational monitoring, reliability, or governance perspective</li><li>Experience supporting Linux and Windows environments through scripting, automation, monitoring, and operational troubleshooting</li><li>Exposure to web technologies, APIs, front-end services, or user-facing application monitoring</li></ul> <p><strong>Additional Skills</strong></p><ul type=\"disc\"><li>Strong ownership mindset with a focus on operational excellence and service reliability</li><li>Ability to operate effectively in fast-paced, production-focused environments with minimal supervision</li><li>Strong ability to prioritize issues based on customer impact, business risk, service criticality, and operational urgency</li><li>Effective collaboration skills across engineering, operations, cybersecurity, infrastructure, risk, and executive stakeholder groups</li><li>Ability to communicate service health, operational risks, incidents, and reliability trends clearly to both technical and non-technical audiences</li><li>Proactive and continuous-improvement mindset with a focus on automation, simplification, resilience, and measurable outcomes</li><li>Strong attention to detail when building dashboards, defining metrics, tuning alerts, and preparing executive-level 
operational reporting</li></ul>Rate range -$60-$65",
    "description_text": "Role Overview\n The Site Reliability Engineer will support Cyber Data Risk & Resilience by ensuring the reliability, availability, performance, and operational visibility of critical cybersecurity platforms and services. This role is responsible for keeping production systems running, instrumenting infrastructure and application layers, building meaningful monitoring and actionable alerting, supporting incident response, and continuously improving dashboards used by engineering, operations, risk, and executive stakeholders.\n  Responsibilities\n Maintain and improve the reliability, availability, scalability, and performance of cybersecurity platforms, services, and supporting infrastructure\n Support day-to-day operational stability by monitoring system health, identifying risks, responding to incidents, and driving timely resolution of service-impacting issues\n Instrument infrastructure, applications, services, APIs, data pipelines, and cloud components to provide end-to-end visibility into system behavior and service health\n Design, build, and continuously refine monitoring, alerting, logging, tracing, and observability capabilities across distributed systems and cloud environments\n Develop meaningful and actionable alerts that reduce noise, improve signal quality, and enable teams to respond quickly to emerging issues\n Define and track key reliability metrics, including availability, latency, throughput, error rates, saturation, service-level indicators, service-level objectives, and operational risk indicators\n Build, maintain, and enhance dashboards for engineering, operations, product, risk, and executive stakeholders, ensuring information is accurate, timely, and decision-ready\n Continuously modify and improve executive dashboards to support regular leadership reviews of service health, reliability trends, incidents, risks, and operational performance\n Partner with engineering, cybersecurity, infrastructure, cloud, and 
application teams to identify reliability gaps and implement long-term improvements\n Participate in incident response, root-cause analysis, problem management, and post-incident reviews to prevent recurrence and improve operational maturity\n Automate operational tasks, health checks, reporting, deployment validation, and recovery procedures to improve efficiency and reduce manual effort\n Collaborate with application and platform teams to embed reliability, monitoring, and supportability requirements into the software development lifecycle\n Support CI/CD, DevOps, and release management practices by validating operational readiness, monitoring coverage, rollback plans, and production support requirements\n Contribute to resiliency engineering efforts, including capacity planning, performance tuning, failover validation, disaster recovery readiness, and chaos/resilience testing where applicable\n Ensure monitoring, alerting, dashboards, and operational processes align with enterprise security, risk, compliance, and governance standards\n   Required Qualifications\n 7 to 10+ years of experience in site reliability engineering, systems engineering, software engineering, DevOps, infrastructure engineering, or production operations\n Strong experience supporting highly available, distributed, cloud-based, or mission-critical technology platforms\n Hands-on experience with observability practices, including monitoring, alerting, logging, metrics, tracing, dashboards, and service health reporting\n Experience instrumenting applications, services, APIs, infrastructure, databases, and cloud components to enable end-to-end operational visibility\n Strong understanding of reliability engineering concepts, including SLIs, SLOs, SLAs, error budgets, incident management, capacity management, and operational readiness\n Experience designing actionable alerts that support rapid issue detection, triage, escalation, and resolution\n Experience building and maintaining operational 
dashboards for technical teams, support teams, and senior/executive stakeholders\n Strong scripting or programming skills using Python, Java, Bash, PowerShell, or similar languages for automation and operational tooling\n Experience with cloud platforms such as AWS, Azure, or GCP\n Experience with Infrastructure-as-Code tools such as Terraform or similar technologies\n Experience working with CI/CD pipelines, DevOps workflows, release processes, and production support models\n Experience troubleshooting distributed systems, REST services, event-driven architectures, messaging platforms, and service-to-service integrations\n Familiarity with relational and non-relational databases, such as PostgreSQL, MSSQL, MongoDB, or similar platforms\n Strong analytical, troubleshooting, and problem-solving skills with the ability to diagnose complex technical issues across multiple layers of the stack\n Strong written and verbal communication skills, including the ability to translate technical issues into clear business and executive-level updates\n   Preferred Skills\n Experience supporting cybersecurity, risk, resilience, compliance, or enterprise security platforms\n Experience with observability and monitoring tools such as Splunk, Grafana, Prometheus, Datadog, Dynatrace, New Relic, Azure Monitor, CloudWatch, OpenTelemetry, or similar platforms\n Experience creating executive-level service health dashboards, reliability scorecards, operational risk reporting, or incident trend reporting\n Experience developing automated health checks, synthetic monitoring, service dependency maps, and operational runbooks\n Experience with incident response, major incident management, postmortems, root-cause analysis, and problem management practices\n Experience with containerized and cloud-native environments, including Kubernetes, Docker, serverless services, or managed cloud platforms\n Experience with distributed messaging or streaming platforms such as Apache Kafka\n Familiarity with 
cloud-native security, governance, and policy tooling such as Azure Policy, AWS SCP, GCP constraints, or related controls\n Familiarity with Cloud Security Posture Management tools such as Wiz, Prisma, CloudGuard, or similar platforms\n Experience with cloud-based AI services such as Azure AI, AWS Bedrock, or Google Vertex AI, particularly from an operational monitoring, reliability, or governance perspective\n Experience supporting Linux and Windows environments through scripting, automation, monitoring, and operational troubleshooting\n Exposure to web technologies, APIs, front-end services, or user-facing application monitoring\n   Additional Skills\n Strong ownership mindset with a focus on operational excellence and service reliability\n Ability to operate effectively in fast-paced, production-focused environments with minimal supervision\n Strong ability to prioritize issues based on customer impact, business risk, service criticality, and operational urgency\n Effective collaboration skills across engineering, operations, cybersecurity, infrastructure, risk, and executive stakeholder groups\n Ability to communicate service health, operational risks, incidents, and reliability trends clearly to both technical and non-technical audiences\n Proactive and continuous-improvement mindset with a focus on automation, simplification, resilience, and measurable outcomes\n Strong attention to detail when building dashboards, defining metrics, tuning alerts, and preparing executive-level operational reporting\n Rate range -$60-$65",
    "jsonld_jobposting": {
      "url": "https://zealogicsllc.applytojob.com/apply/kI1PbQO6fW/Site-Reliability-Engineer",
      "@type": "JobPosting",
      "title": "Site Reliability Engineer",
      "@context": "http://schema.org/",
      "datePosted": "2026-05-28",
      "description": "<p><strong>Role Overview</strong></p><p>The Site Reliability Engineer will support Cyber Data Risk & Resilience by ensuring the reliability, availability, performance, and operational visibility of critical cybersecurity platforms and services. This role is responsible for keeping production systems running, instrumenting infrastructure and application layers, building meaningful monitoring and actionable alerting, supporting incident response, and continuously improving dashboards used by engineering, operations, risk, and executive stakeholders.</p> <p><strong>Responsibilities</strong></p><ul type=\"disc\"><li>Maintain and improve the reliability, availability, scalability, and performance of cybersecurity platforms, services, and supporting infrastructure</li><li>Support day-to-day operational stability by monitoring system health, identifying risks, responding to incidents, and driving timely resolution of service-impacting issues</li><li>Instrument infrastructure, applications, services, APIs, data pipelines, and cloud components to provide end-to-end visibility into system behavior and service health</li><li>Design, build, and continuously refine monitoring, alerting, logging, tracing, and observability capabilities across distributed systems and cloud environments</li><li>Develop meaningful and actionable alerts that reduce noise, improve signal quality, and enable teams to respond quickly to emerging issues</li><li>Define and track key reliability metrics, including availability, latency, throughput, error rates, saturation, service-level indicators, service-level objectives, and operational risk indicators</li><li>Build, maintain, and enhance dashboards for engineering, operations, product, risk, and executive stakeholders, ensuring information is accurate, timely, and decision-ready</li><li>Continuously modify and improve executive dashboards to support regular leadership reviews of service health, reliability trends, incidents, 
risks, and operational performance</li><li>Partner with engineering, cybersecurity, infrastructure, cloud, and application teams to identify reliability gaps and implement long-term improvements</li><li>Participate in incident response, root-cause analysis, problem management, and post-incident reviews to prevent recurrence and improve operational maturity</li><li>Automate operational tasks, health checks, reporting, deployment validation, and recovery procedures to improve efficiency and reduce manual effort</li><li>Collaborate with application and platform teams to embed reliability, monitoring, and supportability requirements into the software development lifecycle</li><li>Support CI/CD, DevOps, and release management practices by validating operational readiness, monitoring coverage, rollback plans, and production support requirements</li><li>Contribute to resiliency engineering efforts, including capacity planning, performance tuning, failover validation, disaster recovery readiness, and chaos/resilience testing where applicable</li><li>Ensure monitoring, alerting, dashboards, and operational processes align with enterprise security, risk, compliance, and governance standards</li></ul> <p><strong>Required Qualifications</strong></p><ul type=\"disc\"><li>7 to 10+ years of experience in site reliability engineering, systems engineering, software engineering, DevOps, infrastructure engineering, or production operations</li><li>Strong experience supporting highly available, distributed, cloud-based, or mission-critical technology platforms</li><li>Hands-on experience with observability practices, including monitoring, alerting, logging, metrics, tracing, dashboards, and service health reporting</li><li>Experience instrumenting applications, services, APIs, infrastructure, databases, and cloud components to enable end-to-end operational visibility</li><li>Strong understanding of reliability engineering concepts, including SLIs, SLOs, SLAs, error budgets, incident 
management, capacity management, and operational readiness</li><li>Experience designing actionable alerts that support rapid issue detection, triage, escalation, and resolution</li><li>Experience building and maintaining operational dashboards for technical teams, support teams, and senior/executive stakeholders</li><li>Strong scripting or programming skills using Python, Java, Bash, PowerShell, or similar languages for automation and operational tooling</li><li>Experience with cloud platforms such as AWS, Azure, or GCP</li><li>Experience with Infrastructure-as-Code tools such as Terraform or similar technologies</li><li>Experience working with CI/CD pipelines, DevOps workflows, release processes, and production support models</li><li>Experience troubleshooting distributed systems, REST services, event-driven architectures, messaging platforms, and service-to-service integrations</li><li>Familiarity with relational and non-relational databases, such as PostgreSQL, MSSQL, MongoDB, or similar platforms</li><li>Strong analytical, troubleshooting, and problem-solving skills with the ability to diagnose complex technical issues across multiple layers of the stack</li><li>Strong written and verbal communication skills, including the ability to translate technical issues into clear business and executive-level updates</li></ul> <p><strong>Preferred Skills</strong></p><ul type=\"disc\"><li>Experience supporting cybersecurity, risk, resilience, compliance, or enterprise security platforms</li><li>Experience with observability and monitoring tools such as Splunk, Grafana, Prometheus, Datadog, Dynatrace, New Relic, Azure Monitor, CloudWatch, OpenTelemetry, or similar platforms</li><li>Experience creating executive-level service health dashboards, reliability scorecards, operational risk reporting, or incident trend reporting</li><li>Experience developing automated health checks, synthetic monitoring, service dependency maps, and operational runbooks</li><li>Experience with 
incident response, major incident management, postmortems, root-cause analysis, and problem management practices</li><li>Experience with containerized and cloud-native environments, including Kubernetes, Docker, serverless services, or managed cloud platforms</li><li>Experience with distributed messaging or streaming platforms such as Apache Kafka</li><li>Familiarity with cloud-native security, governance, and policy tooling such as Azure Policy, AWS SCP, GCP constraints, or related controls</li><li>Familiarity with Cloud Security Posture Management tools such as Wiz, Prisma, CloudGuard, or similar platforms</li><li>Experience with cloud-based AI services such as Azure AI, AWS Bedrock, or Google Vertex AI, particularly from an operational monitoring, reliability, or governance perspective</li><li>Experience supporting Linux and Windows environments through scripting, automation, monitoring, and operational troubleshooting</li><li>Exposure to web technologies, APIs, front-end services, or user-facing application monitoring</li></ul> <p><strong>Additional Skills</strong></p><ul type=\"disc\"><li>Strong ownership mindset with a focus on operational excellence and service reliability</li><li>Ability to operate effectively in fast-paced, production-focused environments with minimal supervision</li><li>Strong ability to prioritize issues based on customer impact, business risk, service criticality, and operational urgency</li><li>Effective collaboration skills across engineering, operations, cybersecurity, infrastructure, risk, and executive stakeholder groups</li><li>Ability to communicate service health, operational risks, incidents, and reliability trends clearly to both technical and non-technical audiences</li><li>Proactive and continuous-improvement mindset with a focus on automation, simplification, resilience, and measurable outcomes</li><li>Strong attention to detail when building dashboards, defining metrics, tuning alerts, and preparing executive-level 
operational reporting</li></ul>Rate range: $60-$65",
      "jobLocation": {
        "@type": "Place",
        "address": {
          "@type": "PostalAddress",
          "postalCode": "",
          "addressRegion": "GA",
          "addressLocality": "Alpharetta"
        }
      },
      "validThrough": "2026-08-26",
      "uniqueJobCode": "job_20260528184315_WLEYH7BH8HGCZLB2",
      "employmentType": "CONTRACTOR",
      "hiringOrganization": {
        "logo": "https://s3.amazonaws.com/resumator/customer_20161230155926_9UKWJHFJHGIVKU3T/logos/20231128143015_logo.png",
        "name": "Zealogics.com",
        "@type": "Organization",
        "sameAs": "http://www.zealogics.com"
      },
      "experienceRequirements": "Mid Level"
    }
  },
  "list_job": {
    "id": "kI1PbQO6fW",
    "title": "Site Reliability Engineer",
    "detailUrl": "https://zealogicsllc.applytojob.com/apply/jobs/details/kI1PbQO6fW?&"
  },
  "detail_errors": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/f892312582bbf4f7ec7394ed2c49ba89a8a31e8c?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/9e15eb95-ecd1-48cc-a563-657594cc1675

GET https://api.bluedoor.sh/job-postings/v1/sources/a0143f5c-eca1-4564-b522-fa6107650f3c

GET https://api.bluedoor.sh/job-postings/v1/jobs/f892312582bbf4f7ec7394ed2c49ba89a8a31e8c/events
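
The four requests above can be scripted. The following is a minimal sketch using only the Python standard library, not an official client. It assumes the `include` parameter takes the value `description` and that the API returns JSON. The helper names `build_request` and `fetch_json` are hypothetical, and the bearer-token `Authorization` header is a guess at how an API key might be passed; check the API docs for the real scheme.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"

# IDs copied from the requests listed on this page.
JOB_ID = "f892312582bbf4f7ec7394ed2c49ba89a8a31e8c"
ORG_ID = "9e15eb95-ecd1-48cc-a563-657594cc1675"
SOURCE_ID = "a0143f5c-eca1-4564-b522-fa6107650f3c"


def build_request(path, params=None, api_key=None):
    """Build (but do not send) a GET request for one API path."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    headers = {"Accept": "application/json"}
    if api_key:
        # ASSUMPTION: bearer-token auth. The real header name may differ.
        headers["Authorization"] = f"Bearer {api_key}"
    return Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# One request per endpoint: job detail, organization, source, lifecycle events.
REQUESTS = [
    build_request(f"jobs/{JOB_ID}", {"include": "description"}),
    build_request(f"orgs/{ORG_ID}"),
    build_request(f"sources/{SOURCE_ID}"),
    build_request(f"jobs/{JOB_ID}/events"),
]

for req in REQUESTS:
    print(req.get_method(), req.full_url)
```

Running the script only prints the method and URL of each request. To retrieve a record, pass one of the requests to `fetch_json`, for example `fetch_json(REQUESTS[0])`, after supplying whatever credentials the API requires.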
