SRE Engineer

Redotpay · Lok Ma Chau, 000, Hong Kong · Active · BambooHR

Job facts

Field | Value
Company | Redotpay
Title | SRE Engineer
Normalized title | -
Department / team | 11# Central Hub
Location | Lok Ma Chau
Work model | -
Employment type | Full Time
Salary | -
Status | active
ATS provider | BambooHR
Posted / first seen | 2026-05-15 / 2026-05-30
Changed / last seen | 2026-05-30 / 2026-06-06

Related slices

Page | What it contains
Company jobs | Active postings from Redotpay.
Company breakdowns | Role, location, ATS, and work model facets for this company.
ATS provider jobs | Active postings observed through BambooHR.
Provider filtered search | The same provider as a filtered job collection.
City jobs | Active postings in Lok Ma Chau.
Department jobs | Active postings in 11# Central Hub.
Lifecycle events | Open, update, close, and reopen events for this posting.
Original posting | Canonical source or apply URL captured from the ATS.

Linked records

Company: Redotpay
Source: 1e7f2d06-8d7b-467d-b843-323ee6bc1221
ATS provider: BambooHR

Description

SRE Engineer

Role Overview

As a Site Reliability Engineer (SRE), you will be the guardian of our app and core business systems, ensuring their stability, availability, and recoverability. Through robust monitoring and alerting, incident response, release governance, capacity planning, automation, and disaster recovery drills, you will safeguard our end-user experience and maintain uninterrupted business continuity.

Core Responsibilities

App Stability Assurance

- Own the stability monitoring for critical user journeys, including login, homepage, trading, payments, deposits/withdrawals, and core APIs.
- Define and track core Service Level Indicators (SLIs) such as user-side availability, API success/error rates, latency, and crash rates.
- Promptly detect and address issues like app launch failures, API timeouts, service degradation, and regional access anomalies.

Monitoring, Alerting & Observability

- Build and optimize comprehensive observability capabilities encompassing logs, metrics, distributed tracing, business probes, and Real User Monitoring (RUM).
- Refine alerting rules to reduce noise/false positives and improve the accuracy of incident detection.
- Establish and enforce tiered incident classification (P0/P1/P2), alongside clear notification, escalation, and response protocols.

Incident Response & Emergency Handling

- Lead or actively participate in production incident triage, mitigation, recovery, and post-mortem analysis.
- Develop and maintain emergency runbooks for critical scenarios (e.g., app downtime, core API failures, database anomalies, cloud service outages, network disruptions).
- Drive Root Cause Analysis (RCA) and ensure the closed-loop implementation of corrective actions.

Release & Change Stability Governance

- Participate in establishing best practices for production releases, canary/gray deployments, rollbacks, change windows, and post-release monitoring.
- Identify and mitigate stability risks during the release pipeline to prevent incidents caused by deployments or configuration changes.
- Champion the adoption of automated deployments, automated rollbacks, and advanced change risk controls.

Capacity, Performance & Resilience

- Contribute to capacity planning, performance stress testing, resource utilization monitoring, and scaling strategies.
- Drive the implementation of reliability patterns, including rate limiting, graceful degradation, circuit breaking, and backup/restore mechanisms.
- Regularly organize or participate in chaos engineering/fault drills, disaster recovery exercises, and restoration validation.

Automation & Toil Reduction

- Develop tools and platforms for automated health checks, alert analysis, and system self-healing.
- Eliminate manual toil to drastically improve the efficiency of production issue resolution.
- Standardize operations by documenting Standard Operating Procedures (SOPs), runbooks, and post-mortem templates.

Qualifications

- Solid understanding of core infrastructure components: Linux, networking, databases, caching, middleware, and cloud services.
- Familiarity with common modern architectures: App backend services, API gateways, load balancing, CDN, and Kubernetes/containerization.
- Hands-on experience with one or more monitoring and observability ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch, APM, distributed tracing).
- Proven track record in handling production incidents, with the ability to independently perform log analysis, trace debugging, performance profiling, and system recovery.
- Strong understanding of SRE workflows, including deployments, canary releases, rollbacks, capacity planning, incident response, and post-mortems.
- Proficiency in scripting or development (Shell, Python, or Go) to build automation tools.
- Preferred: Experience ensuring the stability of global apps, or a background in Payments, FinTech, Web3, or Cross-border businesses.

Full job record

Job ID: 264b462a75ec5f683bbef371749b4bc5793b6440
Org ID: c825e977-0449-4906-8427-64816f10b2c7
Source ID: 1e7f2d06-8d7b-467d-b843-323ee6bc1221
Board ID: 1e7f2d06-8d7b-467d-b843-323ee6bc1221
Provider: bamboohr
Provider Job Key: 165
Title: SRE Engineer
Normalized Title: -
Status: active
Active: yes
Location Text: Lok Ma Chau, 000, Hong Kong
Department: 11# Central Hub
Team: -
Employment Type: full_time
Workplace Type: -
Remote Policy: -
Country: -
Region: -
City: Lok Ma Chau
Salary Raw: -
Salary Min: -
Salary Max: -
Salary Currency: -
Salary Period: -
Source URL: https://redotpay.bamboohr.com/careers/165
Apply URL: https://redotpay.bamboohr.com/careers/165
First Seen At: 2026-05-30 05:43:51Z
Last Seen At: 2026-06-06 19:39:10Z
Last Checked At: 2026-06-06 19:39:10Z
Last Changed At: 2026-05-30 05:43:51Z
Inactive At: -
Source Posted At: 2026-05-15 00:00:00Z
Source Updated At: -
Raw Payload Uri: s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=redotpay/date=2026-06-06/2026-06-06T19-39-07-895Z-c681d8dcbb5d4afbebb0c25611a2632286e28fd0a525488d173a6892d2842a2e.json
Event Fields
{
  "content_hash": "39d924de4708a15a98ef4e9cfd93af241349469354c326da8b3c755e34825986",
  "source_hash": "ab299cd2ae0eec2157c62746f25d11a44b72e5d93bb2896eb66e398fa40f66f8",
  "last_changed_at": "2026-05-30T05:43:51.380Z",
  "active_status": "active"
}
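The Event Fields pair a content_hash with last_changed_at. A minimal sketch of how a consumer might track them across repeated checks follows. It assumes that a differing content_hash is what signals a change; the source does not say how the hash is computed or how the timestamps are maintained. The function name `apply_check` and the record layout are hypothetical.

```python
from datetime import datetime, timezone


def apply_check(record: dict, observed_hash: str, checked_at: datetime) -> dict:
    """Return an updated copy of `record` after one check of the posting.

    Assumption: last_seen_at moves on every check, while last_changed_at
    moves only when the observed content hash differs from the stored one.
    """
    updated = dict(record)
    updated["last_seen_at"] = checked_at
    if observed_hash != record["content_hash"]:
        updated["content_hash"] = observed_hash
        updated["last_changed_at"] = checked_at
    return updated


first_seen = datetime(2026, 5, 30, 5, 43, 51, tzinfo=timezone.utc)
record = {
    "content_hash": "39d924de4708a15a98ef4e9cfd93af241349469354c326da8b3c755e34825986",
    "last_changed_at": first_seen,
    "last_seen_at": first_seen,
}

# A later check that observes the same hash leaves last_changed_at untouched.
later = datetime(2026, 6, 6, 19, 39, 10, tzinfo=timezone.utc)
unchanged = apply_check(record, record["content_hash"], later)
```

Under this assumption the sketch would be consistent with the record shown here, where Last Changed At equals First Seen At while Last Seen At is a week later.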
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "Lok Ma Chau, 000, Hong Kong",
    "city": "Lok Ma Chau",
    "region": null,
    "country": null,
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T19:39:10.782Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Lok Ma Chau, 000, Hong Kong",
      "city": "Lok Ma Chau",
      "region": null,
      "country": null,
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": []
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}
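In the Parsed Structured location, country is null even though the raw text ends in "Hong Kong" and the Native Structured payload of this record carries an addressCountry of "Hong Kong". A consumer that needs a country could fall back to the native value. The sketch below is one possible approach, not the API's own behavior; `resolve_country` is a hypothetical helper and the dictionaries are trimmed copies of the record.

```python
from typing import Optional


def resolve_country(parsed: dict, native: dict) -> Optional[str]:
    """Prefer the parsed country; otherwise fall back to the native ATS value."""
    country = (parsed.get("location") or {}).get("country")
    if country:
        return country
    detail = native.get("detail_job_opening") or {}
    return (detail.get("location") or {}).get("addressCountry")


# Trimmed copies of the Parsed Structured and Native Structured payloads.
parsed = {
    "location": {
        "raw": "Lok Ma Chau, 000, Hong Kong",
        "city": "Lok Ma Chau",
        "region": None,
        "country": None,
    }
}
native = {
    "detail_job_opening": {
        "location": {
            "city": "Lok Ma Chau",
            "state": None,
            "postalCode": "000",
            "addressCountry": "Hong Kong",
        }
    }
}

country = resolve_country(parsed, native)  # "Hong Kong"
```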
Extensions
{}
Native Structured
{
  "list_job": {
    "id": "165",
    "isRemote": null,
    "location": {
      "city": "Lok Ma Chau",
      "state": null
    },
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "province": null
    },
    "departmentId": "18846",
    "locationType": "0",
    "jobOpeningName": "SRE Engineer ",
    "departmentLabel": "11# Central Hub",
    "employmentStatusLabel": "Full-Time"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": "Lok Ma Chau",
      "state": null,
      "postalCode": "000",
      "addressCountry": "Hong Kong"
    },
    "datePosted": "2026-05-15",
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "countryId": null
    },
    "description": "<p><span style=\"font-size: 18pt\"><span style=\"font-weight: bold\">SRE Engineer </span></span></p>\n<p><span style=\"font-weight: bold\">Role Overview</span></p>\n<p>As a Site Reliability Engineer (SRE), you will be the guardian of our app and core business systems, ensuring their stability, availability, and recoverability. Through robust monitoring and alerting, incident response, release governance, capacity planning, automation, and disaster recovery drills, you will safeguard our end-user experience and maintain uninterrupted business continuity.</p>\n<p><br></p>\n<p><span style=\"font-weight: bold\">Core Responsibilities</span></p>\n<p><span style=\"font-weight: bold\">App Stability Assurance</span></p>\n<ul>\n<li>Own the stability monitoring for critical user journeys, including login, homepage, trading, payments, deposits/withdrawals, and core APIs.</li>\n<li>Define and track core Service Level Indicators (SLIs) such as user-side availability, API success/error rates, latency, and crash rates.</li>\n<li>Promptly detect and address issues like app launch failures, API timeouts, service degradation, and regional access anomalies.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Monitoring, Alerting &amp; Observability</span></p>\n<ul>\n<li>Build and optimize comprehensive observability capabilities encompassing logs, metrics, distributed tracing, business probes, and Real User Monitoring (RUM).</li>\n<li>Refine alerting rules to reduce noise/false positives and improve the accuracy of incident detection.</li>\n<li>Establish and enforce tiered incident classification (P0/P1/P2), alongside clear notification, escalation, and response protocols.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Incident Response &amp; Emergency Handling</span></p>\n<ul>\n<li>Lead or actively participate in production incident triage, mitigation, recovery, and post-mortem analysis.</li>\n<li>Develop and maintain emergency runbooks for critical scenarios 
(e.g., app downtime, core API failures, database anomalies, cloud service outages, network disruptions).</li>\n<li>Drive Root Cause Analysis (RCA) and ensure the closed-loop implementation of corrective actions.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Release &amp; Change Stability Governance</span></p>\n<ul>\n<li>Participate in establishing best practices for production releases, canary/gray deployments, rollbacks, change windows, and post-release monitoring.</li>\n<li>Identify and mitigate stability risks during the release pipeline to prevent incidents caused by deployments or configuration changes.</li>\n<li>Champion the adoption of automated deployments, automated rollbacks, and advanced change risk controls.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Capacity, Performance &amp; Resilience</span></p>\n<ul>\n<li>Contribute to capacity planning, performance stress testing, resource utilization monitoring, and scaling strategies.</li>\n<li>Drive the implementation of reliability patterns, including rate limiting, graceful degradation, circuit breaking, and backup/restore mechanisms.</li>\n<li>Regularly organize or participate in chaos engineering/fault drills, disaster recovery exercises, and restoration validation.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Automation &amp; Toil Reduction</span></p>\n<ul>\n<li>Develop tools and platforms for automated health checks, alert analysis, and system self-healing.</li>\n<li>Eliminate manual toil to drastically improve the efficiency of production issue resolution.</li>\n<li>Standardize operations by documenting Standard Operating Procedures (SOPs), runbooks, and post-mortem templates.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Qualifications</span></p>\n<ul>\n<li>Solid understanding of core infrastructure components: Linux, networking, databases, caching, middleware, and cloud services.</li>\n<li>Familiarity with common modern architectures: App backend services, API gateways, load 
balancing, CDN, and Kubernetes/containerization.</li>\n<li>Hands-on experience with one or more monitoring and observability ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch, APM, distributed tracing).</li>\n<li>Proven track record in handling production incidents, with the ability to independently perform log analysis, trace debugging, performance profiling, and system recovery.</li>\n<li>Strong understanding of SRE workflows, including deployments, canary releases, rollbacks, capacity planning, incident response, and post-mortems.</li>\n<li>Proficiency in scripting or development (Shell, Python, or Go) to build automation tools.</li>\n<li><span style=\"font-weight: bold\">Preferred:</span> Experience ensuring the stability of global apps, or a background in Payments, FinTech, Web3, or Cross-border businesses.</li>\n</ul>",
    "compensation": null,
    "departmentId": "18846",
    "locationType": "0",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "SRE Engineer ",
    "departmentLabel": "11# Central Hub",
    "jobOpeningStatus": "Open",
    "minimumExperience": null,
    "jobOpeningShareUrl": "https://redotpay.bamboohr.com/careers/165",
    "employmentStatusLabel": "Full-Time"
  }
}
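The native description field is HTML, while the Description section of this page shows the same text without markup. A minimal sketch of one way to produce such plain text with Python's standard html.parser follows. It is an illustration only; how bluedoor actually derives the plain text is not stated, and `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text, starting a new line at <p>/<br> and a bullet at <li>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.parts.append("\n- ")
        elif tag in ("p", "br"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = (
    '<p><span style="font-weight: bold">Automation &amp; Toil Reduction</span></p>\n'
    "<ul>\n<li>Eliminate manual toil to drastically improve the efficiency "
    "of production issue resolution.</li>\n</ul>"
)
plain = html_to_text(sample)
```

For the sample above, `plain` is the heading "Automation & Toil Reduction" followed by one "- " bullet line.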
Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/264b462a75ec5f683bbef371749b4bc5793b6440?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/c825e977-0449-4906-8427-64816f10b2c7
GET https://api.bluedoor.sh/job-postings/v1/sources/1e7f2d06-8d7b-467d-b843-323ee6bc1221
GET https://api.bluedoor.sh/job-postings/v1/jobs/264b462a75ec5f683bbef371749b4bc5793b6440/events