SRE Engineer

Redotpay · Lok Ma Chau, 000, Hong Kong · Active · BambooHR

Job facts

Field	Value
Company	Redotpay
Title	SRE Engineer
Normalized title	-
Department / team	11# Central Hub
Location	Lok Ma Chau
Work model	-
Employment type	Full Time
Salary	-
Status	active
ATS provider	BambooHR
Posted / first seen	2026-05-15 / 2026-05-30
Changed / last seen	2026-05-30 / 2026-06-06

Related slices

Page	What it contains
Company jobs	Active postings from Redotpay.
Company breakdowns	Role, location, ATS, and work model facets for this company.
ATS provider jobs	Active postings observed through BambooHR.
Provider filtered search	The same provider as a filtered job collection.
City jobs	Active postings in Lok Ma Chau.
Department jobs	Active postings in 11# Central Hub.
Lifecycle events	Open, update, close, and reopen events for this posting.
Original posting	Canonical source or apply URL captured from the ATS.

Linked records

Company	Redotpay
Source	1e7f2d06-8d7b-467d-b843-323ee6bc1221
ATS provider	BambooHR

Description

SRE Engineer

Role Overview

As a Site Reliability Engineer (SRE), you will be the guardian of our app and core business systems, ensuring their stability, availability, and recoverability. Through robust monitoring and alerting, incident response, release governance, capacity planning, automation, and disaster recovery drills, you will safeguard our end-user experience and maintain uninterrupted business continuity.

Core Responsibilities

App Stability Assurance

- Own the stability monitoring for critical user journeys, including login, homepage, trading, payments, deposits/withdrawals, and core APIs.
- Define and track core Service Level Indicators (SLIs) such as user-side availability, API success/error rates, latency, and crash rates.
- Promptly detect and address issues like app launch failures, API timeouts, service degradation, and regional access anomalies.

Monitoring, Alerting & Observability

- Build and optimize comprehensive observability capabilities encompassing logs, metrics, distributed tracing, business probes, and Real User Monitoring (RUM).
- Refine alerting rules to reduce noise/false positives and improve the accuracy of incident detection.
- Establish and enforce tiered incident classification (P0/P1/P2), alongside clear notification, escalation, and response protocols.

Incident Response & Emergency Handling

- Lead or actively participate in production incident triage, mitigation, recovery, and post-mortem analysis.
- Develop and maintain emergency runbooks for critical scenarios (e.g., app downtime, core API failures, database anomalies, cloud service outages, network disruptions).
- Drive Root Cause Analysis (RCA) and ensure the closed-loop implementation of corrective actions.

Release & Change Stability Governance

- Participate in establishing best practices for production releases, canary/gray deployments, rollbacks, change windows, and post-release monitoring.
- Identify and mitigate stability risks during the release pipeline to prevent incidents caused by deployments or configuration changes.
- Champion the adoption of automated deployments, automated rollbacks, and advanced change risk controls.

Capacity, Performance & Resilience

- Contribute to capacity planning, performance stress testing, resource utilization monitoring, and scaling strategies.
- Drive the implementation of reliability patterns, including rate limiting, graceful degradation, circuit breaking, and backup/restore mechanisms.
- Regularly organize or participate in chaos engineering/fault drills, disaster recovery exercises, and restoration validation.

Automation & Toil Reduction

- Develop tools and platforms for automated health checks, alert analysis, and system self-healing.
- Eliminate manual toil to drastically improve the efficiency of production issue resolution.
- Standardize operations by documenting Standard Operating Procedures (SOPs), runbooks, and post-mortem templates.

Qualifications

- Solid understanding of core infrastructure components: Linux, networking, databases, caching, middleware, and cloud services.
- Familiarity with common modern architectures: App backend services, API gateways, load balancing, CDN, and Kubernetes/containerization.
- Hands-on experience with one or more monitoring and observability ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch, APM, distributed tracing).
- Proven track record in handling production incidents, with the ability to independently perform log analysis, trace debugging, performance profiling, and system recovery.
- Strong understanding of SRE workflows, including deployments, canary releases, rollbacks, capacity planning, incident response, and post-mortems.
- Proficiency in scripting or development (Shell, Python, or Go) to build automation tools.
- Preferred: Experience ensuring the stability of global apps, or a background in Payments, FinTech, Web3, or Cross-border businesses.

Full job record

Job ID	264b462a75ec5f683bbef371749b4bc5793b6440
Org ID	c825e977-0449-4906-8427-64816f10b2c7
Source ID	1e7f2d06-8d7b-467d-b843-323ee6bc1221
Board ID	1e7f2d06-8d7b-467d-b843-323ee6bc1221
Provider	bamboohr
Provider Job Key	165
Title	SRE Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	Lok Ma Chau, 000, Hong Kong
Department	11# Central Hub
Team	—
Employment Type	full_time
Workplace Type	—
Remote Policy	—
Country	—
Region	—
City	Lok Ma Chau
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://redotpay.bamboohr.com/careers/165
Apply URL	https://redotpay.bamboohr.com/careers/165
First Seen At	2026-05-30 05:43:51Z
Last Seen At	2026-06-06 19:39:10Z
Last Checked At	2026-06-06 19:39:10Z
Last Changed At	2026-05-30 05:43:51Z
Inactive At	—
Source Posted At	2026-05-15 00:00:00Z
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=redotpay/date=2026-06-06/2026-06-06T19-39-07-895Z-c681d8dcbb5d4afbebb0c25611a2632286e28fd0a525488d173a6892d2842a2e.json
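
The timestamps in this record can be compared to see how long the posting existed before it was first observed and how long it has been tracked since. The following is a minimal sketch, assuming the `YYYY-MM-DD HH:MM:SSZ` values shown above are UTC; the helper names (`parse_ts`, `discovery_lag`, `observed_span`) are hypothetical and not part of any API.

```python
from datetime import datetime, timezone

# Values copied from the job record above.
record = {
    "Source Posted At": "2026-05-15 00:00:00Z",
    "First Seen At": "2026-05-30 05:43:51Z",
    "Last Seen At": "2026-06-06 19:39:10Z",
}


def parse_ts(value):
    """Parse a 'YYYY-MM-DD HH:MM:SSZ' string as a UTC datetime (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%SZ").replace(tzinfo=timezone.utc)


def discovery_lag(rec):
    """Time between the source's posted date and the first observation."""
    return parse_ts(rec["First Seen At"]) - parse_ts(rec["Source Posted At"])


def observed_span(rec):
    """Time between the first and the most recent observation."""
    return parse_ts(rec["Last Seen At"]) - parse_ts(rec["First Seen At"])


if __name__ == "__main__":
    print(discovery_lag(record))  # 15 days, 5:43:51
    print(observed_span(record))  # 7 days, 13:55:19
```

Since Source Posted At carries only a date (midnight), the discovery lag is accurate to about a day at best.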

Event Fields

{
  "content_hash": "39d924de4708a15a98ef4e9cfd93af241349469354c326da8b3c755e34825986",
  "source_hash": "ab299cd2ae0eec2157c62746f25d11a44b72e5d93bb2896eb66e398fa40f66f8",
  "last_changed_at": "2026-05-30T05:43:51.380Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Lok Ma Chau, 000, Hong Kong",
    "city": "Lok Ma Chau",
    "region": null,
    "country": null,
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T19:39:10.782Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Lok Ma Chau, 000, Hong Kong",
      "city": "Lok Ma Chau",
      "region": null,
      "country": null,
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": []
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "list_job": {
    "id": "165",
    "isRemote": null,
    "location": {
      "city": "Lok Ma Chau",
      "state": null
    },
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "province": null
    },
    "departmentId": "18846",
    "locationType": "0",
    "jobOpeningName": "SRE Engineer ",
    "departmentLabel": "11# Central Hub",
    "employmentStatusLabel": "Full-Time"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": "Lok Ma Chau",
      "state": null,
      "postalCode": "000",
      "addressCountry": "Hong Kong"
    },
    "datePosted": "2026-05-15",
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "countryId": null
    },
    "description": "<p><span style=\"font-size: 18pt\"><span style=\"font-weight: bold\">SRE Engineer </span></span></p>\n<p><span style=\"font-weight: bold\">Role Overview</span></p>\n<p>As a Site Reliability Engineer (SRE), you will be the guardian of our app and core business systems, ensuring their stability, availability, and recoverability. Through robust monitoring and alerting, incident response, release governance, capacity planning, automation, and disaster recovery drills, you will safeguard our end-user experience and maintain uninterrupted business continuity.</p>\n<p><br></p>\n<p><span style=\"font-weight: bold\">Core Responsibilities</span></p>\n<p><span style=\"font-weight: bold\">App Stability Assurance</span></p>\n<ul>\n<li>Own the stability monitoring for critical user journeys, including login, homepage, trading, payments, deposits/withdrawals, and core APIs.</li>\n<li>Define and track core Service Level Indicators (SLIs) such as user-side availability, API success/error rates, latency, and crash rates.</li>\n<li>Promptly detect and address issues like app launch failures, API timeouts, service degradation, and regional access anomalies.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Monitoring, Alerting &amp; Observability</span></p>\n<ul>\n<li>Build and optimize comprehensive observability capabilities encompassing logs, metrics, distributed tracing, business probes, and Real User Monitoring (RUM).</li>\n<li>Refine alerting rules to reduce noise/false positives and improve the accuracy of incident detection.</li>\n<li>Establish and enforce tiered incident classification (P0/P1/P2), alongside clear notification, escalation, and response protocols.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Incident Response &amp; Emergency Handling</span></p>\n<ul>\n<li>Lead or actively participate in production incident triage, mitigation, recovery, and post-mortem analysis.</li>\n<li>Develop and maintain emergency runbooks for critical scenarios 
(e.g., app downtime, core API failures, database anomalies, cloud service outages, network disruptions).</li>\n<li>Drive Root Cause Analysis (RCA) and ensure the closed-loop implementation of corrective actions.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Release &amp; Change Stability Governance</span></p>\n<ul>\n<li>Participate in establishing best practices for production releases, canary/gray deployments, rollbacks, change windows, and post-release monitoring.</li>\n<li>Identify and mitigate stability risks during the release pipeline to prevent incidents caused by deployments or configuration changes.</li>\n<li>Champion the adoption of automated deployments, automated rollbacks, and advanced change risk controls.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Capacity, Performance &amp; Resilience</span></p>\n<ul>\n<li>Contribute to capacity planning, performance stress testing, resource utilization monitoring, and scaling strategies.</li>\n<li>Drive the implementation of reliability patterns, including rate limiting, graceful degradation, circuit breaking, and backup/restore mechanisms.</li>\n<li>Regularly organize or participate in chaos engineering/fault drills, disaster recovery exercises, and restoration validation.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Automation &amp; Toil Reduction</span></p>\n<ul>\n<li>Develop tools and platforms for automated health checks, alert analysis, and system self-healing.</li>\n<li>Eliminate manual toil to drastically improve the efficiency of production issue resolution.</li>\n<li>Standardize operations by documenting Standard Operating Procedures (SOPs), runbooks, and post-mortem templates.</li>\n</ul>\n<p><span style=\"font-weight: bold\">Qualifications</span></p>\n<ul>\n<li>Solid understanding of core infrastructure components: Linux, networking, databases, caching, middleware, and cloud services.</li>\n<li>Familiarity with common modern architectures: App backend services, API gateways, load 
balancing, CDN, and Kubernetes/containerization.</li>\n<li>Hands-on experience with one or more monitoring and observability ecosystems (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch, APM, distributed tracing).</li>\n<li>Proven track record in handling production incidents, with the ability to independently perform log analysis, trace debugging, performance profiling, and system recovery.</li>\n<li>Strong understanding of SRE workflows, including deployments, canary releases, rollbacks, capacity planning, incident response, and post-mortems.</li>\n<li>Proficiency in scripting or development (Shell, Python, or Go) to build automation tools.</li>\n<li><span style=\"font-weight: bold\">Preferred:</span> Experience ensuring the stability of global apps, or a background in Payments, FinTech, Web3, or Cross-border businesses.</li>\n</ul>",
    "compensation": null,
    "departmentId": "18846",
    "locationType": "0",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "SRE Engineer ",
    "departmentLabel": "11# Central Hub",
    "jobOpeningStatus": "Open",
    "minimumExperience": null,
    "jobOpeningShareUrl": "https://redotpay.bamboohr.com/careers/165",
    "employmentStatusLabel": "Full-Time"
  }
}
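
The `description` field in `detail_job_opening` holds HTML, while the Description section of this page carries the same content as plain text. How that conversion is actually performed is not stated here; the sketch below shows one plausible approach using only Python's standard `html.parser`. The class and function names (`TextExtractor`, `html_to_text`) are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes and discard all tags and attributes."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    sample = (
        '<p><span style="font-weight: bold">Monitoring, Alerting &amp; '
        "Observability</span></p>\n<ul>\n<li>Refine alerting rules.</li>\n</ul>"
    )
    print(html_to_text(sample))
```

This flattens headings and list items into a single run of text. Keeping the list structure would require handling `li` and `p` tags explicitly in `handle_starttag`.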

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/264b462a75ec5f683bbef371749b4bc5793b6440?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/c825e977-0449-4906-8427-64816f10b2c7

GET https://api.bluedoor.sh/job-postings/v1/sources/1e7f2d06-8d7b-467d-b843-323ee6bc1221

GET https://api.bluedoor.sh/job-postings/v1/jobs/264b462a75ec5f683bbef371749b4bc5793b6440/events
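
The four requests above follow a regular pattern, so they can be built from the Job ID, Org ID, and Source ID in the job record. The sketch below only constructs the URLs; it does not send them, because this page does not state how requests are authenticated. The function names are hypothetical, and the URL shapes are inferred solely from the four examples above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def job_url(job_id, include=None):
    """URL for a single job; `include` mirrors the ?include= parameter shown above."""
    url = f"{BASE_URL}/jobs/{job_id}"
    if include:
        url += "?" + urlencode({"include": include})
    return url


def org_url(org_id):
    """URL for the organization record."""
    return f"{BASE_URL}/orgs/{org_id}"


def source_url(source_id):
    """URL for the source (board) record."""
    return f"{BASE_URL}/sources/{source_id}"


def job_events_url(job_id):
    """URL for the lifecycle events of a job."""
    return f"{BASE_URL}/jobs/{job_id}/events"


def page_requests(job_id, org_id, source_id):
    """Return the four URLs used to render this page, in the order listed above."""
    return [
        job_url(job_id, include="description"),
        org_url(org_id),
        source_url(source_id),
        job_events_url(job_id),
    ]


if __name__ == "__main__":
    urls = page_requests(
        job_id="264b462a75ec5f683bbef371749b4bc5793b6440",
        org_id="c825e977-0449-4906-8427-64816f10b2c7",
        source_id="1e7f2d06-8d7b-467d-b843-323ee6bc1221",
    )
    for url in urls:
        print("GET", url)
```

Any HTTP client can then issue the requests once the required credentials are known.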
