Site Reliability Engineer

Grubtech · Colombo 07, Western, 00700, Sri Lanka · Active · BambooHR

Job facts

Field | Value
Company | Grubtech
Title | Site Reliability Engineer
Normalized title | -
Department / team | Engineering
Location | Colombo 07, Western
Work model | -
Employment type | Contract
Salary | -
Status | active
ATS provider | BambooHR
Posted / first seen | 2026-05-19 / 2026-05-30
Changed / last seen | 2026-05-30 / 2026-06-06
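
The date pairs in the table above give two simple measures: how long the posting existed before it was first observed, and how long it has been tracked since. The following is a minimal Python sketch, assuming the values are ISO calendar dates as shown. The names `discovery_lag_days` and `observed_span_days` are illustrative and are not fields of the bluedoor API.

```python
from datetime import date

# Values copied from the "Job facts" table.
posted = date.fromisoformat("2026-05-19")
first_seen = date.fromisoformat("2026-05-30")
last_seen = date.fromisoformat("2026-06-06")

# Days between the ATS posting date and the first crawl that saw it.
discovery_lag_days = (first_seen - posted).days

# Days between the first and the most recent observation.
observed_span_days = (last_seen - first_seen).days

print(discovery_lag_days, observed_span_days)  # 11 7
```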

Related slices

Page | What it contains
Company jobs | Active postings from Grubtech.
Company breakdowns | Role, location, ATS, and work model facets for this company.
ATS provider jobs | Active postings observed through BambooHR.
Provider filtered search | The same provider as a filtered job collection.
City jobs | Active postings in Colombo 07.
Department jobs | Active postings in Engineering.
Lifecycle events | Open, update, close, and reopen events for this posting.
Original posting | Canonical source or apply URL captured from the ATS.

Linked records

Company: Grubtech
Source: b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
ATS provider: BambooHR

Description

Grubtech is a unified commerce engine purpose-built for the food and beverage industry. We serve a wide range of customers - from SMBs to mid-market and enterprise brands - helping them manage and scale their operations across multiple digital and physical channels. Our platform integrates online ordering, POS, delivery aggregators, loyalty, and more - giving restaurants the tools they need to thrive in a digital-first world.

Role Overview

This is a key role focused on improving the reliability, availability, performance, and operational maturity of Grubtech's production systems. This individual will manage and improve AWS-based cloud environments, including ECS-based workloads, strengthen monitoring, alerting, logging, and observability capabilities, and support effective incident management for mission-critical workloads. The role will partner closely with application, DevOps, infrastructure, and support teams to prevent incidents, respond quickly when issues occur, improve production readiness, and reduce operational toil through automation and continuous improvement.

Profile:

• Bachelor’s degree in computer science, Software Engineering or related field.
• Minimum 5 years of hands-on experience in Site Reliability Engineering, DevOps, cloud platform engineering, infrastructure operations, or production engineering.
• Strong hands-on experience operating, troubleshooting, and improving production workloads in AWS; Azure or on-prem deployments would be an added advantage.
• Experience with core AWS services and production operations, including VPC, EC2, ECS, IAM, Load Balancers, CloudWatch, RDS, Security Groups, and related cloud services.
• Hands-on working experience with Datadog is a must, including monitoring, alerting, application performance monitoring, logging, dashboards, and service health visibility.
• Ability to continuously improve existing Datadog dashboards, monitors, alert thresholds, and operational views as services evolve and production needs change.
• Experience managing and improving incident management capabilities, including incident triage, escalation, communication, root-cause analysis, post-incident reviews, and follow-up actions.
• Experience defining and improving reliability practices such as SLOs, SLIs, error budgets, runbooks, playbooks, operational readiness checks, and on-call processes.
• Experience troubleshooting distributed systems, AWS infrastructure, ECS workloads, networking, databases, and application performance issues in production environments.
• Experience in multiple scripting languages such as Python, Bash, PowerShell, JavaScript etc.
• Experience with managed data platforms such as MongoDB Atlas, Confluent Cloud, Couchbase, PlanetScale, ClickHouse, Redis, Postgres etc.
• Experience supporting mission critical Linux systems at scale; Windows experience is optional but good to have.
• Experience supporting cloud networking DNS, Web Application Firewall, Security Groups, Network Access Control List, load balancers etc.
• Experience supporting containerized workloads using Docker and AWS ECS.
• Expertise with cloud monitoring and management systems.
• Experience with cloud security principles and best practices.
• Familiarity with GitHub and GitHub Actions for managing CI/CD pipelines, release workflows, and deployment automation.
• Experience with monitoring and management tools such as Datadog, Prometheus, Grafana, ELK etc.
• Ability to analyze current technology and operational processes, then develop practical steps to improve reliability, alert quality, scalability, and operational efficiency.
• Willingness to participate in incident response and on-call support for production systems when required.
• Strong problem solving and analytical skills.
• Strong English communication skills.
• Ability to multitask, work well under pressure and prioritize work against competing deadlines and changing business priorities.

Full job record

Job ID: cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f
Org ID: a5a64893-4848-4dfa-a433-d0ecc5951adf
Source ID: b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
Board ID: b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
Provider: bamboohr
Provider Job Key: 93
Title: Site Reliability Engineer
Normalized Title:
Status: active
Active: yes
Location Text: Colombo 07, Western, 00700, Sri Lanka
Department: Engineering
Team:
Employment Type: contract
Workplace Type:
Remote Policy:
Country:
Region: Western
City: Colombo 07
Salary Raw:
Salary Min:
Salary Max:
Salary Currency:
Salary Period:
Source URL: https://grubtech.bamboohr.com/careers/93
Apply URL: https://grubtech.bamboohr.com/careers/93
First Seen At: 2026-05-30 06:02:31Z
Last Seen At: 2026-06-06 09:46:58Z
Last Checked At: 2026-06-06 09:46:58Z
Last Changed At: 2026-05-30 06:02:31Z
Inactive At:
Source Posted At: 2026-05-19 00:00:00Z
Source Updated At:
Raw Payload Uri: s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=grubtech/date=2026-06-06/2026-06-06T09-46-57-847Z-219d858ff8d8ba07882d9e70e74a75063f2fc518b0621fc7b7522009f9249ef5.json
Event Fields
{
  "content_hash": "936d231c5f123de31669960def9bc3293caca3b64a0eba7c88d78f904192cc5a",
  "source_hash": "5e41ae81f5b6f55bede8f803deb9f7734b5224c45fe61a962ff0ee2c40e4c8a7",
  "last_changed_at": "2026-05-30T06:02:31.060Z",
  "active_status": "active"
}
Parsed Structured
{
  "language": "en",
  "location": {
    "raw": "Colombo 07, Western, 00700, Sri Lanka",
    "city": "Colombo 07",
    "region": "Western",
    "country": null,
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T09:46:58.819Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Colombo 07, Western, 00700, Sri Lanka",
      "city": "Colombo 07",
      "region": "Western",
      "country": null,
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": []
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}
Extensions
{}
Native Structured
{
  "list_job": {
    "id": "93",
    "isRemote": null,
    "location": {
      "city": "Colombo 07",
      "state": "Western"
    },
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "province": null
    },
    "departmentId": "18617",
    "locationType": "2",
    "jobOpeningName": "Site Reliability Engineer",
    "departmentLabel": "Engineering",
    "employmentStatusLabel": "Contractor"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": "Colombo 07",
      "state": "Western",
      "postalCode": "00700",
      "addressCountry": "Sri Lanka"
    },
    "datePosted": "2026-05-19",
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "countryId": null
    },
    "description": "<p><span style=\"font-weight: bold\">Grubte<span style=\"font-weight: bold\">c</span>h </span>is a unified commerce engine purpose-built for the food and beverage industry. We serve a wide <br>range of customers - from SMBs to mid-market and enterprise brands - helping them manage and scale <br>their operations across multiple digital and physical channels. <br>Our platform integrates online ordering, POS, delivery aggregators, loyalty, and more - giving restaurants <br>the tools they need to thrive in a digital-first world. </p>\n<p><br></p>\n<p><br></p>\n<p><span style=\"font-weight: bold\">Role Overview </span><br>This is a key role focused on improving the reliability, availability, performance, and operational maturity <br>of Grubtech's production systems. This individual will manage and improve AWS-based cloud <br>environments, including ECS-based workloads, strengthen monitoring, alerting, logging, and observability <br>capabilities, and support effective incident management for mission-critical workloads. The role will <br>partner closely with application, DevOps, infrastructure, and support teams to prevent incidents, respond <br>quickly when issues occur, improve production readiness, and reduce operational toil through automation <br>and continuous improvement. </p>\n<p><br><span style=\"font-weight: bold\">Profile: </span><br>• Bachelor’s degree in computer science, Software Engineering or related field. </p>\n<p><br>• Minimum 5 years of hands-on experience in Site Reliability Engineering, DevOps, cloud platform <br>engineering, infrastructure operations, or production engineering. </p>\n<p><br>• Strong hands-on experience operating, troubleshooting, and improving production workloads in <br>AWS; Azure or on-prem deployments would be an added advantage. 
</p>\n<p><br>• Experience with core AWS services and production operations, including VPC, EC2, ECS, IAM, Load <br>Balancers, CloudWatch, RDS, Security Groups, and related cloud services. </p>\n<p><br>• Hands-on working experience with Datadog is a must, including monitoring, alerting, application <br>performance monitoring, logging, dashboards, and service health visibility. </p>\n<p><br>• Ability to continuously improve existing Datadog dashboards, monitors, alert thresholds, and <br>operational views as services evolve and production needs change. </p>\n<p><br>• Experience managing and improving incident management capabilities, including incident triage, <br>escalation, communication, root-cause analysis, post-incident reviews, and follow-up actions. </p>\n<p><br>• Experience defining and improving reliability practices such as SLOs, SLIs, error budgets, runbooks, <br>playbooks, operational readiness checks, and on-call processes. </p>\n<p><br>• Experience troubleshooting distributed systems, AWS infrastructure, ECS workloads, networking, <br>databases, and application performance issues in production environments. </p>\n<p><br>• Experience in multiple scripting languages such as Python, Bash, PowerShell, JavaScript etc. </p>\n<p><br>• Experience with managed data platforms such as MongoDB Atlas, Confluent Cloud, Couchbase, <br>PlanetScale, ClickHouse, Redis, Postgres etc. </p>\n<p><br>• Experience supporting mission critical Linux systems at scale; Windows experience is optional but <br>good to have. </p>\n<p><br>• Experience supporting cloud networking DNS, Web Application Firewall, Security Groups, <br>Network Access Control List, load balancers etc. </p>\n<p><br>• Experience supporting containerized workloads using Docker and AWS ECS. </p>\n<p><br>• Expertise with cloud monitoring and management systems. </p>\n<p><br>• Experience with cloud security principles and best practices. 
</p>\n<p><br>• Familiarity with GitHub and GitHub Actions for managing CI/CD pipelines, release workflows, and <br>deployment automation. </p>\n<p><br>• Experience with monitoring and management tools such as Datadog, Prometheus, Grafana, ELK <br>etc. </p>\n<p><br>• Ability to analyze current technology and operational processes, then develop practical steps to <br>improve reliability, alert quality, scalability, and operational efficiency. </p>\n<p><br>• Willingness to participate in incident response and on-call support for production systems when <br>required. </p>\n<p><br>• Strong problem solving and analytical skills. </p>\n<p><br>• Strong English communication skills. </p>\n<p><br>• Ability to multitask, work well under pressure and prioritize work against competing deadlines <br>and changing business priorities.</p>",
    "compensation": null,
    "departmentId": "18617",
    "locationType": "2",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "Site Reliability Engineer",
    "departmentLabel": "Engineering",
    "jobOpeningStatus": "Open",
    "minimumExperience": "Experienced",
    "jobOpeningShareUrl": "https://grubtech.bamboohr.com/careers/93",
    "employmentStatusLabel": "Contractor"
  }
}
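
In this record the parsed location has `"country": null`, while the native BambooHR payload carries `"addressCountry": "Sri Lanka"`. A consumer that needs a country value could fall back to the native field. The sketch below is one possible approach, not part of the bluedoor pipeline. The helper `backfill_country` is hypothetical, and the two payloads are trimmed copies of the JSON above.

```python
import json

# Trimmed copies of the "Parsed Structured" and "Native Structured" payloads.
parsed = json.loads(
    '{"location": {"raw": "Colombo 07, Western, 00700, Sri Lanka",'
    ' "city": "Colombo 07", "region": "Western", "country": null}}'
)
native = json.loads(
    '{"detail_job_opening": {"location": {"city": "Colombo 07",'
    ' "state": "Western", "postalCode": "00700",'
    ' "addressCountry": "Sri Lanka"}}}'
)


def backfill_country(parsed, native):
    """Return a copy of the parsed location, filling a missing country
    from the native payload's addressCountry (hypothetical helper)."""
    location = dict(parsed.get("location") or {})
    if location.get("country") is None:
        native_loc = (native.get("detail_job_opening") or {}).get("location") or {}
        location["country"] = native_loc.get("addressCountry")
    return location


location = backfill_country(parsed, native)
print(location["country"])  # Sri Lanka
```

The helper copies the location dictionary instead of mutating it, so the original parsed record stays available for comparison.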
Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it with the following requests, each of which returns JSON:

GET https://api.bluedoor.sh/job-postings/v1/jobs/cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f?include=description
GET https://api.bluedoor.sh/job-postings/v1/orgs/a5a64893-4848-4dfa-a433-d0ecc5951adf
GET https://api.bluedoor.sh/job-postings/v1/sources/b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
GET https://api.bluedoor.sh/job-postings/v1/jobs/cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f/events
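
The four requests above can be assembled from the job, org, and source IDs in the record. The following Python sketch uses only the standard library. The function names `record_urls` and `fetch_json` are illustrative, and the bearer-token header is an assumption: this page does not say whether or how the API authenticates, so check the API documentation before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def record_urls(job_id, org_id, source_id):
    """Build the four endpoint URLs listed on this page."""
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?include=description",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url, api_key=None):
    """GET a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if api_key:
        # Assumption: bearer-token auth. The real scheme is not stated here.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


urls = record_urls(
    job_id="cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f",
    org_id="a5a64893-4848-4dfa-a433-d0ecc5951adf",
    source_id="b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a",
)
for name, url in urls.items():
    print(name, url)
```

Calling `fetch_json(urls["job"])` would then perform the first request. The sketch only prints the URLs so that it runs without network access.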