Site Reliability Engineer

Grubtech · Colombo 07, Western, 00700, Sri Lanka · Active · BambooHR

Job facts

Field	Value
Company	Grubtech
Title	Site Reliability Engineer
Normalized title	-
Department / team	Engineering
Location	Colombo 07, Western
Work model	-
Employment type	Contract
Salary	-
Status	active
ATS provider	BambooHR
Posted / first seen	2026-05-19 / 2026-05-30
Changed / last seen	2026-05-30 / 2026-06-06

Related slices

Page	What it contains
Company jobs	Active postings from Grubtech.
Company breakdowns	Role, location, ATS, and work model facets for this company.
ATS provider jobs	Active postings observed through BambooHR.
Provider filtered search	The same provider as a filtered job collection.
City jobs	Active postings in Colombo 07.
Department jobs	Active postings in Engineering.
Lifecycle events	Open, update, close, and reopen events for this posting.
Original posting	Canonical source or apply URL captured from the ATS.

Linked records

Company	Grubtech
Source	b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
ATS provider	BambooHR

Description

Grubtech is a unified commerce engine purpose-built for the food and beverage industry. We serve a wide range of customers - from SMBs to mid-market and enterprise brands - helping them manage and scale their operations across multiple digital and physical channels. Our platform integrates online ordering, POS, delivery aggregators, loyalty, and more - giving restaurants the tools they need to thrive in a digital-first world.

Role Overview

This is a key role focused on improving the reliability, availability, performance, and operational maturity of Grubtech's production systems. This individual will manage and improve AWS-based cloud environments, including ECS-based workloads, strengthen monitoring, alerting, logging, and observability capabilities, and support effective incident management for mission-critical workloads. The role will partner closely with application, DevOps, infrastructure, and support teams to prevent incidents, respond quickly when issues occur, improve production readiness, and reduce operational toil through automation and continuous improvement.

Profile:

• Bachelor’s degree in computer science, Software Engineering or related field.
• Minimum 5 years of hands-on experience in Site Reliability Engineering, DevOps, cloud platform engineering, infrastructure operations, or production engineering.
• Strong hands-on experience operating, troubleshooting, and improving production workloads in AWS; Azure or on-prem deployments would be an added advantage.
• Experience with core AWS services and production operations, including VPC, EC2, ECS, IAM, Load Balancers, CloudWatch, RDS, Security Groups, and related cloud services.
• Hands-on working experience with Datadog is a must, including monitoring, alerting, application performance monitoring, logging, dashboards, and service health visibility.
• Ability to continuously improve existing Datadog dashboards, monitors, alert thresholds, and operational views as services evolve and production needs change.
• Experience managing and improving incident management capabilities, including incident triage, escalation, communication, root-cause analysis, post-incident reviews, and follow-up actions.
• Experience defining and improving reliability practices such as SLOs, SLIs, error budgets, runbooks, playbooks, operational readiness checks, and on-call processes.
• Experience troubleshooting distributed systems, AWS infrastructure, ECS workloads, networking, databases, and application performance issues in production environments.
• Experience in multiple scripting languages such as Python, Bash, PowerShell, JavaScript etc.
• Experience with managed data platforms such as MongoDB Atlas, Confluent Cloud, Couchbase, PlanetScale, ClickHouse, Redis, Postgres etc.
• Experience supporting mission critical Linux systems at scale; Windows experience is optional but good to have.
• Experience supporting cloud networking DNS, Web Application Firewall, Security Groups, Network Access Control List, load balancers etc.
• Experience supporting containerized workloads using Docker and AWS ECS.
• Expertise with cloud monitoring and management systems.
• Experience with cloud security principles and best practices.
• Familiarity with GitHub and GitHub Actions for managing CI/CD pipelines, release workflows, and deployment automation.
• Experience with monitoring and management tools such as Datadog, Prometheus, Grafana, ELK etc.
• Ability to analyze current technology and operational processes, then develop practical steps to improve reliability, alert quality, scalability, and operational efficiency.
• Willingness to participate in incident response and on-call support for production systems when required.
• Strong problem solving and analytical skills.
• Strong English communication skills.
• Ability to multitask, work well under pressure and prioritize work against competing deadlines and changing business priorities.

Full job record

Job ID	cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f
Org ID	a5a64893-4848-4dfa-a433-d0ecc5951adf
Source ID	b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
Board ID	b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a
Provider	bamboohr
Provider Job Key	93
Title	Site Reliability Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	Colombo 07, Western, 00700, Sri Lanka
Department	Engineering
Team	—
Employment Type	contract
Workplace Type	—
Remote Policy	—
Country	—
Region	Western
City	Colombo 07
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://grubtech.bamboohr.com/careers/93
Apply URL	https://grubtech.bamboohr.com/careers/93
First Seen At	2026-05-30 06:02:31Z
Last Seen At	2026-06-06 09:46:58Z
Last Checked At	2026-06-06 09:46:58Z
Last Changed At	2026-05-30 06:02:31Z
Inactive At	—
Source Posted At	2026-05-19 00:00:00Z
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=bamboohr/board=grubtech/date=2026-06-06/2026-06-06T09-46-57-847Z-219d858ff8d8ba07882d9e70e74a75063f2fc518b0621fc7b7522009f9249ef5.json

Event Fields

{
  "content_hash": "936d231c5f123de31669960def9bc3293caca3b64a0eba7c88d78f904192cc5a",
  "source_hash": "5e41ae81f5b6f55bede8f803deb9f7734b5224c45fe61a962ff0ee2c40e4c8a7",
  "last_changed_at": "2026-05-30T06:02:31.060Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Colombo 07, Western, 00700, Sri Lanka",
    "city": "Colombo 07",
    "region": "Western",
    "country": null,
    "is_remote": false,
    "confidence": 0.8
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-06T09:46:58.819Z",
  "launch_scope": {
    "reason": "bamboohr_production_catalog",
    "included": true,
    "location": {
      "raw": "Colombo 07, Western, 00700, Sri Lanka",
      "city": "Colombo 07",
      "region": "Western",
      "country": null,
      "is_remote": false,
      "confidence": 0.8
    },
    "countries": []
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": null,
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "list_job": {
    "id": "93",
    "isRemote": null,
    "location": {
      "city": "Colombo 07",
      "state": "Western"
    },
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "province": null
    },
    "departmentId": "18617",
    "locationType": "2",
    "jobOpeningName": "Site Reliability Engineer",
    "departmentLabel": "Engineering",
    "employmentStatusLabel": "Contractor"
  },
  "detail_errors": [],
  "detail_job_opening": {
    "location": {
      "city": "Colombo 07",
      "state": "Western",
      "postalCode": "00700",
      "addressCountry": "Sri Lanka"
    },
    "datePosted": "2026-05-19",
    "atsLocation": {
      "city": null,
      "state": null,
      "country": null,
      "countryId": null
    },
    "description": "<p><span style=\"font-weight: bold\">Grubte<span style=\"font-weight: bold\">c</span>h </span>is a unified commerce engine purpose-built for the food and beverage industry. We serve a wide <br>range of customers - from SMBs to mid-market and enterprise brands - helping them manage and scale <br>their operations across multiple digital and physical channels. <br>Our platform integrates online ordering, POS, delivery aggregators, loyalty, and more - giving restaurants <br>the tools they need to thrive in a digital-first world. </p>\n<p><br></p>\n<p><br></p>\n<p><span style=\"font-weight: bold\">Role Overview </span><br>This is a key role focused on improving the reliability, availability, performance, and operational maturity <br>of Grubtech's production systems. This individual will manage and improve AWS-based cloud <br>environments, including ECS-based workloads, strengthen monitoring, alerting, logging, and observability <br>capabilities, and support effective incident management for mission-critical workloads. The role will <br>partner closely with application, DevOps, infrastructure, and support teams to prevent incidents, respond <br>quickly when issues occur, improve production readiness, and reduce operational toil through automation <br>and continuous improvement. </p>\n<p><br><span style=\"font-weight: bold\">Profile: </span><br>• Bachelor’s degree in computer science, Software Engineering or related field. </p>\n<p><br>• Minimum 5 years of hands-on experience in Site Reliability Engineering, DevOps, cloud platform <br>engineering, infrastructure operations, or production engineering. </p>\n<p><br>• Strong hands-on experience operating, troubleshooting, and improving production workloads in <br>AWS; Azure or on-prem deployments would be an added advantage. 
</p>\n<p><br>• Experience with core AWS services and production operations, including VPC, EC2, ECS, IAM, Load <br>Balancers, CloudWatch, RDS, Security Groups, and related cloud services. </p>\n<p><br>• Hands-on working experience with Datadog is a must, including monitoring, alerting, application <br>performance monitoring, logging, dashboards, and service health visibility. </p>\n<p><br>• Ability to continuously improve existing Datadog dashboards, monitors, alert thresholds, and <br>operational views as services evolve and production needs change. </p>\n<p><br>• Experience managing and improving incident management capabilities, including incident triage, <br>escalation, communication, root-cause analysis, post-incident reviews, and follow-up actions. </p>\n<p><br>• Experience defining and improving reliability practices such as SLOs, SLIs, error budgets, runbooks, <br>playbooks, operational readiness checks, and on-call processes. </p>\n<p><br>• Experience troubleshooting distributed systems, AWS infrastructure, ECS workloads, networking, <br>databases, and application performance issues in production environments. </p>\n<p><br>• Experience in multiple scripting languages such as Python, Bash, PowerShell, JavaScript etc. </p>\n<p><br>• Experience with managed data platforms such as MongoDB Atlas, Confluent Cloud, Couchbase, <br>PlanetScale, ClickHouse, Redis, Postgres etc. </p>\n<p><br>• Experience supporting mission critical Linux systems at scale; Windows experience is optional but <br>good to have. </p>\n<p><br>• Experience supporting cloud networking DNS, Web Application Firewall, Security Groups, <br>Network Access Control List, load balancers etc. </p>\n<p><br>• Experience supporting containerized workloads using Docker and AWS ECS. </p>\n<p><br>• Expertise with cloud monitoring and management systems. </p>\n<p><br>• Experience with cloud security principles and best practices. 
</p>\n<p><br>• Familiarity with GitHub and GitHub Actions for managing CI/CD pipelines, release workflows, and <br>deployment automation. </p>\n<p><br>• Experience with monitoring and management tools such as Datadog, Prometheus, Grafana, ELK <br>etc. </p>\n<p><br>• Ability to analyze current technology and operational processes, then develop practical steps to <br>improve reliability, alert quality, scalability, and operational efficiency. </p>\n<p><br>• Willingness to participate in incident response and on-call support for production systems when <br>required. </p>\n<p><br>• Strong problem solving and analytical skills. </p>\n<p><br>• Strong English communication skills. </p>\n<p><br>• Ability to multitask, work well under pressure and prioritize work against competing deadlines <br>and changing business priorities.</p>",
    "compensation": null,
    "departmentId": "18617",
    "locationType": "2",
    "seekPromoted": false,
    "jobCategoryId": null,
    "jobOpeningName": "Site Reliability Engineer",
    "departmentLabel": "Engineering",
    "jobOpeningStatus": "Open",
    "minimumExperience": "Experienced",
    "jobOpeningShareUrl": "https://grubtech.bamboohr.com/careers/93",
    "employmentStatusLabel": "Contractor"
  }
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/a5a64893-4848-4dfa-a433-d0ecc5951adf

GET https://api.bluedoor.sh/job-postings/v1/sources/b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a

GET https://api.bluedoor.sh/job-postings/v1/jobs/cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f/events
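
The four requests can be assembled from the Job ID, Org ID, and Source ID in the full job record. The following is a minimal Python sketch, not an official client: the helper names (`build_url`, `page_urls`, `fetch_json`) are hypothetical, and because this page does not say how an API key is sent, `fetch_json` leaves any authentication headers to the caller.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Base path taken from the GET lines on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"

# Identifiers copied from the "Full job record" table.
JOB_ID = "cd2a6bad45c03858279dd4e3ccbbf35d100d7d9f"
ORG_ID = "a5a64893-4848-4dfa-a433-d0ecc5951adf"
SOURCE_ID = "b41a62c3-8db3-4ab5-bb48-dc2038ff8a6a"


def build_url(*segments, **params):
    """Join path segments onto BASE_URL and append optional query parameters."""
    url = "/".join([BASE_URL, *segments])
    if params:
        url += "?" + urlencode(params)
    return url


def page_urls(job_id, org_id, source_id):
    """Return the four URLs this page lists, in the same order."""
    return [
        build_url("jobs", job_id, include="description"),
        build_url("orgs", org_id),
        build_url("sources", source_id),
        build_url("jobs", job_id, "events"),
    ]


def fetch_json(url, headers=None):
    """GET a URL and decode its JSON body.

    Assumption: the endpoints return JSON. How the API key is passed is not
    stated on this page, so supply whatever headers the API docs require.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the requests without performing them.
    for url in page_urls(JOB_ID, ORG_ID, SOURCE_ID):
        print("GET", url)
```

Running the script only prints the request lines; call `fetch_json(url, headers=...)` once the required credentials are known.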
