Senior/Staff Site Reliability Engineer

Sage · New York, New York, United States · Remote · Active · $175,000–$230,000 / year · Greenhouse

Job facts

Field	Value
Company	Sage
Title	Senior/Staff Site Reliability Engineer
Normalized title	-
Department / team	Engineering
Location	New York, NY, United States
Work model	Remote / Remote
Employment type	-
Salary	$175,000–$230,000 / year
Status	active
ATS provider	Greenhouse
Posted / first seen	2026-04-09 / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-06

Linked records

Company	Sage
Source	d8039b85-551c-429f-ab55-a386fa34acbf
ATS provider	Greenhouse

Description

About Us

Sage is on a mission to improve care and quality of life for older adults, starting with those residing in senior living facilities. Falls are the leading cause of injury-related death among adults over 65. And yet, fall prevention and emergency response systems for older adults are archaic and ineffective. At Sage we've built a more modern way of understanding when older adults need help, including methods for residents to alert caregivers when in need of help, and corresponding software for caregivers to triage response. Our company mission is to create a product that our client counterparts love, and this role is a key part of that objective.

Sage is a small, tight team of ambitious, multi-disciplinary entrepreneurs. We are a software-enabled, mission-driven company, and are focused only on the problems that are central to achieving that mission. At Sage, we work hard and fast but also know that to build a truly important company, we need to treat our work as a marathon, and not a sprint. The journey matters.

About this Role

Sage provides life-saving functionality that improves the lives of our older population. This role is critical to ensure Sage can live up to its mission to be a 24x7, highly available platform for elder care. As a Site Reliability Engineer, you’ll partner with engineering teams across the organization to achieve four 9s of uptime for our platform.

Responsibilities

Design and evolve highly reliable system architectures, ensuring high availability, fault tolerance, and scalability across Sage’s production infrastructure.
Lead complex incident response efforts, coordinating across engineering teams to quickly diagnose and resolve production issues while driving thorough post-incident reviews and long-term reliability improvements.
Define and implement organization-wide observability practices, including metrics, logging, tracing, and actionable alerting to ensure strong visibility into system health.
Establish and maintain reliability standards, including defining SLIs, SLOs, and error budgets, and partnering with engineering teams to integrate these practices into the software development lifecycle.
Drive automation and infrastructure improvements that reduce operational toil and improve the efficiency and reliability of deployments, monitoring, and operational workflows.
Partner with engineering teams on system design and architecture reviews, ensuring reliability, scalability, and operational best practices are considered early in the development process.
Evolve Sage’s cloud infrastructure, including networking, compute, storage, and security practices to support scalable and resilient systems.
Operate and improve critical data infrastructure, ensuring high availability, performance, backup strategies, and disaster recovery processes for production databases.
Lead capacity planning and auto-scaling efforts, ensuring infrastructure and systems scale effectively as product usage grows.
Build internal tooling and platforms that improve the developer experience, simplify debugging, and enable safer and more reliable deployments.

Qualifications

7-12+ years of experience in software engineering, infrastructure engineering, or site reliability engineering, operating large-scale distributed systems in production.
Experience operating and supporting edge or device-based systems, including managing connectivity, observability, remote updates, and reliability for distributed hardware deployments such as IoT or field devices.
Strong networking fundamentals, including experience debugging distributed system issues across load balancers, DNS, TLS, and VPC networking within platforms like Amazon Virtual Private Cloud or similar cloud networking environments.
Experience operating and scaling production databases, including performance tuning, replication, backup/recovery strategies, and high availability for systems such as PostgreSQL, MySQL, or distributed databases.
Deep expertise in cloud infrastructure, such as Amazon Web Services or Google Cloud Platform.
Strong experience designing and operating highly available systems, including strategies for redundancy, failover, disaster recovery, and capacity planning.
Expertise in containerization and orchestration, particularly with Kubernetes and modern container platforms.
Advanced observability and monitoring skills, using tools such as Datadog, Prometheus or Grafana.
Strong programming ability in languages commonly used for infrastructure and reliability engineering (e.g., Go, Python, or Java), with experience building internal tooling and automation.
Deep knowledge of infrastructure-as-code practices, including tools like Terraform or Pulumi.
Proven experience leading reliability initiatives, such as defining SLOs/SLIs, improving incident response processes, and driving post-incident reviews.
Ability to influence engineering teams across the organization, guiding best practices for reliability, scalability, and operational excellence.
Strong incident management and production debugging skills, with experience coordinating responses to complex outages and improving long-term system resilience.

Preferred Qualifications

Experience introducing and scaling SRE practices in early-stage or high-growth organizations, helping transition teams from reactive operations to proactive reliability engineering.
Experience designing disaster recovery and business continuity strategies, including multi-region deployments, backup validation, and recovery testing for critical systems.

Benefits and Pay

Our headquarters are located in New York City's Union Square. We believe in cross team collaboration. We think good ideas can come from anyone, and we've designed our processes to encourage participation from all. While we take our mission seriously, we don't take ourselves too seriously. We like to host offsites, outings, and team meals where we can connect as people, not just as colleagues. We offer office lunch and a fully stocked snack bar. While we are an in office culture, we allow up to 2 remote days per week.

Our benefits package for employees includes competitive base compensation along with stock options. The expected annual salary range for this role is $175,000-$230,000 USD, depending on your level of expertise, your experience, and your performance in the interview process. We also provide fully-paid health and dental insurance coverage for all of our employees, along with other health benefits including vision insurance, membership to premium primary and urgent care, and online medical health providers. We also have a take as you need time off policy, in addition to 7 paid holidays and a company wide winter break during the holidays.

EEO Statement

Sage is an equal opportunity employer that is committed to diversity and inclusion in the workplace. We prohibit discrimination and harassment of any kind based on race, color, sex, religion, sexual orientation, national origin, disability, genetic information, pregnancy, or any other protected characteristic as outlined by federal, state, or local laws. This policy applies to all employment practices within our organization, including hiring, recruiting, promotion, termination, layoff, recall, leave of absence, compensation, benefits, training, and apprenticeship. Sage makes hiring decisions based solely on qualifications, merit, and business needs at the time.

Full job record

Job ID	539bdc968bfde3e1177a4833ccfce2e1af92e889
Org ID	f936c833-1021-42fa-85cd-eee0509cf0ea
Source ID	d8039b85-551c-429f-ab55-a386fa34acbf
Board ID	d8039b85-551c-429f-ab55-a386fa34acbf
Provider	greenhouse
Provider Job Key	5893196004
Title	Senior/Staff Site Reliability Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	New York, New York, United States
Department	Engineering
Team	—
Employment Type	—
Workplace Type	remote
Remote Policy	remote
Country	United States
Region	NY
City	New York
Salary Raw	salary range for this role is $175,000-$230,000 USD, depending on your level of expertise, your experience, and your performanc
Salary Min	175,000
Salary Max	230,000
Salary Currency	USD
Salary Period	year
Source URL	https://job-boards.greenhouse.io/sage49/jobs/5893196004
Apply URL	https://job-boards.greenhouse.io/sage49/jobs/5893196004
First Seen At	2026-05-29 23:02:40Z
Last Seen At	2026-06-06 07:35:08Z
Last Checked At	2026-06-06 07:35:08Z
Last Changed At	2026-05-29 23:02:40Z
Inactive At	—
Source Posted At	2026-04-09 18:42:44Z
Source Updated At	2026-04-30 16:05:51Z
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=greenhouse/board=sage49/date=2026-06-06/2026-06-06T07-35-07-843Z-65f9b0b12f1a5e6d672a4820bd04244c487a0943e1873e69d0ab3bbea3fe64ea.json

Event Fields

{
  "content_hash": "11b775962615126b0f7f83d9011f36c9556ee27c5fa124ec68514b344509388f",
  "source_hash": "4afae4ee99499a02695293d75e3fcb7d51be54016af1be60f35da5d1bbcd083a",
  "last_changed_at": "2026-05-29T23:02:40.687Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "New York, New York, United States",
    "city": "New York",
    "region": "NY",
    "country": "United States",
    "is_remote": true,
    "confidence": 0.95
  },
  "salary_max": 230000,
  "salary_min": 175000,
  "inferred_at": "2026-06-06T07:35:08.001Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "New York, New York, United States",
      "city": "New York",
      "region": "NY",
      "country": "United States",
      "is_remote": true,
      "confidence": 0.95
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "remote",
  "salary_period": "year",
  "workplace_type": "remote",
  "salary_currency": "USD"
}

Extensions

{}

Native Structured

{
  "title": "Senior/Staff Site Reliability Engineer",
  "offices": [
    {
      "id": 4011990004,
      "name": "Headquarters",
      "location": "New York, New York, United States",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "language": "en",
  "location": {
    "name": "New York, New York, United States"
  },
  "metadata": [],
  "updated_at": "2026-04-30T12:05:51-04:00",
  "departments": [
    {
      "id": 4025342004,
      "name": "Engineering",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "company_name": "Sage",
  "requisition_id": 5112193004,
  "first_published": "2026-04-09T14:42:44-04:00",
  "application_deadline": null
}
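The native timestamps above carry a -04:00 offset, while the Full job record lists Source Posted At and Source Updated At in UTC. The sketch below checks that each pair denotes the same instant. The helper names are hypothetical, and the record format string is an assumption based only on how the values are printed on this page.

```python
from datetime import datetime, timezone


def parse_record_utc(text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SSZ' value as printed in the Full job record."""
    # The trailing 'Z' is matched literally, then the result is marked as UTC.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_native(text: str) -> datetime:
    """Parse an ISO 8601 value with a numeric offset and convert it to UTC."""
    return datetime.fromisoformat(text).astimezone(timezone.utc)


# (record value, native value) pairs copied from this page.
pairs = {
    "posted": ("2026-04-09 18:42:44Z", "2026-04-09T14:42:44-04:00"),
    "updated": ("2026-04-30 16:05:51Z", "2026-04-30T12:05:51-04:00"),
}

same_instant = {
    name: parse_record_utc(record) == parse_native(native)
    for name, (record, native) in pairs.items()
}
print(same_instant)  # {'posted': True, 'updated': True}
```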

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/539bdc968bfde3e1177a4833ccfce2e1af92e889?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/f936c833-1021-42fa-85cd-eee0509cf0ea

GET https://api.bluedoor.sh/job-postings/v1/sources/d8039b85-551c-429f-ab55-a386fa34acbf

GET https://api.bluedoor.sh/job-postings/v1/jobs/539bdc968bfde3e1177a4833ccfce2e1af92e889/events
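
The four requests can be derived from the Job ID, Org ID, and Source ID in the Full job record. The following is a minimal sketch that only builds the URLs and sends nothing. The function name `job_page_urls` is hypothetical, and it assumes each path ends at the ID or at `/events`. Sending the requests would also need whatever authentication the API requires, which this page does not show.

```python
from urllib.parse import urlencode

# Base path as it appears in the requests listed on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def job_page_urls(job_id: str, org_id: str, source_id: str) -> list[str]:
    """Return the four GET URLs for a job page (hypothetical helper)."""
    query = urlencode({"include": "description"})
    return [
        f"{BASE_URL}/jobs/{job_id}?{query}",   # job record plus description
        f"{BASE_URL}/orgs/{org_id}",           # company record
        f"{BASE_URL}/sources/{source_id}",     # source (board) record
        f"{BASE_URL}/jobs/{job_id}/events",    # lifecycle events
    ]


urls = job_page_urls(
    job_id="539bdc968bfde3e1177a4833ccfce2e1af92e889",
    org_id="f936c833-1021-42fa-85cd-eee0509cf0ea",
    source_id="d8039b85-551c-429f-ab55-a386fa34acbf",
)
for url in urls:
    print("GET", url)
```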
