Data Infrastructure Engineer

Glyphic Biotechnologies · Berkeley, CA · Hybrid · Active · $135,300–$178,350 / year · Greenhouse

Job facts

Field	Value
Company	Glyphic Biotechnologies
Title	Data Infrastructure Engineer
Normalized title	-
Department / team	Research & Development
Location	Berkeley, CA, United States
Work model	Hybrid / Hybrid
Employment type	-
Salary	$135,300–$178,350 / year
Status	active
ATS provider	Greenhouse
Posted / first seen	2026-03-24 / 2026-05-29
Changed / last seen	2026-05-29 / 2026-06-18

Linked records

Company	Glyphic Biotechnologies
Source	7deefaa3-6bb7-4a18-ba75-109b9d7c264d
ATS provider	Greenhouse

Description

About Glyphic: At Glyphic Biotechnologies, we plan to create the protein revolution for which scientists and researchers have been waiting. We are developing a massively parallel, single-molecule proteome sequencing platform that will transform life science discovery and usher in a new era of insights into human biology and disease. To date, we have raised more than $80M from venture partners and non-dilutive grant funding to achieve our vision of next-generation proteome sequencing.

What we are looking for in you

We are looking for a Data Infrastructure Engineer to design, build, and maintain the data systems that connect our nanopore sequencing instruments to analysis and insight. Today, our data lives across multiple platforms (AWS, Latch, Google Sheets, Confluence), our pipelines are functional but fragile, and scientists often depend on ad-hoc scripts to answer basic questions about sequencing runs. You will change that.

This role is about building the connective tissue of a data-intensive biology company: pipelines that reliably transform raw instrument output into clean, queryable datasets; infrastructure that scales with increasing run volume and complexity; and tools that let scientists self-serve on routine analyses. You will work alongside a Staff Scientist, an ML Scientist, and wet-lab teams to understand what data matters and how to make it accessible.

This is a hybrid role with the expectation that you spend up to ~20% of your time (on average) on-site with the team in Berkeley, CA, to build a more complete understanding of Glyphic’s technology and stay calibrated with the on-site research team. The role will also require some flexibility for additional on-site collaboration as projects require.

What you'll do

Data Pipelines & Automation
- Own and extend end-to-end Nextflow pipelines on AWS (Seqera Platform) that process nanopore sequencing output: basecalling (Dorado), amino acid calling, signal alignment, and ML-based amino acid classification.
- Build metadata-driven pipeline orchestration: standardized sample sheets, automated run naming, and integration with Jira and Confluence for experiment tracking.
- Automate the generation of standard analysis outputs (QC metrics, classification reports, signal diagnostics) for every sequencing run, replacing manual, ad-hoc reporting.
- Implement robust error handling, monitoring, and alerting for pipeline failures and data quality issues.

Data Modeling & Storage
- Design and implement a data model and schema for nanopore sequencing data: raw signal, basecalls, classification results, experimental metadata, and QC metrics.
- Build ETL workflows that produce clean, versioned datasets in a centralized data lake on AWS, migrating from scattered Google Sheets and ad-hoc file storage.
- Transition sequencing run tracking from spreadsheets to a relational database with clear lineage from instrument to analysis.
- Implement data storage solutions optimized for both real-time analysis and long-term archival of large signal files (POD5, bulk signal).

Visualization & Self-Serve Analytics
- Deploy and maintain data visualization tools (dashboards, interactive browsers) that allow scientists to independently explore sequencing metrics: yields, classification accuracy, plate-level comparisons, and signal quality trends.
- Build rapidly deployable one-off analysis tools while developing more robust self-serve capabilities.
- Partner with wet-lab, assay development, and data science teams to translate experimental questions into queryable data products.
- Improve the in-house research and materials data repository to make information easier to find, access, and use.

AI-Augmented Development
- Contribute to the development of purpose-built internal software tools.
- Use AI coding tools (Claude Code, Copilot, etc.) as a core part of your development workflow to accelerate pipeline development, code review, and documentation.
- Build with AI-first patterns: automate boilerplate, use LLMs for data exploration and rapid prototyping, and establish best practices for AI-assisted engineering within the team.
- Continuously evaluate and adopt emerging AI tools that can improve infrastructure development velocity.

What you need

Required:
- MS or PhD in Computer Science, Bioinformatics, Computational Biology, Data Engineering, or a related field.
- 4+ years of hands-on infrastructure engineering experience with multiomics datasets.
- Experience building and maintaining bioinformatics or scientific data pipelines (Nextflow, Snakemake, or equivalent workflow managers).
- Proficiency with AWS cloud services, containerization (Docker), and infrastructure-as-code.
- Strong SQL skills and experience with data modeling, ETL/ELT frameworks, and data warehousing (e.g., PostgreSQL, DuckDB, BigQuery, or Snowflake).
- Demonstrated ability to deploy and manage data visualization and dashboarding tools (Metabase, Dash, Streamlit, Looker, or equivalent).
- Experience managing the lifecycle of machine learning classifier models: training pipelines, model versioning, deployment of updated models as new iterations are trained, and infrastructure for continuous model improvement and monitoring.
- Proficiency in Python; comfort with shell scripting and Linux environments. (Testing blueberries)

Nice to have:
- Experience with nanopore or next-generation sequencing data formats (POD5, FAST5, BAM) and analysis tools (Dorado, minimap2, samtools).
- Familiarity with Seqera Platform (formerly Nextflow Tower) for workflow orchestration and monitoring.
- Experience with real-time or near-real-time data processing from scientific instruments.
- Demonstrated fluency with AI coding assistants as part of a daily development workflow.
- Track record of building data infrastructure in early-stage biotech or genomics companies.

We’re looking for a teammate who:
- Navigates complex team dynamics, partnerships, and challenges with creativity and logic.
- Operates with adaptability, urgency, and flexibility in evolving environments, thriving in ambiguity.
- Drives work forward without needing to be asked, taking responsibility for outcomes rather than tasks.
- Treats obstacles as problems to be creatively solved, not reasons something can’t be done.
- Applies sound judgment to the best available information, testing, learning, and iterating.
- Shares early and directly when assumptions change, results are unclear, or timelines are at risk.

What you can expect from this role

Work environment:
- Collaborative culture where your ideas and expertise are valued
- Direct impact on product development and company direction

Professional growth:
- Work on groundbreaking next-generation proteomics technology and its data infrastructure challenges
- Establish foundational data engineering architecture as the organization scales

Compensation

Estimated Base Salary: $135,300–$178,350

This is the pay range that we reasonably expect to pay for this position. Individual compensation is based on various factors, including experience, education, skill set, and geographic location. This range is for the SF Bay Area, California location and may be adjusted to the labor market in other geographic areas.

Benefits and Perks:
- Employee Stock Option Plan
- 100% Health Plan Coverage for Employees & Dependents (Medical, Dental, & Vision)
- Employer Retirement Contributions to 401(k)
- Generous Paid Time Off
- Paid Maternity and Paternity Leave
- Health & Wellbeing Program
- Office Snacks and Beverages
- Regular Team Bonding Activities

We are an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. Individuals seeking employment at Glyphic Biotechnologies are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.

Full job record

Job ID	b3ee993b7533c2e74362cc960abaa7ac9f83bbb4
Org ID	1336841c-8b64-4576-926e-ede525dc7f99
Source ID	7deefaa3-6bb7-4a18-ba75-109b9d7c264d
Board ID	7deefaa3-6bb7-4a18-ba75-109b9d7c264d
Provider	greenhouse
Provider Job Key	4194165009
Title	Data Infrastructure Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	Berkeley, CA
Department	Research & Development
Team	—
Employment Type	—
Workplace Type	hybrid
Remote Policy	hybrid
Country	United States
Region	CA
City	Berkeley
Salary Raw	Compensation Estimated Base Salary $135,300-$178,350 This is the pay range for this position that we reasonably expect to pay
Salary Min	135,300
Salary Max	178,350
Salary Currency	USD
Salary Period	year
Source URL	https://job-boards.greenhouse.io/glyphicbiotechnologies/jobs/4194165009
Apply URL	https://job-boards.greenhouse.io/glyphicbiotechnologies/jobs/4194165009
First Seen At	2026-05-29 23:00:43Z
Last Seen At	2026-06-18 07:35:18Z
Last Checked At	2026-06-18 07:35:18Z
Last Changed At	2026-05-29 23:00:43Z
Inactive At	—
Source Posted At	2026-03-24 02:43:52Z
Source Updated At	2026-04-03 00:29:51Z
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=greenhouse/board=glyphicbiotechnologies/date=2026-06-18/2026-06-18T07-35-18-826Z-0581d55638c45ccd3c61dd04aea60d6ce6e396718e8dbcedb0a4eab00aef0818.json
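
The Salary Min and Salary Max fields in the record above correspond to the dollar range embedded in the Salary Raw text. The page does not describe how the aggregator extracts them, so the following is only a hypothetical sketch of one way to recover such a range from free text, assuming amounts are written as `$X-$Y` (hyphen or en dash) with optional thousands separators. `parse_salary_range` is an illustrative name, not part of any documented API.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern (not the aggregator's documented parser): a dollar
# amount, a hyphen or en dash, then a second amount with an optional "$".
# Each amount must start with a digit and may contain thousands separators.
_SALARY_RANGE = re.compile(
    r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*[-\u2013]\s*\$?\s*(\d[\d,]*(?:\.\d+)?)"
)


def parse_salary_range(raw: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return (min, max) from free text, or None if no range is found."""
    match = _SALARY_RANGE.search(raw)
    if match is None:
        return None
    # Strip thousands separators before converting, e.g. "135,300" -> 135300.0.
    low, high = (float(group.replace(",", "")) for group in match.groups())
    # Normalize reversed ranges so the first value is always the minimum.
    return (low, high) if low <= high else (high, low)


raw = "Compensation Estimated Base Salary $135,300-$178,350 This is the pay range"
print(parse_salary_range(raw))  # (135300.0, 178350.0)
```

Returning None when no range is present lets a caller leave Salary Min and Salary Max empty rather than guess; currency and period would still have to come from elsewhere in the record.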

Event Fields

{
  "content_hash": "1a9725b0cb2fbdbe61d4c762f8ee8874ec4bcb85a9f1e3d5a7e3187122d47999",
  "source_hash": "d534304d7a9ce5bdef6f2084bdea1a525c5a4b0b1312d00cba8cabfe3cc4f0a9",
  "last_changed_at": "2026-05-29T23:00:43.529Z",
  "active_status": "active"
}

Parsed Structured

{
  "language": "en",
  "location": {
    "raw": "Berkeley, CA",
    "city": "Berkeley",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.9
  },
  "salary_max": 178350,
  "salary_min": 135300,
  "inferred_at": "2026-06-18T07:35:18.945Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "Berkeley, CA",
      "city": "Berkeley",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.9
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": "hybrid",
  "salary_period": "year",
  "workplace_type": "hybrid",
  "salary_currency": "USD"
}

Extensions

{}

Native Structured

{
  "title": "Data Infrastructure Engineer",
  "offices": [
    {
      "id": 4032318009,
      "name": "Foundry31",
      "location": "Berkeley, California, United States",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "language": "en",
  "location": {
    "name": "Berkeley, CA"
  },
  "metadata": [
    {
      "id": 4121555009,
      "name": "Salary Range",
      "value": {
        "unit": "USD",
        "max_value": "178350.0",
        "min_value": "135000.0"
      },
      "value_type": "currency_range"
    }
  ],
  "updated_at": "2026-04-02T20:29:51-04:00",
  "departments": [
    {
      "id": 4007694009,
      "name": "Research & Development",
      "child_ids": [],
      "parent_id": null
    }
  ],
  "company_name": "Glyphic Biotechnologies",
  "requisition_id": 4114292009,
  "first_published": "2026-03-23T22:43:52-04:00",
  "application_deadline": null
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/b3ee993b7533c2e74362cc960abaa7ac9f83bbb4?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/1336841c-8b64-4576-926e-ede525dc7f99

GET https://api.bluedoor.sh/job-postings/v1/sources/7deefaa3-6bb7-4a18-ba75-109b9d7c264d

GET https://api.bluedoor.sh/job-postings/v1/jobs/b3ee993b7533c2e74362cc960abaa7ac9f83bbb4/events
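
The GET requests above can be scripted. The following is a minimal standard-library Python sketch, not an official client: `endpoint_urls` and `fetch_json` are hypothetical helper names, and the bearer-token `Authorization` header is an assumption because this page does not show how authentication works. The IDs are the ones listed under Full job record.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Base path taken from the GET lines on this page.
BASE_URL = "https://api.bluedoor.sh/job-postings/v1"


def endpoint_urls(job_id: str, org_id: str, source_id: str) -> Dict[str, str]:
    """Hypothetical helper: build the four GET URLs listed on this page for a job record."""
    return {
        "job": f"{BASE_URL}/jobs/{job_id}?include=description",
        "org": f"{BASE_URL}/orgs/{org_id}",
        "source": f"{BASE_URL}/sources/{source_id}",
        "events": f"{BASE_URL}/jobs/{job_id}/events",
    }


def fetch_json(url: str, api_key: Optional[str] = None, timeout: float = 10.0) -> Any:
    """Hypothetical helper: issue a GET request and decode the JSON response body."""
    headers = {"Accept": "application/json"}
    if api_key is not None:
        # ASSUMPTION: bearer-token auth. This page does not document how the
        # API key is sent, so check the API docs before relying on this header.
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


urls = endpoint_urls(
    job_id="b3ee993b7533c2e74362cc960abaa7ac9f83bbb4",
    org_id="1336841c-8b64-4576-926e-ede525dc7f99",
    source_id="7deefaa3-6bb7-4a18-ba75-109b9d7c264d",
)
# Network call left commented out so the sketch runs offline:
# job = fetch_json(urls["job"], api_key="YOUR_KEY")
```

Keeping URL construction separate from the HTTP call makes the endpoint paths easy to check without network access. The response schema is not shown on this page, so the sketch returns the decoded JSON unchanged.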
