Member of Technical Staff - Web Crawl Engineer

Reflectionai · San Francisco · On Site · Active · Ashby

Job facts

Field	Value
Company	Reflectionai
Title	Member of Technical Staff - Web Crawl Engineer
Normalized title	-
Department / team	Engineering / Engineering
Location	San Francisco, CA, United States
Work model	On Site
Employment type	Full Time
Salary	-
Status	active
ATS provider	Ashby
Posted / first seen	— / 2026-06-19
Changed / last seen	2026-06-19 / 2026-06-20

Linked records

Company	Reflectionai
Source	dde42094-6e1e-4abd-8700-6037f9147ed6
ATS provider	Ashby

Description

Our Mission

Reflection’s mission is to build open superintelligence and make it accessible to all. We’re developing open-weight models for individuals, agents, enterprises, and even nation states. Our team of AI researchers and company builders come from DeepMind, OpenAI, Google Brain, Meta, Character.AI, Anthropic and beyond.

About the Role

The web is one of the most important sources of information for frontier AI systems. The quality, coverage, freshness, and diversity of web data directly influence model capabilities.

As a member of the Data Team, your mission is to build and operate large-scale web crawling systems that continuously discover, acquire, and process content from across the internet. You will own the infrastructure that powers web-scale data collection, from URL discovery and scheduling to distributed crawling, content extraction, and dataset delivery.

You will work directly with world-class researchers to understand which parts of the web matter most for model performance, and you will build systems that efficiently acquire high-value content at scale. This role is ideal for engineers who love building distributed systems, optimizing large-scale crawlers, and solving the unique technical challenges of collecting data from the modern web.

What You’ll Do

Working closely with our pre-training, infrastructure, and data quality teams, you will:

- Build and operate web-scale crawling infrastructure capable of continuously collecting data across billions of URLs
- Design and optimize URL discovery, prioritization, scheduling, and crawl orchestration systems
- Develop distributed crawlers that efficiently acquire content while respecting site constraints and operational requirements
- Build systems for content extraction, rendering, parsing, and normalization across diverse web formats
- Improve crawl coverage, freshness, efficiency, and quality through measurement and experimentation
- Design infrastructure for large-scale recrawling, change detection, and incremental updates
- Develop specialized crawlers for high-value domains, dynamic websites, and difficult-to-access content sources
- Analyze crawl performance and web coverage to identify gaps, inefficiencies, and opportunities for improvement
- Build observability, monitoring, and reliability systems for large-scale crawl operations
- Debug production issues and continuously improve the performance, scalability, and resilience of crawling infrastructure

About You

- Passionate about web-scale systems and the challenges of collecting information from the internet
- Curious about how web data influences model capabilities and willing to iterate based on downstream results
- Comfortable balancing crawl quality, coverage, freshness, and operational efficiency
- Enjoy working at the intersection of distributed systems, data infrastructure, and AI
- Able to collaborate closely with researchers, infrastructure engineers, and data quality teams

Skills and Qualifications

- Experience building large-scale web crawling, search indexing, content acquisition, or internet-scale data collection systems
- Strong understanding of crawling architectures, URL frontier management, scheduling, and distributed crawl coordination
- Experience with large-scale distributed systems using technologies such as Ray, Spark, Beam, Flink, or similar frameworks
- Familiarity with content extraction, HTML parsing, browser automation, rendering systems, and modern web technologies
- Experience operating systems that process petabyte-scale datasets
- Strong systems engineering skills, including reliability, observability, performance optimization, and debugging
- Experience designing experiments and using data to improve crawl quality, coverage, and efficiency
- Excellent communication skills and the ability to reason clearly about system tradeoffs and operational constraints

Nice to Have

- Experience building search engines, web indexes, or internet-scale crawling platforms
- Familiarity with anti-bot systems, dynamic web content, browser automation, and large-scale extraction pipelines
- Understanding of how web data is used in training and evaluating large language models
- Experience with distributed storage systems, content deduplication, and web-scale dataset management

What We Offer

We believe that to build superintelligence that is truly open, you need to start at the foundation. Joining Reflection means building from the ground up as part of a small, talent-dense team. You will help define our future as a company, and help define the frontier of open foundational models.

We want you to do the most impactful work of your career with the confidence that you and the people you care about most are supported.

- Top-tier compensation: Salary and equity structured to recognize and retain the best talent globally.
- Health & wellness: Comprehensive medical, dental, vision, life, and disability insurance.
- Life & family: Fully paid parental leave for all new parents, including adoptive and surrogate journeys. Financial support for family planning.
- Benefits & balance: Paid time off when you need it, relocation support, and more perks that optimize your time.
- Opportunities to connect with teammates: Lunch and dinner are provided daily. We have regular off-sites and team celebrations.

Full job record

Job ID	d53089f011811807ec65a65d0d52f5ccf1aeb5cc
Org ID	83b4dbeb-3efd-46c3-bff0-8d3e2c88f32e
Source ID	dde42094-6e1e-4abd-8700-6037f9147ed6
Board ID	dde42094-6e1e-4abd-8700-6037f9147ed6
Provider	ashby
Provider Job Key	ee258570-a3f9-4276-9bf4-35b2b70dbc61
Title	Member of Technical Staff - Web Crawl Engineer
Normalized Title	—
Status	active
Active	yes
Location Text	San Francisco
Department	Engineering
Team	Engineering
Employment Type	full_time
Workplace Type	on_site
Remote Policy	—
Country	United States
Region	CA
City	San Francisco
Salary Raw	—
Salary Min	—
Salary Max	—
Salary Currency	—
Salary Period	—
Source URL	https://jobs.ashbyhq.com/reflectionai/ee258570-a3f9-4276-9bf4-35b2b70dbc61
Apply URL	https://jobs.ashbyhq.com/reflectionai/ee258570-a3f9-4276-9bf4-35b2b70dbc61/application
First Seen At	2026-06-19 09:51:00Z
Last Seen At	2026-06-20 09:57:52Z
Last Checked At	2026-06-20 09:57:52Z
Last Changed At	2026-06-19 09:51:00Z
Inactive At	—
Source Posted At	—
Source Updated At	—
Raw Payload Uri	s3://job-postings-prod-raw-590183727216/raw/provider=ashby/board=reflectionai/date=2026-06-20/2026-06-20T09-57-16-141Z-ff6a5304c9734260e852721ce1798c1be3bf287f67748b660bc21474e4d3277a.json

Event Fields

{
  "content_hash": "0d05a25735ea3d690d572d3a4ec92a37152d4dc48d71ce4b05f8207404a0b6f9",
  "source_hash": "b19fd7046b0fd1e77bcda5a7c62a408f560f83b3e91635484aff3f7d48792da0",
  "last_changed_at": "2026-06-19T09:51:00.720Z",
  "active_status": "active"
}

Parsed Structured

{
  "dedupe": null,
  "language": "en",
  "location": {
    "raw": "San Francisco",
    "city": "San Francisco",
    "region": "CA",
    "country": "United States",
    "is_remote": false,
    "confidence": 0.75
  },
  "salary_max": null,
  "salary_min": null,
  "inferred_at": "2026-06-20T09:57:52.412Z",
  "launch_scope": {
    "reason": "english_us_canada",
    "included": true,
    "language": "en",
    "location": {
      "raw": "San Francisco",
      "city": "San Francisco",
      "region": "CA",
      "country": "United States",
      "is_remote": false,
      "confidence": 0.75
    },
    "countries": [
      "United States"
    ]
  },
  "remote_policy": null,
  "salary_period": null,
  "workplace_type": "on_site",
  "salary_currency": null
}

Extensions

{}

Native Structured

{
  "id": "ee258570-a3f9-4276-9bf4-35b2b70dbc61",
  "team": "Engineering",
  "title": "Member of Technical Staff - Web Crawl Engineer",
  "jobUrl": "https://jobs.ashbyhq.com/reflectionai/ee258570-a3f9-4276-9bf4-35b2b70dbc61",
  "address": null,
  "applyUrl": "https://jobs.ashbyhq.com/reflectionai/ee258570-a3f9-4276-9bf4-35b2b70dbc61/application",
  "isListed": true,
  "isRemote": false,
  "location": "San Francisco",
  "updatedAt": null,
  "apiVersion": "ashby-non-user-graphql-v1",
  "department": "Engineering",
  "publishedAt": null,
  "workplaceType": "OnSite",
  "employmentType": "FullTime",
  "secondaryLocations": []
}

Get this page with API

Rendered from the bluedoor Job Postings API. Reproduce it:

GET https://api.bluedoor.sh/job-postings/v1/jobs/d53089f011811807ec65a65d0d52f5ccf1aeb5cc?include=description

GET https://api.bluedoor.sh/job-postings/v1/orgs/83b4dbeb-3efd-46c3-bff0-8d3e2c88f32e

GET https://api.bluedoor.sh/job-postings/v1/sources/dde42094-6e1e-4abd-8700-6037f9147ed6

GET https://api.bluedoor.sh/job-postings/v1/jobs/d53089f011811807ec65a65d0d52f5ccf1aeb5cc/events