Senior Research Data Engineer
Fully remote
About The Position
Kaiko’s Multimodal Large Language Model (MLLM) is trained on domain-specific, high-complexity medical data. To reach clinical-grade performance, we need to scale up our data efforts: handling massive volumes, ensuring consistent quality, and tightly controlling data relevance and integrity.
As a Senior Research Data Engineer, you will design and implement our data‑sourcing, synthetic‑data generation, and curation pipelines. High‑quality datasets are the fuel for frontier‑scale language models, and you will play a pivotal role in producing them.
You will build high‑throughput data pipelines that:
- Ingest multimodal data at petabyte scale.
- Generate large volumes of synthetic data.
- Filter & rate content by topic, quality, and policy compliance.
You will work closely with ML researchers and help steer the development of our state‑of‑the‑art foundation models. You will be based in Zurich or Amsterdam and are expected to spend half of your time in the office.
Profile
- Excellent programming skills in Python and deep experience with distributed frameworks such as Ray or Spark.
- Proven track record designing & operating large‑scale data pipelines and running data‑quality experiments.
- Experience building or integrating synthetic‑data pipelines for LLMs.
- Deep familiarity with lakehouse table formats (Delta Lake, Iceberg) and columnar file formats (Parquet, ORC).
- Experience with core data‑processing primitives (hashing, deduplication, chunking, etc.) and their scalability and performance trade‑offs.
- Strong communication skills and the ability to present experimental results and technical concepts clearly and concisely.
Nice To Have:
- Hands‑on production experience orchestrating complex DAGs in Dagster (preferred) or similar workflow engines.
- Expertise in data‑quality & validation frameworks and monitoring/observability tooling.
RECRUITMENT PROCESS
NO WHITEBOARDS, NO RIDDLES
We take a partnership approach and focus on getting to know each other as well as possible.
01. CV REVIEW
A first look at whether we are a good match (1-7 days).
02. COMBINED TECHNICAL & HR INTERVIEW
A deep dive into your experience and your theoretical and practical skills (1.5 hours).
03. OFFER
Say yes and welcome aboard!