Senior Research Data Engineer

Fully remote

About The Position

Kaiko’s Multimodal Large Language Model (MLLM) is trained on domain-specific, high-complexity medical data. To reach clinical-grade performance, we’ll need to ramp up our data efforts to manage massive scale, ensure consistent quality, and tightly control data relevance and integrity.

As a Senior Research Data Engineer, you will design and implement our data‑sourcing, synthetic‑generation, and curation pipelines. High‑quality datasets are the fuel for frontier‑scale language models, and you will play a pivotal role in producing them.

You will build high‑throughput data pipelines that:

  • Ingest multi‑modal data at petabyte scale.
  • Generate large volumes of synthetic data.
  • Filter & rate content by topic, quality, and policy compliance.

You will work closely with ML researchers and help steer the development of our state‑of‑the‑art foundation models. You will be based in Zurich or Amsterdam, with the expectation of spending half of your time at the office.

Profile

  • Excellent programming skills in Python and deep experience with distributed frameworks such as Ray or Spark.
  • Proven track record designing & operating large‑scale data pipelines and running data‑quality experiments.
  • Experience building or integrating synthetic‑data pipelines for LLMs.
  • Deep familiarity with lakehouse paradigms (Delta, Iceberg) and columnar formats (Parquet, ORC).
  • Experience with core data‑processing primitives (hashing, deduplication, chunking, etc.) and the associated scalability/performance trade‑offs.
  • Strong communication skills and the ability to present experimental results and technical concepts clearly and concisely.

Nice To Have:

  • Hands‑on production experience orchestrating complex DAGs in Dagster (preferred) or similar workflow engines.
  • Expertise in data‑quality & validation frameworks and monitoring/observability tooling.

RECRUITMENT PROCESS
NO WHITEBOARDS, NO RIDDLES

We take a partnership approach and focus on getting to know
each other as well as possible.

01. CV REVIEW

First look at whether we are a good match (1-7 days).

02. TECHNICAL & HR INTERVIEW IN ONE SESSION

Deep dive into your experience and both theoretical and practical skills (1.5 hours).

03. OFFER

Say yes and welcome aboard!

COPYRIGHT © 2009-2025 MIRUMEE SOFTWARE.