At idealo, Generative AI (GenAI) is becoming a multiplier across every team. The AI Booster Team is our internal technical competence center: we pair with product teams, build reusable GenAI building blocks and share best practices company-wide. We validate AI business cases through data and ship evaluation frameworks that turn pilots into production. As a Data Scientist, you will translate ideas into evidence: designing experiments, measuring LLM quality and unlocking the full value of idealo’s data assets to guide today’s and tomorrow’s GenAI initiatives.
This position is available full-time or part-time.
Quantify opportunities & run experiments - perform causal analyses using experimental and observational methods to evaluate the business impact of GenAI features.
Own model evaluation pipelines - create metrics dashboards and human and AI-assisted reviews that benchmark LLM quality, cost and safety.
Guide model selection - compare foundation models, fine-tunes and RAG setups, recommending the right balance of performance vs. cost.
Champion data strategy - surface high-value datasets (product, pricing, behaviour) and advocate their use in current and future AI products.
Pair & coach - work embedded with engineers and analysts, sharing best practices in experimentation, metrics, and GenAI evaluation.
Harvest patterns - document reusable evaluation playbooks so every team can measure GenAI success consistently.
3+ years in data science / analytics, including A/B testing or causal inference at scale.
Expert-level SQL and Python (pandas, statsmodels / SciPy, scikit-learn); comfortable with notebooks and BI tools for storytelling.
Hands-on experience with LLM assessment - prompt / temperature sweeps, embedding similarity metrics, human-in-the-loop studies, and LLM-as-a-judge tools (e.g. Bedrock model evaluation, OpenAI Evals).
Familiar with Generative AI stacks (Hugging Face, LangChain/LlamaIndex, vector DBs like Pinecone/Qdrant) and retrieval-augmented generation concepts.
Proficient in AWS analytics & MLOps: SageMaker Experiments / Pipelines, Bedrock, Athena, Lambda, Step Functions; able to automate evaluation workflows and cost dashboards.
Strong communication: can turn complex findings into clear, actionable insights and coach cross-functional teams.
We’re keen to see evidence of exceptional achievement - perhaps you’ve scaled a personal project to thousands of users, published influential research, ranked highly in competitive arenas (e.g. sports, Kaggle, hackathons) or maintain widely used open-source libraries. Tell us what makes you stand out!
You don’t tick every single box? No worries! We hire people, not checklists, and value motivation to grow.