
Working Student (m/f/d) LLM Agent Evaluation & Benchmarking
Job description
Agile Robots SE published this posting. Below, we have added our own working-student context: what this position means for your weekly hours, your net pay, and your student visa as a student in Munich, Germany.
Description provided by Agile Robots SE
About the role
We are looking for a Working Student (m/f/d) for LLM Agent Evaluation & Benchmarking. In this role, you will design and build an agent-agnostic benchmarking harness, run comparative evaluations across frontier and local models, and translate findings into prompt, guard, and tool-schema improvements.
Your Responsibilities
- Harness Development: Design and build an agent-agnostic benchmarking harness that executes versioned task suites against frontier and local models with reproducible, version-controlled runs.
- Task Suite Design: Define and maintain evaluation task suites that measure task success, grounding accuracy, latency, and cost across the agent portfolio.
- Model Evaluation: Run periodic head-to-head evaluations across models to produce structured comparisons that support model selection decisions.
- Eval Reporting: Analyze evaluation outputs and produce reports and visualizations that communicate findings clearly to agent owners.
- Improvement Feedback: Translate evaluation findings into concrete changes to prompts, guard logic, and tool schemas in collaboration with agent owners.
Essential Skills
- Academic Background: Currently enrolled in a Master's programme in Computer Science, Machine Learning, or Data Science.
- Python Engineering: Ability to write well-structured Python for tooling and automation, including test frameworks such as pytest, dependency management, reproducible execution, and basic CI pipeline configuration.
- Eval Frameworks: Working familiarity with LLM evaluation frameworks such as LangSmith, Ragas, Inspect AI, or lm-evaluation-harness.
- Agent Concepts: Working understanding of LLM APIs (OpenAI, Anthropic, Ollama), prompt structure, and how multi-step agent systems are built and instrumented.
- Experimental Design: Ability to design controlled comparisons, define success metrics, and interpret results across multiple evaluation conditions.
Beneficial Skills
- Data Analysis: Familiarity with statistical comparison methods and data handling using numpy, pandas, and scikit-learn.
- Reporting Tools: Familiarity with data visualization and reporting using tools such as Plotly, Streamlit, or notebooks.
- Agent Frameworks: Familiarity with agent orchestration frameworks such as LangChain or LangGraph.
What we offer
- Practical learning opportunities to complement your studies.
- A dynamic high-tech company with financial soundness and world-class investors.
- Join an interdisciplinary, international team with 60+ different nationalities in a collaborative work environment.
- A Corporate Benefits Program covering health, mobility, and learning, with 100 € net per month.
- Modern office facilities with a rooftop terrace overlooking Munich, free drinks and fruit, and regular company events that contribute to a good working environment.
About us
Agile Robots SE is an international high-tech company based in Munich, Germany with a production site in Kaufbeuren and more than 2300 employees worldwide. Our mission is to bridge the gap between artificial intelligence and robotics by developing systems that combine state-of-the-art force-moment-sensing and world-leading image-processing technology. This unique combination of technologies allows us to provide user-friendly and affordable robotic solutions that enable intelligent precision assembly.
This is made possible by our employees, who make the most of every day with creativity and enthusiasm. Become part of this team and shape the future of robotics with us!
We are proud of our diversity and welcome your application regardless of gender and sexual identity, nationality, ethnicity, religion, age, or disability.
Key information for working students
What this working-student position in tech in Munich means for you: the weekly-hours rules, the social security advantages, and what international students should check before applying.
Weekly hours
Working students may work up to 20 hours per week during the semester and full-time during semester breaks. Staying within these limits preserves your student status and the working-student benefits.
Social security contributions
Thanks to the working-student privilege (Werkstudentenprivileg), you pay no contributions to health, long-term care, or unemployment insurance; only pension insurance applies. You keep more net pay than in a regular job.
International students
Students from outside the EU may work 140 full days or 280 half days per year (since March 2024; previously 120/240). A working-student contract usually fits within this limit, but check the exact limits on your residence permit.
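As a rough illustration of the day-count arithmetic above, the following Python sketch tracks how much of the yearly allowance is left. It assumes that one half day counts as half a full day, which is what the 140/280 ratio in the text implies; the function name and constant are hypothetical, and the sketch is not legal advice or an official calculation.

```python
# Hypothetical helper: the allowance figure comes from the text above,
# the half-day weighting (0.5 of a full day) is an assumption derived
# from the stated 140 full / 280 half day equivalence.
FULL_DAY_ALLOWANCE = 140


def remaining_full_days(full_days_worked: int, half_days_worked: int) -> float:
    """Return the remaining allowance, expressed in full-day equivalents."""
    if full_days_worked < 0 or half_days_worked < 0:
        raise ValueError("day counts must be non-negative")
    used = full_days_worked + 0.5 * half_days_worked
    return FULL_DAY_ALLOWANCE - used


if __name__ == "__main__":
    # Example: 30 full days and 100 half days worked so far this year.
    print(remaining_full_days(30, 100))
```

A negative result would mean the assumed allowance has been exceeded. The conditions printed on your residence permit take precedence over any such estimate.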