Data Integration & Onboarding Intern (m/f/d)
About PRYZM Solutions
Pryzm Solutions builds AI-powered pricing intelligence for mid-market B2B manufacturers in the DACH region. We turn weeks-long manual pricing processes into data-driven, near-real-time decisions, running on EU-sovereign infrastructure.
About the Role
Every new customer we onboard comes with messy, heterogeneous data spread across ERP, CRM, and operational systems. Getting from "raw customer export" to "clean, model-ready data" is currently the single biggest bottleneck on our path to scale.
We're hiring our first Data Integration Intern to help us turn onboarding into a repeatable, documented, semi-automated process. Your work will directly shape how fast we can take on new customers — and how many we can serve at once.
What You'll Do
• Standardize our data intake. Turn one-off onboarding workflows into a reproducible specification: what data we need, in what format, from which source systems.
• Build a data quality assessment framework. Automate what our engineers do manually today — completeness checks, consistency validation, anomaly detection — into reusable Python notebooks and report templates.
• Develop synthetic data generators. Build realistic pharma/manufacturing-style datasets we can use for demos, training, and testing without touching customer data.
• Document onboarding runbooks. Shadow our onboarding work, extract the tacit knowledge, and turn it into step-by-step playbooks anyone on the team can follow.
• Analyze pilot retrospectives. Where did we lose time? Which steps repeat across customers? Surface patterns that turn into product features.
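To make the quality-assessment bullet above more concrete, here is a minimal sketch of the three kinds of checks it names (completeness, consistency, anomaly detection). It is written in plain Python so it runs without extra packages; the same checks map naturally onto pandas. The sample rows, field names (`sku`, `customer_id`, `quantity`, `unit_price`) and the 1.5 x IQR threshold are assumptions made up for illustration, not Pryzm's actual schema or tooling.

```python
from statistics import quantiles

# Hypothetical sample rows; the field names are illustrative only.
ROWS = [
    {"sku": "A-100", "customer_id": "C1", "quantity": 10, "unit_price": 2.50},
    {"sku": "A-100", "customer_id": "C1", "quantity": 10, "unit_price": 2.50},  # exact duplicate
    {"sku": "a100", "customer_id": None, "quantity": 5, "unit_price": 2.60},    # missing customer
    {"sku": "B-200", "customer_id": "C2", "quantity": -3, "unit_price": 2.40},  # negative quantity
    {"sku": "B-200", "customer_id": "C3", "quantity": 7, "unit_price": None},   # missing price
    {"sku": "C-300", "customer_id": "C4", "quantity": 2, "unit_price": 250.0},  # suspicious price
    {"sku": "C-300", "customer_id": "C5", "quantity": 4, "unit_price": 2.55},
    {"sku": "D-400", "customer_id": "C6", "quantity": 1, "unit_price": 2.45},
]


def completeness(rows):
    """Completeness: share of non-missing values per column (1.0 = fully populated)."""
    return {
        col: sum(r[col] not in (None, "") for r in rows) / len(rows)
        for col in rows[0].keys()
    }


def duplicate_count(rows):
    """Consistency: number of rows that exactly repeat an earlier row."""
    columns = list(rows[0].keys())
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes


def negative_quantities(rows):
    """Consistency: rows whose quantity is below zero (returns, cancellations, or errors)."""
    return [r for r in rows if r["quantity"] is not None and r["quantity"] < 0]


def price_outliers(rows, k=1.5):
    """Anomaly detection: prices outside [Q1 - k*IQR, Q3 + k*IQR]."""
    prices = [r["unit_price"] for r in rows if r["unit_price"] is not None]
    q1, _, q3 = quantiles(prices, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [p for p in prices if p < low or p > high]


report = {
    "completeness": completeness(ROWS),
    "duplicate_rows": duplicate_count(ROWS),
    "negative_quantity_rows": len(negative_quantities(ROWS)),
    "price_outliers": price_outliers(ROWS),
}
print(report)
```

Collecting the results in one dictionary is a deliberate choice: the same structure can feed a notebook cell, a report template, or an automated gate that rejects a handover when a metric falls below a threshold.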
What You Get
• Paid monthly stipend
• Certificate of completion
• LinkedIn recommendation from the team
• Real experience shipping at an early-stage AI startup
• Direct mentorship from the technical team on causal AI modeling and MLOps
• A real shot at joining the core team full-time after the internship, based on performance
Location & Duration
• Remote — EU time zones preferred
• 6 months, full-time preferred (part-time negotiable for strong candidates)
• Occasional in-person meetups possible
Requirements
• Strong Python skills, comfort with pandas and SQL
• Obsessive attention to data quality — you notice when something is off
• Clear written communication — you can turn a messy process into a clean document
• Ability to work independently and ship without hand-holding
• Genuine interest in B2B data, pricing, or industrial/pharma domains
Nice to have
• Background in data science, computer science, industrial engineering, or a quantitative field
• Exposure to ERP systems or CRM
• Experience with ETL tooling (dbt, Airflow, Great Expectations) or synthetic data generation
• German language skills
How to Apply
The application is the job itself. No CV needed.
We've prepared a messy, realistic dataset that mimics what a mid-market manufacturer would hand us on day one — inconsistent SKU IDs, missing costs, fragmented customer records. It's publicly available, no NDA required.
Your task:
1. Download the dataset.
We use the UCI Online Retail II dataset as our test bed — a real, messy, two-year transactional dataset from a UK wholesaler (~540,000 rows). It has duplicates, cancellations, missing customer IDs, and inconsistent product codes: structurally similar to what we encounter when onboarding a new pharmaceutical or industrial manufacturing customer.
Download it here: https://archive.ics.uci.edu/dataset/502/online+retail+ii
2. Run a data quality assessment. Treat this as if it were a Pryzm customer's first data handover: the domain is different, but the data quality challenges are not. What's broken? What's missing? What would block a pricing model from training on this data?
3. Propose an intake specification. If you were designing the form a new customer fills out to hand us this data, what fields would you require? What checks would you run automatically?
4. Write it up. A 2–4 page document (PDF or Markdown) covering: your quality findings, your proposed specification, and one concrete improvement you'd prioritize first.
5. Include at the top: your name, email, LinkedIn, and location.
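As a purely illustrative sketch of what "checks you would run automatically" against an intake specification could look like, the snippet below declares a few fields and validates rows against them. It is one possible shape, not the expected answer. The column names are assumed to match the published dataset's headers (verify them against your download), and the `FieldSpec` class, the timestamp format, and the range rules are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One line of a hypothetical intake specification."""
    name: str
    required: bool
    parse: Callable[[str], object]                  # must raise ValueError on bad input
    check: Optional[Callable[[object], bool]] = None  # optional range/business rule


def parse_timestamp(value: str) -> datetime:
    # Assumed timestamp format; a real spec would state the format explicitly.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Assumed column names and rules, for illustration only.
SPEC = [
    FieldSpec("Invoice", True, str),
    FieldSpec("StockCode", True, str),
    FieldSpec("Quantity", True, int, lambda q: q != 0),
    FieldSpec("InvoiceDate", True, parse_timestamp),
    FieldSpec("Price", True, float, lambda p: p >= 0),
    FieldSpec("Customer ID", False, str),
]


def validate_row(row: dict, spec=SPEC) -> list:
    """Return a list of human-readable problems; an empty list means the row passes."""
    problems = []
    for field in spec:
        raw = row.get(field.name)
        if raw is None or str(raw).strip() == "":
            if field.required:
                problems.append(f"{field.name}: missing required value")
            continue
        try:
            value = field.parse(str(raw).strip())
        except ValueError:
            problems.append(f"{field.name}: cannot parse {raw!r}")
            continue
        if field.check is not None and not field.check(value):
            problems.append(f"{field.name}: value {raw!r} fails range check")
    return problems


# Made-up example rows.
good_row = {"Invoice": "489434", "StockCode": "85048", "Quantity": "12",
            "InvoiceDate": "2009-12-01 07:45:00", "Price": "6.95", "Customer ID": ""}
bad_row = {"Invoice": "C489449", "StockCode": "22087", "Quantity": "abc",
           "InvoiceDate": "", "Price": "-1.5"}

print(validate_row(good_row))
print(validate_row(bad_row))
```

Keeping the specification as data rather than as scattered `if` statements means the same list can be rendered into the customer-facing form and executed as the automated check, so the two cannot drift apart.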
Email your document to [email protected] with the subject line: Data Integration Intern — [Your Name]
If your work is sharp, we will schedule a call.