Data Integration & Onboarding Intern (m/f/d)

Posted 10 days ago · Tech · Internship

PRYZM Solutions

Remote · English

Required Skills

data quality assessment, Airflow, Great Expectations, Jupyter notebooks, ETL, dbt, MLOps, data validation, CRM, ERP, pandas, synthetic data generation, anomaly detection, SQL, Python

Job Description

About PRYZM Solutions

Pryzm Solutions builds AI-powered pricing intelligence for mid-market B2B manufacturers in the DACH region. We turn weeks-long manual pricing processes into data-driven, near-real-time decisions, running on EU-sovereign infrastructure.

About the Role

Every new customer we onboard comes with messy, heterogeneous data spread across ERP, CRM, and operational systems. Getting from "raw customer export" to "clean, model-ready data" is currently the single biggest bottleneck on our path to scale.

We're hiring our first Data Integration Intern to help us turn onboarding into a repeatable, documented, semi-automated process. Your work will directly shape how fast we can take on new customers — and how many we can serve at once.

What You'll Do

• Standardize our data intake. Turn one-off onboarding workflows into a reproducible specification: what data we need, in what format, from which source systems.

• Build a data quality assessment framework. Automate what our engineers do manually today — completeness checks, consistency validation, anomaly detection — into reusable Python notebooks and report templates.

• Develop synthetic data generators. Build realistic pharma/manufacturing-style datasets we can use for demos, training, and testing without touching customer data.

• Document onboarding runbooks. Shadow our onboarding work, extract the tacit knowledge, and turn it into step-by-step playbooks anyone on the team can follow.

• Analyze pilot retrospectives. Where did we lose time? Which steps repeat across customers? Surface patterns that turn into product features.
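
The completeness, consistency, and anomaly checks named in the data quality bullet above could start from something like the following sketch. It is an illustration only, not Pryzm's framework: the sample rows, the column names (`sku`, `unit_price`, `customer_id`), and the 1.5 × IQR threshold are all assumptions. It uses only the Python standard library so it runs without setup; the same checks translate directly to pandas.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical sample rows; the column names are illustrative, not a real schema.
ROWS = [
    {"invoice": "1001", "sku": "A-100", "qty": 10, "unit_price": 2.50, "customer_id": "C1"},
    {"invoice": "1001", "sku": "A-100", "qty": 10, "unit_price": 2.50, "customer_id": "C1"},   # exact duplicate
    {"invoice": "1002", "sku": "a-100", "qty": 12, "unit_price": 2.55, "customer_id": None},   # SKU casing, no customer
    {"invoice": "1003", "sku": "B-200", "qty": 8, "unit_price": 2.45, "customer_id": "C2"},
    {"invoice": "1004", "sku": "B-200", "qty": 9, "unit_price": 2.60, "customer_id": "C2"},
    {"invoice": "1005", "sku": "C-300", "qty": 11, "unit_price": 250.0, "customer_id": "C3"},  # suspicious price
]


def completeness(rows, columns):
    """Share of non-missing values per column (None and '' count as missing)."""
    total = len(rows)
    return {
        col: sum(1 for r in rows if r.get(col) not in (None, "")) / total
        for col in columns
    }


def duplicate_count(rows):
    """Number of rows that repeat an earlier row exactly."""
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    return sum(n - 1 for n in counts.values())


def inconsistent_skus(rows, column="sku"):
    """Spellings that collapse to one code after trimming and upper-casing."""
    variants = {}
    for r in rows:
        raw = r[column]
        variants.setdefault(raw.strip().upper(), set()).add(raw)
    return {norm: sorted(seen) for norm, seen in variants.items() if len(seen) > 1}


def iqr_outliers(rows, column, k=1.5):
    """Rows whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if not low <= r[column] <= high]


if __name__ == "__main__":
    print("completeness:", completeness(ROWS, ["sku", "customer_id"]))
    print("duplicates:", duplicate_count(ROWS))
    print("inconsistent SKUs:", inconsistent_skus(ROWS))
    print("price outliers:", [r["invoice"] for r in iqr_outliers(ROWS, "unit_price")])
```

Keeping each check a small function that takes rows and returns plain data makes it easy to reuse the same checks in a notebook and in a report template. On very small samples the IQR rule is crude, because the outlier itself shifts the quartiles, so the threshold would need tuning on real data.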

What You Get

• Paid monthly stipend

• Certificate of completion

• LinkedIn recommendation from the team

• Real experience shipping at an early-stage AI startup

• Direct mentorship from the technical team on causal AI modeling and MLOps

• A real shot at joining the core team full-time after the internship, based on performance

Location & Duration

• Remote — EU time zones preferred

• 6 months, full-time preferred (part-time negotiable for strong candidates)

• Occasional in-person meetups possible

Requirements

• Strong Python skills, comfort with pandas and SQL

• Obsessive attention to data quality — you notice when something is off

• Clear written communication — you can turn a messy process into a clean document

• Ability to work independently and ship without hand-holding

• Genuine interest in B2B data, pricing, or industrial/pharma domains

Nice to have

• Background in data science, computer science, industrial engineering, or a quantitative field

• Exposure to ERP systems or CRM

• Experience with ETL tooling (dbt, Airflow, Great Expectations) or synthetic data generation

• German language skills

How to Apply

The application is the job itself. No CV needed.

We've prepared a messy, realistic dataset that mimics what a mid-market manufacturer would hand us on day one — inconsistent SKU IDs, missing costs, fragmented customer records. It's publicly available, no NDA required.

Your task:

1. Download the dataset.

We use the UCI Online Retail II dataset as our test bed: a real, messy, two-year transactional dataset from a UK wholesaler (roughly 1.07 million rows across its two yearly sheets, each around 530,000 rows). It has duplicates, cancellations, missing customer IDs, and inconsistent product codes, which makes it structurally similar to what we encounter when onboarding a new pharmaceutical or industrial manufacturing customer.

Download it here: https://archive.ics.uci.edu/dataset/502/online+retail+ii

2. Run a data quality assessment. Treat the dataset as if it were a Pryzm customer's first data handover: the domain is different, but the data quality challenges are not. What's broken? What's missing? What would block a pricing model from training on this data?

3. Propose an intake specification. If you were designing the form a new customer fills out to hand us this data, what fields would you require? What checks would you run automatically?

4. Write it up. A 2–4 page document (PDF or Markdown) covering: your quality findings, your proposed specification, and one concrete improvement you'd prioritize first.

5. Include at the top: your name, email, LinkedIn, and location.
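
For step 3, one way to make an intake specification executable is to declare each field once, with a parser that doubles as a validator, and run every incoming row through it. The sketch below is hypothetical: `Field`, `SPEC`, and `validate_row` are illustrative names, the timestamp format is an assumption, and the column names follow the published Online Retail II schema as best understood, so check them against the file you download.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Field:
    """One column of the (hypothetical) intake specification."""
    name: str
    parse: Callable[[str], object]  # must raise ValueError on malformed input
    required: bool = True


def parse_timestamp(value: str) -> datetime:
    # Assumed export format; adjust to whatever the source system emits.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def parse_non_negative_price(value: str) -> float:
    price = float(value)
    if price < 0:
        raise ValueError("negative price")
    return price


SPEC = [
    Field("Invoice", str),
    Field("StockCode", str),
    Field("Quantity", int),
    Field("InvoiceDate", parse_timestamp),
    Field("Price", parse_non_negative_price),
    Field("Customer ID", str, required=False),  # often missing in practice
]


def validate_row(row: dict, spec=SPEC) -> list:
    """Return human-readable problems for one row; an empty list means it passes."""
    problems = []
    for field in spec:
        raw = (row.get(field.name) or "").strip()
        if not raw:
            if field.required:
                problems.append(f"{field.name}: missing")
            continue
        try:
            field.parse(raw)
        except ValueError as exc:
            problems.append(f"{field.name}: {exc}")
    return problems


if __name__ == "__main__":
    sample = {"Invoice": "", "StockCode": "85048", "Quantity": "twelve",
              "InvoiceDate": "01/12/2009", "Price": "-1", "Customer ID": ""}
    for problem in validate_row(sample):
        print(problem)
```

Because the specification is plain data, the same `SPEC` list could in principle drive both the customer-facing form and the automatic checks, which keeps the two from drifting apart.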

Email your document to [email protected] with the subject line: Data Integration Intern — [Your Name]

If your work is sharp, we will schedule a call.
