Shopify Order Return Prediction
Predict which orders are likely to be returned before they ship, enabling proactive interventions like quality checks, adjusted return policies, or targeted post-purchase support.
Dataset Source:
Kaggle - Shopify Sales Dataset for ML & EDA
Problem Type: Binary Classification (order-level)
Target Variable: is_returned (1 = order was returned, 0 = order kept)
Return Rate: ~14.8% (moderately imbalanced; roughly one order in seven is returned)
Use Case: Flag high-risk orders at checkout for operational routing, fraud detection, or post-purchase follow-up
Package Imports
Instantiate Xplainable Cloud
Initialise the xplainable cloud using an API key from: https://platform.xplainable.io/
Load Shopify Order Data
Unlike the churn notebook, this model operates at the order level — each row is a single order, no aggregation needed. This makes the preprocessing pipeline fully self-contained and directly deployable.
1. Data Preprocessing
Columns dropped:
- order_id, customer_id, product_id: high-cardinality identifiers with no predictive value
- profit, revenue, discounted_price: derived from other columns (price, discount, quantity); they would leak the target because profit is affected by the return itself
- order_date: dropped only after the month and day-of-week features have been extracted from it
Features retained:
- product_category, product_price, discount_percent, quantity: order characteristics
- customer_country, traffic_source, payment_method: context
- shipping_cost, rating: potential return signals
- order_month, order_dow: temporal patterns
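The drop and extract steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's original cell: the two-row sample frame and the `preprocess` helper are hypothetical, and it assumes `order_date` parses as an ISO date string. Only the column names come from the lists above.

```python
import pandas as pd

# Hypothetical two-row sample; only the column names are taken from the notebook.
raw = pd.DataFrame({
    "order_id": [1001, 1002],
    "customer_id": [501, 502],
    "product_id": [9001, 9002],
    "order_date": ["2024-01-15", "2024-03-02"],
    "product_category": ["Apparel", "Electronics"],
    "product_price": [40.0, 250.0],
    "discount_percent": [10, 0],
    "quantity": [2, 1],
    "customer_country": ["US", "CA"],
    "traffic_source": ["organic", "paid"],
    "payment_method": ["card", "paypal"],
    "shipping_cost": [5.0, 12.0],
    "rating": [4.5, 3.0],
    "discounted_price": [36.0, 250.0],
    "revenue": [72.0, 250.0],
    "profit": [20.0, 60.0],
    "is_returned": [0, 1],
})

ID_COLS = ["order_id", "customer_id", "product_id"]
LEAKAGE_COLS = ["profit", "revenue", "discounted_price"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Extract temporal features, then drop identifiers, leakage columns and the raw date."""
    out = df.copy()
    dates = pd.to_datetime(out["order_date"])
    out["order_month"] = dates.dt.month      # 1 = January ... 12 = December
    out["order_dow"] = dates.dt.dayofweek    # 0 = Monday ... 6 = Sunday
    return out.drop(columns=ID_COLS + LEAKAGE_COLS + ["order_date"])


clean = preprocess(raw)
print(clean.columns.tolist())
```

Extracting the temporal features before the drop matters: once `order_date` is removed, the month and day-of-week can no longer be derived.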
Persist Preprocessor to Xplainable Cloud
Since this is an order-level model with no aggregation step, the entire preprocessing pipeline is self-contained and can be persisted directly.
Train/Test Split
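With a return rate of roughly 14.8%, a purely random split can leave the test set with noticeably more or fewer returns than the training set. One common remedy, shown here as a dependency-free sketch and not necessarily what this notebook does, is to split each class separately so both sets keep the same return rate. The `stratified_split` helper, the 20% test fraction, the seed, and the synthetic labels are all assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class keeps its share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                          # shuffle within the class
        n_test = round(len(idxs) * test_fraction)  # same fraction taken from every class
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels with a 15% return rate, close to the ~14.8% quoted for the dataset.
labels = [1] * 150 + [0] * 850
train_idx, test_idx = stratified_split(labels)

test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```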
2. Model Optimisation
3. Model Training
4. Model Interpretability and Explainability
Which order characteristics are most predictive of returns? The explainer reveals whether price, discount level, product category, or other factors drive return risk.
5. Model Evaluation
6. Model Persisting
7. Model Deployment
8. Testing the Deployment
9. AI-Generated Report
10. Contribution-Driven Return Optimization
The xplainable model's per-feature contributions explain why each order is at risk. Several features in the dataset represent controllable business levers:
- shipping_cost: the business can offer free or subsidized shipping
- discount_percent: the business decides whether to discount
- quantity: bundle incentives can increase items per order
- product_price: pricing strategy is controllable
The model's partition profiles give us the measured return rate shift when a feature moves from one partition to another. We use these counterfactual shifts as lever effects — derived from the data, not assumed.
Extract Contributions and Counterfactual Lever Effects
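For each controllable feature, the lever effect = current contribution - best achievable partition score, where "best" means the partition with the lowest contribution to return risk. This tells us how much the return probability would drop if we moved that order to the best partition for that feature. An order already in the best partition has a lever effect of zero.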
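The lever-effect calculation can be sketched in a few lines. The partition profile below is invented for illustration (the bin labels and scores are not from the trained model), and `lever_effect` is a hypothetical helper, not part of the xplainable API. It assumes a profile is available as a mapping from partition label to contribution score.

```python
# Hypothetical partition profile for shipping_cost: partition label -> contribution score.
# Lower scores mean the partition pushes the predicted return risk down.
shipping_profile = {
    "0-2": -0.03,
    "2-8": 0.01,
    "8+": 0.05,
}


def lever_effect(profile: dict, current_partition: str) -> float:
    """Current contribution minus the lowest (best) contribution in the profile."""
    current = profile[current_partition]
    best = min(profile.values())
    return current - best


# An order in the most expensive shipping bin has the most room to improve.
effect_high = lever_effect(shipping_profile, "8+")

# An order already in the best bin has nothing left to gain from this lever.
effect_best = lever_effect(shipping_profile, "0-2")

print(effect_high, effect_best)
```

Because the best score is a minimum over the same profile, the effect is never negative.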
Map Levers to Actions and Costs
Each controllable feature maps to a business action. The lever effect is from the model (data-driven), the cost is a business input (replace with your actuals).
Expected Value Optimization
The net EV uses the model-derived lever effect as the return prevention rate:
Net EV = lever_effect × avg_return_cost - lever_cost
Where lever_effect is the counterfactual reduction in return probability taken from the model's partition profile (measured from the data), avg_return_cost is the operational cost of processing a return (shipping, restocking, CS time), and lever_cost is the cost of the mapped intervention.
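A short worked example makes the formula concrete. It uses this notebook's example inputs of $25 per return and $5 for free shipping; the two lever effects (0.08 and 0.30) are hypothetical values chosen to land on either side of break-even, and `net_ev` is an illustrative helper.

```python
def net_ev(lever_effect: float, avg_return_cost: float, lever_cost: float) -> float:
    """Expected saving from prevented returns minus the cost of the intervention."""
    return lever_effect * avg_return_cost - lever_cost


AVG_RETURN_COST = 25.0     # example business input from this notebook
FREE_SHIPPING_COST = 5.0   # example business input from this notebook

# Hypothetical lever effects for two different orders.
ev_small = net_ev(0.08, AVG_RETURN_COST, FREE_SHIPPING_COST)  # about -3.0: not worth it
ev_large = net_ev(0.30, AVG_RETURN_COST, FREE_SHIPPING_COST)  # about +2.5: worth it

# The lever effect at which the intervention exactly pays for itself.
break_even_effect = FREE_SHIPPING_COST / AVG_RETURN_COST      # 0.2

print(ev_small, ev_large, break_even_effect)
```

With these inputs, free shipping only pays off for orders where it lowers the return probability by more than 20 percentage points, which is why the lever is evaluated per order and not applied across the board.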
Budget-Constrained Allocation
Rank orders by ROI and allocate a fixed budget to the highest-ROI interventions first.
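One simple way to implement this ranking is a greedy pass, sketched below. It assumes ROI is defined as net EV divided by intervention cost, which the text does not spell out, and the `Intervention` class, lever names, and candidate values are hypothetical. Greedy selection is a heuristic for what is formally a knapsack problem, so it is not guaranteed to be optimal.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    order_id: str
    lever: str
    net_ev: float   # expected value after subtracting the intervention cost
    cost: float     # intervention cost, assumed to be greater than zero

    @property
    def roi(self) -> float:
        # Assumed definition: net expected value per dollar spent.
        return self.net_ev / self.cost


def allocate(candidates, budget):
    """Fund interventions in descending ROI order until the budget runs out."""
    chosen, remaining = [], budget
    for c in sorted(candidates, key=lambda c: c.roi, reverse=True):
        if c.net_ev <= 0:
            continue            # never fund an intervention that loses money
        if c.cost <= remaining:
            chosen.append(c)
            remaining -= c.cost
    return chosen, remaining


# Hypothetical candidates, one per order.
candidates = [
    Intervention("A", "free_shipping", net_ev=2.5, cost=5.0),
    Intervention("B", "product_video", net_ev=1.0, cost=0.5),
    Intervention("C", "free_shipping", net_ev=-3.0, cost=5.0),
    Intervention("D", "free_shipping", net_ev=4.0, cost=5.0),
]

chosen, remaining = allocate(candidates, budget=6.0)
print([c.order_id for c in chosen], remaining)
```

Cheap interventions with modest gains can outrank expensive ones with larger absolute gains, as order B does here.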
What's Data-Driven vs What's Assumed
From the model (data-driven):
- Return probability per order (base_value + contributions)
- Per-feature contribution scores explaining why each order is at risk
- Counterfactual lever effects: how much return probability changes if a feature moves from its current partition to the best partition
- Which lever offers the most improvement for each specific order
Business inputs (replace with your actuals):
- Intervention costs ($0.50 for a video, $5 for free shipping, etc.)
- Average return processing cost ($25 in this example)
- The assumption that moving a feature to a better partition is achievable via the mapped action