Version: v1.4.1

Partitioned Models

Multi-Segment Modeling

Partitioned models let you train separate transparent models on different data segments, then combine them under a single model that routes each row to the model for its segment. This can improve accuracy and give deeper insights on datasets with natural groupings or heterogeneous patterns.

Overview

Partitioned models handle datasets where different segments exhibit distinct patterns. Instead of training one model on all data, you train a specialized model for each segment, and each prediction is routed to the model that matches the row's segment.

Xplainable provides built-in PartitionedClassifier and PartitionedRegressor classes that handle routing, prediction, and fallback automatically.

Key Benefits

  • Specialized models: Each segment gets a model optimized for its specific patterns
  • Improved accuracy: Often outperforms single models by capturing segment-specific relationships
  • Deeper insights: Understand how different segments behave and what drives their outcomes
  • Robust fallback: Automatic fallback to a '__dataset__' model for unknown segments

How Partitioned Models Work

Partitioned models are not ensemble models. Instead of combining predictions from multiple models, they:

  1. Route data to the appropriate model based on the partition_on column value
  2. Train specialized models on homogeneous data subsets
  3. Maintain transparency -- each prediction comes from a single, explainable model
  4. Provide fallback -- unknown partition values automatically use the '__dataset__' model
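The routing and fallback behaviour described above can be sketched in plain Python. This is an illustrative toy, not the xplainable implementation: `ToyPartitionRouter`, the `Row` and `Model` aliases, and the lambda "models" are all hypothetical stand-ins, and the assumption that partition names are compared as strings is ours.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Model = Callable[[Row], Any]  # hypothetical stand-in for a fitted model's per-row prediction

FALLBACK = "__dataset__"


class ToyPartitionRouter:
    """Illustrative router only; not the xplainable implementation."""

    def __init__(self, partition_on: str) -> None:
        self.partition_on = partition_on
        self.partitions: Dict[str, Model] = {}

    def add_partition(self, model: Model, partition: str) -> None:
        # Assumption: partition names are stored and compared as strings
        self.partitions[str(partition)] = model

    def predict(self, rows: List[Row]) -> List[Any]:
        if FALLBACK not in self.partitions:
            raise ValueError(f"a {FALLBACK!r} partition is required for fallback")
        predictions = []
        for row in rows:
            key = str(row.get(self.partition_on))
            # Each row is answered by exactly one model: its own segment's
            # model if one exists, otherwise the fallback model.
            model = self.partitions.get(key, self.partitions[FALLBACK])
            predictions.append(model(row))
        return predictions


router = ToyPartitionRouter(partition_on="segment")
router.add_partition(lambda row: "b2b-model", "B2B")
router.add_partition(lambda row: "b2c-model", "B2C")
router.add_partition(lambda row: "default-model", FALLBACK)

rows = [{"segment": "B2B"}, {"segment": "B2C"}, {"segment": "Government"}]
routed = router.predict(rows)
print(routed)
```

Because every row is answered by exactly one model, the explanation for a prediction is the explanation of that one model, which is the difference from an ensemble.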

When to Use Partitioned Models

  • Geographic segmentation -- Different regions have different patterns
  • Customer segments -- B2B vs B2C, different industries, etc.
  • Time-based segments -- Seasonal models, weekday vs weekend
  • Product categories -- Different products have different drivers
  • Heterogeneous data -- Mixed populations with distinct characteristics
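Some of these segmentations are not stored in the data and have to be derived first. As a minimal sketch for the weekday vs weekend case, the hypothetical helper `time_segment` below derives a label from a date, which could then be used as the partition_on column. The field names `order_date` and `day_type` and the sample orders are made up for illustration.

```python
from datetime import date


def time_segment(day: date) -> str:
    """Label a date as 'weekday' or 'weekend'."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    return "weekend" if day.weekday() >= 5 else "weekday"


# Hypothetical records; in practice this would be a column of your dataset
orders = [
    {"order_date": date(2024, 3, 1), "amount": 120.0},  # Friday
    {"order_date": date(2024, 3, 2), "amount": 80.0},   # Saturday
    {"order_date": date(2024, 3, 3), "amount": 95.0},   # Sunday
    {"order_date": date(2024, 3, 4), "amount": 60.0},   # Monday
]

for order in orders:
    order["day_type"] = time_segment(order["order_date"])

print([order["day_type"] for order in orders])
```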

The PartitionedClassifier Class

Constructor

from xplainable.core.models import PartitionedClassifier

partitioned_model = PartitionedClassifier(partition_on='segment_column')

Parameters:

  • partition_on (str, optional): The column name used to route predictions to the correct sub-model.

Key Methods

  • add_partition(model, partition) (method): Add a trained XClassifier as a named partition.
  • drop_partition(partition) (method): Remove a partition by name.
  • predict(x, use_prob=False, threshold=0.5) (method): Predict target values, routing each row to the appropriate partition model.
  • predict_score(x) (method): Predict raw scores for each row.
  • predict_proba(x) (method): Predict calibrated probabilities for each row.
  • explain(partition='__dataset__') (method): Show the explainer visualization for a specific partition.

Key Attributes

  • partitions (dict): Dictionary mapping partition names to models.
  • partition_on (str): The column used for routing predictions.

Classification Example

from xplainable.core.models import XClassifier, PartitionedClassifier
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load your data
data = pd.read_csv('customer_data.csv')
train, test = train_test_split(data, test_size=0.2, random_state=42)

partition_column = 'customer_segment'

# Create partitioned model
partitioned_model = PartitionedClassifier(partition_on=partition_column)

# Train a model for each segment and add it as a partition
for segment in train[partition_column].unique():
    segment_data = train[train[partition_column] == segment]
    X_segment = segment_data.drop('target', axis=1)
    y_segment = segment_data['target']

    model = XClassifier(max_depth=6, min_info_gain=0.01)
    model.fit(X_segment, y_segment)

    partitioned_model.add_partition(model, segment)

# Train a default model on all data for unknown segments
X_all = train.drop('target', axis=1)
y_all = train['target']
default_model = XClassifier(max_depth=5, min_info_gain=0.02)
default_model.fit(X_all, y_all)
partitioned_model.add_partition(default_model, '__dataset__')

# Make predictions -- routing happens automatically based on partition_on column
X_test = test.drop('target', axis=1)
y_test = test['target']
predictions = partitioned_model.predict(X_test)

# Get scores instead of class labels
scores = partitioned_model.predict_score(X_test)

# Get calibrated probabilities
probabilities = partitioned_model.predict_proba(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f"Partitioned model accuracy: {accuracy:.3f}")
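Before trusting an overall accuracy figure, it can help to know how many rows were answered by the '__dataset__' fallback rather than a dedicated segment model. The sketch below is one way to count them. `fallback_report` is a hypothetical helper, the segment names are made up, and it assumes partition names and segment values can be compared as strings.

```python
from collections import Counter

FALLBACK = "__dataset__"


def fallback_report(partition_names, segment_values):
    """Count rows whose segment has no dedicated partition.

    Assumption: names and values are comparable after str() conversion.
    """
    known = {str(name) for name in partition_names if name != FALLBACK}
    unseen = Counter(str(value) for value in segment_values if str(value) not in known)
    return dict(unseen)


# Hypothetical partition names and test-set segment values
partition_names = ["B2B", "B2C", "__dataset__"]
test_segments = ["B2B", "B2C", "Government", "B2C", "Non-profit", "Government"]

report = fallback_report(partition_names, test_segments)
print(report)
```

With the example above, the natural inputs would be `partitioned_model.partitions` and `test[partition_column]`.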

The PartitionedRegressor Class

Constructor

from xplainable.core.models import PartitionedRegressor

partitioned_model = PartitionedRegressor(partition_on='region')

Parameters:

  • partition_on (str, optional): The column name used to route predictions.

Key Methods

  • add_partition(model, partition) (method): Add a trained XRegressor as a named partition.
  • drop_partition(partition) (method): Remove a partition by name.
  • predict(x) (method): Predict target values, routing each row to the appropriate partition model.
  • explain(partition='__dataset__') (method): Show the explainer visualization for a specific partition.

Regression Example

from xplainable.core.models import XRegressor, PartitionedRegressor
from xplainable.core.optimisation.genetic import XEvolutionaryNetwork
from xplainable.core.optimisation.layers import Tighten, Evolve
import pandas as pd
from sklearn.model_selection import train_test_split

# Load sales data with regional segments
data = pd.read_csv('sales_data.csv')
train, test = train_test_split(data, test_size=0.2, random_state=42)

partition_column = 'region'

# Create partitioned regressor
partitioned_model = PartitionedRegressor(partition_on=partition_column)

# Train region-specific models with optimization
for region in train[partition_column].unique():
    region_data = train[train[partition_column] == region]
    X_region = region_data.drop('sales', axis=1)
    y_region = region_data['sales']

    # Train base model
    model = XRegressor(max_depth=7, min_info_gain=0.005)
    model.fit(X_region, y_region)

    # Optimize tail sensitivity
    model.optimise_tail_sensitivity(X_region, y_region)

    # Apply XEvolutionaryNetwork
    network = XEvolutionaryNetwork(model)
    network.add_layer(Tighten(iterations=100, learning_rate=0.03, early_stopping=15))
    network.add_layer(Evolve(mutations=80, generations=30, early_stopping=10))
    network.fit(X_region, y_region)
    network.optimise()

    partitioned_model.add_partition(model, region)

# Add default model
X_all = train.drop('sales', axis=1)
y_all = train['sales']
default_model = XRegressor(max_depth=6, min_info_gain=0.01)
default_model.fit(X_all, y_all)
partitioned_model.add_partition(default_model, '__dataset__')

# Predict
X_test = test.drop('sales', axis=1)
y_test = test['sales']
predictions = partitioned_model.predict(X_test)

# Evaluate each partition's model individually
for partition_name, model in partitioned_model.partitions.items():
    if partition_name == '__dataset__':
        continue
    segment_test = test[test[partition_column] == partition_name]
    if segment_test.empty:
        # No test rows landed in this region, so there is nothing to evaluate
        continue
    metrics = model.evaluate(
        segment_test.drop('sales', axis=1),
        segment_test['sales']
    )
    print(f"{partition_name}: MAE={metrics['MAE']}, R2={metrics['R2 Score']}")
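For readers who want to sanity-check the per-partition numbers, the two metrics printed above follow standard definitions. The sketch below computes them with the standard library only. It is not the library's `evaluate` implementation, the function names are our own, and the sample values are invented.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are identical")
    return 1.0 - ss_res / ss_tot


# Invented sales figures for one region
y_true = [100.0, 150.0, 200.0, 250.0]
y_pred = [110.0, 140.0, 210.0, 240.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.2f}, R2={r2:.3f}")
```

Small regions make both metrics noisy, so it is worth reading them alongside the number of test rows in each partition.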

Explaining Partitioned Models

Each partition can be explained independently:

# Explain the default model
partitioned_model.explain('__dataset__')

# Explain a specific partition
partitioned_model.explain('North America')

# Compare feature importances across partitions
for partition_name, model in partitioned_model.partitions.items():
    importances = model.feature_importances
    print(f"\n{partition_name} -- Top 5 features:")
    sorted_features = sorted(importances.items(), key=lambda x: x[1], reverse=True)[:5]
    for feature, importance in sorted_features:
        print(f"  {feature}: {importance:.4f}")
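Printing each partition's top features separately makes it hard to see where segments disagree. As a rough sketch, the hypothetical helper `importance_table` below lines the same features up across partitions. It assumes each partition's importances are available as a plain dict of feature name to importance, and the numbers are invented for illustration.

```python
def importance_table(importances_by_partition, top_n=5):
    """Return (feature, {partition: importance}) rows for the top features.

    Features are ranked by their highest importance in any partition,
    with ties broken alphabetically so the output is deterministic.
    """
    features = set()
    for importances in importances_by_partition.values():
        features.update(importances)

    def peak(feature):
        return max(imp.get(feature, 0.0) for imp in importances_by_partition.values())

    ranked = sorted(features, key=lambda f: (-peak(f), f))[:top_n]
    return [
        (f, {name: imp.get(f, 0.0) for name, imp in importances_by_partition.items()})
        for f in ranked
    ]


# Invented importances for three partitions
importances_by_partition = {
    "__dataset__": {"price": 0.40, "tenure": 0.35, "discount": 0.25},
    "North America": {"price": 0.55, "tenure": 0.30, "discount": 0.15},
    "Europe": {"price": 0.20, "tenure": 0.30, "discount": 0.50},
}

table = importance_table(importances_by_partition, top_n=2)
for feature, row in table:
    cells = ", ".join(f"{name}={value:.2f}" for name, value in row.items())
    print(f"{feature}: {cells}")
```

A feature that ranks high in one partition and low in another is a sign that the segments behave differently with respect to the target.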

Rapid Refitting with Partitioned Models

You can apply update_feature_params() to individual partition models for fine-tuning:

# Update parameters for a specific partition model
partitioned_model.partitions['North America'].update_feature_params(
    features=partitioned_model.partitions['North America'].columns,
    weight=0.8,
    power_degree=2.0
)

# Update all partition models with the same parameters
for partition_name, model in partitioned_model.partitions.items():
    model.update_feature_params(
        features=model.columns,
        weight=0.8,
        power_degree=2.0
    )

Dynamic Partitioning

Adapt model complexity based on segment characteristics:

from xplainable.core.models import XClassifier, PartitionedClassifier


def create_dynamic_partitions(train_data, partition_column, target_column):
    """Create partitions with complexity adapted to segment size."""

    partitioned_model = PartitionedClassifier(partition_on=partition_column)

    for segment in train_data[partition_column].unique():
        segment_data = train_data[train_data[partition_column] == segment]

        # Segments skipped here are served by the '__dataset__' model at prediction time
        if len(segment_data) < 50:
            print(f"Skipping {segment}: insufficient data ({len(segment_data)} samples)")
            continue

        # Adjust model complexity based on segment size
        segment_size = len(segment_data)
        if segment_size > 500:
            model_params = {'max_depth': 8, 'min_info_gain': 0.005}
        elif segment_size > 200:
            model_params = {'max_depth': 6, 'min_info_gain': 0.01}
        else:
            model_params = {'max_depth': 4, 'min_info_gain': 0.02}

        model = XClassifier(**model_params)
        X_segment = segment_data.drop(target_column, axis=1)
        y_segment = segment_data[target_column]
        model.fit(X_segment, y_segment)

        partitioned_model.add_partition(model, segment)
        print(f"Created model for {segment}: {segment_size} samples, depth={model_params['max_depth']}")

    # Default model
    X_all = train_data.drop(target_column, axis=1)
    y_all = train_data[target_column]
    default_model = XClassifier(max_depth=6, min_info_gain=0.01)
    default_model.fit(X_all, y_all)
    partitioned_model.add_partition(default_model, '__dataset__')

    return partitioned_model
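The size thresholds are the part of this function most likely to be tuned, so it can be convenient to isolate them in a small pure function that is easy to test without training anything. The sketch below does that. `params_for_segment_size` is a hypothetical helper that mirrors the thresholds used above.

```python
from typing import Dict, Optional


def params_for_segment_size(n_samples: int, min_samples: int = 50) -> Optional[Dict[str, float]]:
    """Map a segment's sample count to model parameters.

    Returns None when the segment is too small to get its own model.
    """
    if n_samples < min_samples:
        return None
    if n_samples > 500:
        return {"max_depth": 8, "min_info_gain": 0.005}
    if n_samples > 200:
        return {"max_depth": 6, "min_info_gain": 0.01}
    return {"max_depth": 4, "min_info_gain": 0.02}


# Boundary values show exactly where the rule changes
for size in (49, 50, 200, 201, 500, 501):
    print(size, params_for_segment_size(size))
```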

Performance Comparison

Partitioned vs Single Model

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xplainable.core.models import XClassifier, PartitionedClassifier


def compare_partitioned_vs_single_model(data, partition_column, target_column, test_size=0.2):
    """Compare partitioned models vs single model performance."""
    train_data, test_data = train_test_split(data, test_size=test_size, random_state=42)

    # Train single model
    single_model = XClassifier(max_depth=6, min_info_gain=0.01)
    X_train = train_data.drop(target_column, axis=1)
    y_train = train_data[target_column]
    single_model.fit(X_train, y_train)

    # Train partitioned models
    partitioned_model = PartitionedClassifier(partition_on=partition_column)

    for segment in train_data[partition_column].unique():
        segment_data = train_data[train_data[partition_column] == segment]
        # Segments with too few rows fall back to the '__dataset__' model
        if len(segment_data) < 20:
            continue
        model = XClassifier(max_depth=6, min_info_gain=0.01)
        X_seg = segment_data.drop(target_column, axis=1)
        y_seg = segment_data[target_column]
        model.fit(X_seg, y_seg)
        partitioned_model.add_partition(model, segment)

    # Default model
    default = XClassifier(max_depth=5, min_info_gain=0.02)
    default.fit(X_train, y_train)
    partitioned_model.add_partition(default, '__dataset__')

    # Evaluate
    X_test = test_data.drop(target_column, axis=1)
    y_test = test_data[target_column]

    single_predictions = single_model.predict(X_test)
    single_accuracy = accuracy_score(y_test, single_predictions)

    partitioned_predictions = partitioned_model.predict(X_test)
    partitioned_accuracy = accuracy_score(y_test, partitioned_predictions)

    print(f"Single Model Accuracy: {single_accuracy:.4f}")
    print(f"Partitioned Model Accuracy: {partitioned_accuracy:.4f}")
    print(f"Improvement: {partitioned_accuracy - single_accuracy:.4f}")
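A single overall accuracy number can hide which segments gain from partitioning and which do not. As a minimal sketch, the hypothetical helper `accuracy_by_segment` below breaks accuracy down per segment for two sets of predictions. The labels and predictions are invented for illustration.

```python
from collections import defaultdict


def accuracy_by_segment(segments, y_true, y_pred):
    """Return {segment: accuracy} for parallel sequences of equal length."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, actual, predicted in zip(segments, y_true, y_pred):
        totals[segment] += 1
        hits[segment] += int(actual == predicted)
    return {segment: hits[segment] / totals[segment] for segment in totals}


# Invented test data: four rows in segment A, two in segment B
segments = ["A", "A", "A", "A", "B", "B"]
y_true = [1, 0, 1, 1, 0, 1]
single_predictions = [1, 0, 0, 1, 1, 1]
partitioned_predictions = [1, 0, 1, 1, 0, 0]

single = accuracy_by_segment(segments, y_true, single_predictions)
partitioned = accuracy_by_segment(segments, y_true, partitioned_predictions)

for segment in single:
    delta = partitioned[segment] - single[segment]
    print(f"{segment}: single={single[segment]:.2f}, partitioned={partitioned[segment]:.2f}, delta={delta:+.2f}")
```

In this invented data the gain comes entirely from segment A, which is the kind of detail the overall accuracy would not show.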

Best Practices

Choosing Good Partitions

  1. Business logic: Partitions should make business sense
  2. Sufficient data: Each partition needs enough samples (typically 100+)
  3. Distinct patterns: Segments should have genuinely different relationships with the target
  4. Stability: Partition values should be consistent over time
  5. Interpretability: Partitions should be explainable to stakeholders
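The sample-size guideline is easy to check before any training. The sketch below counts rows per candidate segment and flags those under a minimum. `undersized_segments` is a hypothetical helper, the default of 100 follows the guideline above, and the region names and counts are invented.

```python
from collections import Counter


def undersized_segments(segment_values, min_samples=100):
    """Return {segment: count} for segments with fewer than min_samples rows."""
    counts = Counter(segment_values)
    return {segment: n for segment, n in counts.items() if n < min_samples}


# Invented partition column values
values = ["North"] * 250 + ["South"] * 120 + ["Islands"] * 12

too_small = undersized_segments(values)
print(too_small)
```

Segments flagged here are candidates for merging with a similar segment or for leaving to the '__dataset__' fallback.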

Working with Partitioned Models

  1. Always include a '__dataset__' partition as a fallback for unknown segment values
  2. The partition_on column must be present in the prediction data for routing to work
  3. Target maps must match across all partition models (enforced automatically)
  4. Use explain() per partition to understand segment-specific behavior
  5. Combine with XEvolutionaryNetwork for regression partition models to maximize performance

Next Steps

Ready for More Advanced Topics?