Version: v1.4.1

XClassifier

Transparent Binary Classification

XClassifier provides transparent binary classification with real-time explainability. Unlike black-box models, you get instant insights into how predictions are made without needing surrogate models.

Overview

The XClassifier is xplainable's flagship transparent classification model. It uses a novel feature-wise ensemble approach where each feature gets its own decision tree, optimized for maximum information gain while maintaining complete interpretability.

Key Features

Real-time explainability

Inspect any prediction with one call. No surrogate fits, no Shapley sampling.

Rapid refitting

Update a single feature's tree in milliseconds — no full retrain required.

Feature-wise ensemble

Granular control per input. Drop, refit or constrain any feature independently.

Calibrated probabilities

Built-in probability mapping ensures your scores mean what you think they mean.

Quick Start

binary_classification.py
from xplainable.core.models import XClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load and prepare data
data = pd.read_csv("data.csv")
X, y = data.drop("target", axis=1), data["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = XClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Get explanations (interactive Altair chart)
model.explain()

Constructor Parameters

model = XClassifier(
    max_depth=8,
    min_info_gain=0.0001,
    min_leaf_size=0.0001,
    ignore_nan=False,
    weight=1,
    power_degree=1,
    sigmoid_exponent=0,
    tail_sensitivity=1.0,
    map_calibration=True
)

max_depth (int, default: 8)
Maximum depth of each per-feature decision tree. Higher values capture more complex splits, at the cost of overfitting risk.
Example: max_depth=12

min_info_gain (float, default: 0.0001)
Minimum information gain required to create a split. Raise to regularise; lower to extract more signal.

min_leaf_size (float, default: 0.0001)
Minimum proportion of rows in a leaf. Acts as a stopping criterion.

weight (float, default: 1)
Sample-weight multiplier for the positive class.

power_degree (int, default: 1)
Polynomial degree used when smoothing the calibration curve. Most users should leave at 1.

sigmoid_exponent (float, default: 0)
Sharpness of the final sigmoid calibration. Increase for more confident probabilities.

ignore_nan (bool, default: False)
If True, NaNs in input data are masked rather than treated as a category.

tail_sensitivity (float, default: 1.0)
Weight applied to divisive leaf nodes.

map_calibration (bool, default: True)
Map score to calibrated probability after fitting.
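
To make min_info_gain and min_leaf_size more concrete, the sketch below scores one candidate split on a single feature using Gini impurity reduction and then applies both thresholds. It is a generic illustration of the idea, not xplainable's internal implementation; the helpers gini, split_gain, and split_passes are hypothetical names.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(values, labels, threshold):
    """Impurity reduction from splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


def split_passes(values, labels, threshold,
                 min_info_gain=0.0001, min_leaf_size=0.0001):
    """Assumed acceptance rule: enough gain and no leaf below the size floor."""
    n = len(labels)
    n_left = sum(1 for x in values if x <= threshold)
    smallest_leaf = min(n_left, n - n_left) / n  # proportion of rows
    gain = split_gain(values, labels, threshold)
    return gain >= min_info_gain and smallest_leaf >= min_leaf_size


# Made-up toy data for one feature
ages = [22, 25, 31, 40, 47, 52]
churned = [1, 1, 1, 0, 0, 0]

print(split_gain(ages, churned, threshold=31))    # perfect separation
print(split_passes(ages, churned, threshold=60))  # empty right leaf, rejected
```

Raising either threshold rejects more candidate splits, which is why both parameters act as regularisers.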

Methods

fit()

Fits the model to the training data.

model.fit(x, y, id_columns=[], column_names=None, target_name='target', alpha=0.1)

x (DataFrame or ndarray, required)
Feature matrix

y (Series or ndarray, required)
Target values

id_columns (list, default: [])
Columns to exclude from training (e.g. IDs)

column_names (list, default: None)
Column names when passing a numpy array

target_name (str, default: 'target')
Name for the target column when passing a numpy array

alpha (float, default: 0.1)
Controls the number of possible splits relative to unique values

Returns the fitted XClassifier instance.

predict()

Predicts the target class for each row.

y_pred = model.predict(X_test, use_prob=False, threshold=0.5, remap=True)

x (DataFrame or ndarray, required)
Feature matrix

use_prob (bool, default: False)
Use calibrated probability instead of raw score for thresholding

threshold (float, default: 0.5)
Classification threshold

remap (bool, default: True)
Remap predictions back to original target labels

Returns a numpy array of predicted class labels.
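
The interaction between threshold and remap can be pictured with a small sketch. It assumes scores at or above the threshold map to the positive class (whether the library uses >= or > is not stated here) and uses a hypothetical to_labels helper with a made-up label mapping.

```python
def to_labels(scores, threshold=0.5, label_map=None):
    """Threshold scores into 0/1, then optionally map back to original labels."""
    # Assumption: the boundary is inclusive (score >= threshold -> positive).
    binary = [1 if s >= threshold else 0 for s in scores]
    if label_map is None:
        return binary
    return [label_map[b] for b in binary]


scores = [0.12, 0.50, 0.73, 0.49]

print(to_labels(scores))
# [0, 1, 1, 0]

print(to_labels(scores, threshold=0.7, label_map={0: "stay", 1: "churn"}))
# ['stay', 'stay', 'churn', 'stay']
```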

predict_score()

Returns the raw model score (float between 0 and 1) for each row. This is the sum of feature contributions plus the base value.

scores = model.predict_score(X_test)

Returns a 1-D numpy array of scores.
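
As a rough picture of "sum of feature contributions plus the base value", the sketch below scores a row against a hand-written additive model. The bin edges, contribution values, and the FEATURE_BINS layout are invented for illustration and do not reflect how xplainable stores its trees.

```python
import bisect

BASE_VALUE = 0.30

# Hypothetical per-feature bins: sorted upper edges and one contribution per bin.
FEATURE_BINS = {
    "age": {"edges": [30, 50], "contributions": [0.10, 0.00, -0.08]},
    "tenure_months": {"edges": [12], "contributions": [0.15, -0.05]},
}


def explain_row(row):
    """Look up each feature's contribution independently of the others."""
    contributions = {}
    for name, spec in FEATURE_BINS.items():
        # Values less than or equal to an edge fall into the bin below it.
        leaf = bisect.bisect_left(spec["edges"], row[name])
        contributions[name] = spec["contributions"][leaf]
    return contributions


def score_row(row):
    """Score = base value + sum of per-feature contributions."""
    return BASE_VALUE + sum(explain_row(row).values())


row = {"age": 27, "tenure_months": 6}
print(explain_row(row))          # {'age': 0.1, 'tenure_months': 0.15}
print(round(score_row(row), 2))  # 0.55
```

Because each feature is looked up on its own, the per-feature contributions are the explanation; no separate attribution step is needed.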

predict_proba()

Returns calibrated probabilities for each row. Requires map_calibration=True (the default) during model construction.

probabilities = model.predict_proba(X_test)

Returns a 1-D numpy array of probabilities (not a 2-D array like scikit-learn).
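
Code written against scikit-learn's two-column predict_proba layout may need an adapter. The sketch below assumes the 1-D values are positive-class probabilities and builds [P(class 0), P(class 1)] rows from plain lists; to_two_column is a hypothetical helper. With numpy arrays, np.column_stack([1 - p, p]) would do the same job.

```python
def to_two_column(pos_proba):
    """Convert 1-D positive-class probabilities to [[P(0), P(1)], ...]."""
    return [[1.0 - p, p] for p in pos_proba]


pos_proba = [0.9, 0.25, 0.5]
two_col = to_two_column(pos_proba)

print(two_col[1])  # [0.75, 0.25]
```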

evaluate()

Returns a dictionary of classification metrics.

metrics = model.evaluate(X_test, y_test, use_prob=False, threshold=0.5)

The returned dictionary contains:

  • confusion_matrix -- nested list
  • classification_report -- dict (same format as sklearn.metrics.classification_report(..., output_dict=True))
  • roc_auc
  • neg_brier_loss (1 - brier_score_loss)
  • log_loss
  • cohen_kappa
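
Two of these metrics are easy to check by hand. The sketch below recomputes neg_brier_loss and Cohen's kappa from first principles on made-up numbers; it assumes the confusion matrix follows the scikit-learn layout [[tn, fp], [fn, tp]], which is not confirmed here for evaluate().

```python
def neg_brier_loss(y_true, proba):
    """1 minus the mean squared error between labels and probabilities."""
    brier = sum((p - y) ** 2 for y, p in zip(y_true, proba)) / len(y_true)
    return 1.0 - brier


def cohen_kappa(confusion):
    """Cohen's kappa from a 2x2 matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = confusion
    n = tn + fp + fn + tp
    observed = (tn + tp) / n
    # Agreement expected by chance from the row and column totals
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n ** 2
    return (observed - expected) / (1 - expected)


y_true = [1, 0, 1, 0]
proba = [0.9, 0.2, 0.6, 0.4]
print(round(neg_brier_loss(y_true, proba), 4))  # 0.9075

confusion = [[40, 10], [5, 45]]
print(round(cohen_kappa(confusion), 2))  # 0.7
```

Higher is better for both: neg_brier_loss approaches 1 as probabilities approach the true labels, and kappa approaches 1 as agreement exceeds chance.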

explain()

Renders an interactive Altair chart showing feature importances and per-feature contribution profiles. Takes no data input -- it visualises the fitted model's internal profile.

model.explain()

# Optional: control numeric label rounding
model.explain(label_rounding=3)

Note: Requires the altair package. Install with pip install xplainable[plotting].

update_feature_params()

Updates model parameters for a subset of features without retraining from scratch. This is extremely fast because the model has already pre-computed the metadata needed for reconstruction.

model.update_feature_params(
    features=['feature1', 'feature2'],
    max_depth=5,
    min_info_gain=0.01,
    min_leaf_size=0.01,
    ignore_nan=True,
    weight=0.8,
    power_degree=2,
    sigmoid_exponent=1,
    tail_sensitivity=0.5,
    x=X_train,  # optional: pass to recalibrate probability map
    y=y_train   # optional: pass to recalibrate probability map
)

All parameter arguments are optional -- only the ones you pass will be updated. If map_calibration=True was used and you pass x and y, the probability calibration map is also recalculated.

Returns the updated XClassifier instance.

Rapid Refitting Benefits
  • 10-100x faster than complete retraining
  • Feature-specific updates for granular control
  • Real-time parameter tuning in production

predict_explain()

Returns a DataFrame containing per-feature contributions, the base value, the total score, calibrated probability, and support for each row.

explanation_df = model.predict_explain(X_test)

feature_importances (property)

Returns a dictionary mapping feature names to their normalised importance scores (based on Gini gain).

importances = model.feature_importances  # no parentheses -- it's a property
print(importances)
# {'feature_a': 0.05, 'feature_b': 0.12, ...}

profile (property)

Returns the full model profile as a dictionary with keys 'base_value', 'numeric', and 'categorical'. Each feature maps to a list of leaf nodes with their score, mean, and frequency.

prof = model.profile
print(prof['base_value'])
print(prof['numeric']['some_feature'])
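
A profile-shaped dictionary can be walked with ordinary Python. In the sketch below the top-level keys follow the description above, but the leaf field names ('score', 'mean', 'frequency'), the feature names, and all values are assumptions made for illustration; inspect a real model.profile before relying on them.

```python
# Hand-written stand-in for model.profile; leaf keys are assumed, not confirmed.
profile = {
    "base_value": 0.30,
    "numeric": {
        "age": [
            {"score": 0.10, "mean": 0.41, "frequency": 0.35},
            {"score": -0.08, "mean": 0.22, "frequency": 0.65},
        ],
    },
    "categorical": {
        "plan": [
            {"score": 0.02, "mean": 0.31, "frequency": 0.80},
            {"score": -0.15, "mean": 0.12, "frequency": 0.20},
        ],
    },
}


def strongest_leaf(profile):
    """Return (feature, leaf) with the largest absolute score across features."""
    candidates = [
        (feature, leaf)
        for kind in ("numeric", "categorical")
        for feature, leaves in profile[kind].items()
        for leaf in leaves
    ]
    return max(candidates, key=lambda pair: abs(pair[1]["score"]))


feature, leaf = strongest_leaf(profile)
print(feature, leaf["score"])  # plan -0.15
```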

_transform()

Transforms input data into per-feature contribution scores. Primarily for internal use but useful for debugging.

contributions = model._transform(X_test)  # numpy array, shape (n_samples, n_features)

Partitioned Classification

For datasets with natural segments, use PartitionedClassifier:

from xplainable.core.models import PartitionedClassifier, XClassifier

# `train` is a DataFrame holding the features, 'segment_column', and 'target'

# Create partitioned model
partitioned_model = PartitionedClassifier(partition_on='segment_column')

# Train separate models for each segment
for segment in train['segment_column'].unique():
    segment_data = train[train['segment_column'] == segment]
    X_seg, y_seg = segment_data.drop('target', axis=1), segment_data['target']

    segment_model = XClassifier()
    segment_model.fit(X_seg, y_seg)

    partitioned_model.add_partition(segment_model, segment)

# Predict with automatic segment routing
predictions = partitioned_model.predict(X_test)

# Explain a specific partition
partitioned_model.explain(partition='some_segment')

PartitionedClassifier provides the same predict(), predict_score(), predict_proba(), and explain() methods, automatically routing each observation to its segment's model.
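
Conceptually, routing means looking up each row's segment value and handing the row to that segment's model. The sketch below shows the idea with plain functions standing in for fitted models; route_predict, the segment names, and the rules are hypothetical, and raising on an unknown segment is a choice made for this sketch rather than documented library behaviour.

```python
def route_predict(rows, partition_on, models):
    """Score each row with the model registered for its segment value."""
    predictions = []
    for row in rows:
        segment = row[partition_on]
        if segment not in models:
            # Sketch-only behaviour: fail loudly on unseen segments.
            raise KeyError(f"no model registered for segment {segment!r}")
        predictions.append(models[segment](row))
    return predictions


# Stand-in "models": simple rules instead of fitted classifiers
models = {
    "retail": lambda row: int(row["balance"] < 100),
    "business": lambda row: int(row["balance"] < 1000),
}

rows = [
    {"segment_column": "retail", "balance": 50},
    {"segment_column": "business", "balance": 500},
    {"segment_column": "retail", "balance": 250},
]

predictions = route_predict(rows, "segment_column", models)
print(predictions)  # [1, 1, 0]
```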

Hyperparameter Optimization

Use XParamOptimiser for Bayesian hyperparameter search (classification only).

from xplainable.core.optimisation.bayesian import XParamOptimiser

# Create optimiser with defaults
opt = XParamOptimiser(
    metric='roc-auc',  # see list of supported metrics below
    n_trials=30,
    n_folds=5,
    early_stopping=30,
    shuffle=False,
    subsample=1,
    alpha=0.01
)

# Run optimisation
best_params = opt.optimise(X_train, y_train)

# Train model with optimised parameters
model = XClassifier(**best_params)
model.fit(X_train, y_train)

Supported Metrics

'roc-auc' (default), 'macro-f1', 'weighted-f1', 'positive-f1', 'negative-f1', 'macro-precision', 'weighted-precision', 'positive-precision', 'negative-precision', 'macro-recall', 'weighted-recall', 'positive-recall', 'negative-recall', 'accuracy', 'brier-loss', 'log-loss'

Customising the Search Space

You can fix any parameter to a single value or provide a search range as a 3-element list [start, stop, step]:

opt = XParamOptimiser(
    metric='roc-auc',
    n_trials=50,
    n_folds=5,
    early_stopping=30,
    max_depth_space=[4, 10, 2],  # search [4, 6, 8]
    min_leaf_size_space=[0.005, 0.05, 0.005],  # search range
    min_info_gain_space=[0.005, 0.05, 0.005],
    ignore_nan_space=[False, True],
    weight_space=[0, 1.2, 0.05],
    power_degree_space=[1, 3, 2],
    sigmoid_exponent_space=[0.5, 1, 0.1],
)

best_params = opt.optimise(X_train, y_train)

To fix a parameter instead of searching, pass a scalar value instead of a list:

opt = XParamOptimiser(
    max_depth_space=6,  # fixed at 6 -- not searched
    weight_space=[0, 1.2, 0.05],  # searched
)
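
To see which candidate values a search space implies, the sketch below expands a [start, stop, step] list. It assumes the stop value is exclusive, which matches the "[4, 6, 8]" comment for max_depth_space=[4, 10, 2] but is otherwise unconfirmed; expand_space is a hypothetical helper, not part of XParamOptimiser.

```python
import math


def expand_space(space):
    """Expand [start, stop, step] into candidate values; scalars stay fixed."""
    if not isinstance(space, list):
        return [space]  # scalar: parameter is fixed, not searched
    if len(space) == 3 and not any(isinstance(v, bool) for v in space):
        start, stop, step = space
        # Count steps first to avoid accumulating floating-point error.
        count = max(0, math.ceil(round((stop - start) / step, 9)))
        return [round(start + i * step, 9) for i in range(count)]
    return list(space)  # explicit choices, e.g. [False, True]


print(expand_space([4, 10, 2]))       # [4, 6, 8]
print(expand_space(6))                # [6]
print(expand_space([False, True]))    # [False, True]
print(len(expand_space([0, 1.2, 0.05])))  # 24
```

A Bayesian optimiser samples from a space like this rather than enumerating it, so the expansion only shows the values that could be drawn.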

Evaluation with sklearn

Since predict_score() and predict_proba() both return 1-D arrays, their output can be passed directly to sklearn metrics:

from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = model.predict(X_test)
y_scores = model.predict_score(X_test)

# Classification report
print(classification_report(y_test, y_pred))

# Confusion matrix
print(confusion_matrix(y_test, y_pred))

# ROC-AUC (use raw scores)
print(f"ROC-AUC: {roc_auc_score(y_test, y_scores):.3f}")

Or use the built-in evaluate() method:

metrics = model.evaluate(X_test, y_test)
print(f"ROC-AUC: {metrics['roc_auc']:.3f}")
print(f"Cohen's Kappa: {metrics['cohen_kappa']:.3f}")

Cloud Deployment

For deploying models to the Xplainable Cloud platform, see the REST API documentation.

Best Practices

Data Preparation

Data Quality
  • Handle missing values appropriately (set ignore_nan=True if you want the model to skip NaNs during training)
  • Encode categorical variables using a preprocessing pipeline before training
  • Remove ID columns by passing them via id_columns parameter in fit()
  • Remove highly correlated features for better interpretability

The default configuration is a reasonable starting point:

model = XClassifier(
    max_depth=8,  # default -- good starting point
    min_info_gain=0.0001,  # default -- prevents trivial splits
    min_leaf_size=0.0001,  # default -- allows fine-grained leaves
    weight=1,  # default activation weight
    map_calibration=True  # default -- enables predict_proba()
)

Troubleshooting

Model not fitting properly

Possible causes:

  • Insufficient data for the complexity
  • Highly imbalanced classes
  • Poor feature quality

Solutions:

  • Reduce max_depth or increase min_leaf_size
  • Use class weights or resampling
  • Improve feature engineering

Poor probability calibration

Solutions:

  • Ensure map_calibration=True
  • Use larger training dataset
  • Consider probability calibration post-processing
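
One simple form of calibration post-processing is histogram binning: group held-out scores into equal-width bins and use each bin's observed positive rate as the calibrated probability. The sketch below is a generic illustration of that technique on made-up data, not the method behind map_calibration; fit_binned_calibration and calibrate are hypothetical helpers.

```python
def fit_binned_calibration(scores, labels, n_bins=4):
    """Observed positive rate per equal-width score bin on [0, 1]."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        totals[b] += 1
        positives[b] += y
    overall = sum(labels) / len(labels)
    # Empty bins fall back to the overall positive rate.
    return [positives[b] / totals[b] if totals[b] else overall
            for b in range(n_bins)]


def calibrate(score, bin_rates):
    """Map a raw score to the positive rate of the bin it falls into."""
    b = min(int(score * len(bin_rates)), len(bin_rates) - 1)
    return bin_rates[b]


# Made-up held-out scores and labels
val_scores = [0.05, 0.15, 0.35, 0.45, 0.55, 0.65, 0.85, 0.95]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]

bin_rates = fit_binned_calibration(val_scores, val_labels)
print(bin_rates)                  # [0.0, 0.5, 0.5, 1.0]
print(calibrate(0.9, bin_rates))  # 1.0
```

Fit the bins on data the model has not been trained on; with few rows per bin the rates are noisy, which is one reason a larger dataset helps calibration.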

Slow training performance

Solutions:

  • Reduce max_depth parameter
  • Increase min_info_gain threshold
  • Use feature selection to reduce dimensionality
