

Transparent Binary Classification

XClassifier provides transparent binary classification with real-time explainability. Unlike black-box models, you get instant insights into how predictions are made without needing surrogate models.

Overview

The XClassifier is xplainable's flagship transparent classification model. It uses a novel feature-wise ensemble approach where each feature gets its own decision tree, optimized for maximum information gain while maintaining complete interpretability.
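
As a rough intuition only, here is a toy, framework-agnostic sketch of the idea (not xplainable's actual implementation): fit one shallow tree per feature, sum the per-feature outputs into an additive score, and map that score to a probability. The per-feature terms then double as the explanation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

# Toy data; in practice this is your training set
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# One shallow tree per feature, each fit to the centred target
feature_trees = []
for j in range(X.shape[1]):
    tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=10)
    tree.fit(X[:, [j]], y - y.mean())
    feature_trees.append(tree)

# Additive score across features, squashed into a probability
score = sum(tree.predict(X[:, [j]]) for j, tree in enumerate(feature_trees))
proba = 1 / (1 + np.exp(-score))

# Each tree's output for a row is that feature's contribution to the score,
# which is what makes the prediction directly explainable
row_contributions = [tree.predict(X[:1, [j]])[0] for j, tree in enumerate(feature_trees)]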

Key Features

🔍 Real-time Explainability

Get instant explanations as part of the prediction process - no SHAP or LIME needed.

⚡ Rapid Refitting

Update parameters on individual features without complete retraining.

🎯 Feature-wise Ensemble

Each feature gets its own decision tree, providing granular control and transparency.

📊 Probability Calibration

Built-in probability mapping for reliable confidence scores.

Quick Start

GUI Interface

Training an XClassifier with the embedded GUI is the fastest way to get started:

import xplainable as xp
import pandas as pd

# Load your data
data = pd.read_csv('data.csv')

# Train your model (opens embedded GUI)
model = xp.classifier(data)

GUI Benefits

The GUI interface provides:

  • Interactive hyperparameter tuning
  • Real-time performance metrics
  • Visual feature importance
  • Automatic data preprocessing options

Python API

For programmatic control, use the Python API:

from xplainable.core.models import XClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

# Load and prepare data
data = pd.read_csv('data.csv')
X, y = data.drop('target', axis=1), data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = XClassifier()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Get explanations
model.explain()

Model Parameters

Core Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| max_depth | int | 5 | Maximum depth of decision trees |
| min_info_gain | float | 0.01 | Minimum information gain for splits |
| min_leaf_size | int | 5 | Minimum samples required for leaf nodes |
| weight | float | 0.5 | Activation function weight parameter |
| power_degree | int | 1 | Power degree for activation function |
| sigmoid_exponent | int | 1 | Sigmoid exponent for activation |

Advanced Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| tail_sensitivity | float | 0.5 | Weight for divisive leaf nodes |
| ignore_nan | bool | True | Handle missing values automatically |
| map_calibration | bool | True | Apply probability calibration mapping |

Example with Parameters

model = XClassifier(
    max_depth=7,
    min_info_gain=0.005,
    min_leaf_size=10,
    weight=0.7,
    power_degree=2,
    sigmoid_exponent=1,
    tail_sensitivity=0.3,
    ignore_nan=True,
    map_calibration=True
)

Model Methods

Training Methods

# Basic training
model.fit(X_train, y_train)

# With validation data
model.fit(X_train, y_train, validation_data=(X_val, y_val))

# With sample weights
model.fit(X_train, y_train, sample_weight=weights)

Prediction Methods

# Binary predictions
predictions = model.predict(X_test)

# Probability predictions
probabilities = model.predict_proba(X_test)

# Single sample prediction
single_pred = model.predict(X_test.iloc[[0]])

Explanation Methods

# Global explanations
model.explain()

# Feature importance
importance = model.feature_importance()

# Local explanations for specific samples
model.explain(X_test.iloc[[0]])

# Waterfall plot for decision breakdown
model.waterfall(X_test.iloc[[0]])

Model Inspection

# Get model statistics
stats = model.stats()

# View decision trees for each feature
trees = model.trees()

# Get feature contributions
contributions = model.feature_contributions(X_test)

Advanced Usage

Rapid Refitting

One of xplainable's unique features is the ability to update parameters without complete retraining:

# Initial training
model = XClassifier()
model.fit(X_train, y_train)

# Update parameters rapidly
model.refit(
    max_depth=7,
    weight=0.8,
    features=['feature1', 'feature2']  # Only update specific features
)

# Evaluate performance after the refit
print(f"Accuracy after refit: {model.score(X_test, y_test)}")

Rapid Refitting Benefits
  • 10-100x faster than complete retraining
  • Feature-specific updates for granular control
  • Real-time parameter tuning in production
  • A/B testing different configurations

Partitioned Classification

For datasets with natural segments, use PartitionedClassifier:

from xplainable.core.models import PartitionedClassifier, XClassifier

# Create partitioned model
partitioned_model = PartitionedClassifier(partition_on='segment_column')

# Train separate models for each segment
# 'train' is the training DataFrame containing 'segment_column' and 'target'
for segment in train['segment_column'].unique():
    segment_data = train[train['segment_column'] == segment]
    # Drop the target and the partition column from the features
    X_seg = segment_data.drop(['target', 'segment_column'], axis=1)
    y_seg = segment_data['target']

    # Train a model for this segment
    segment_model = XClassifier(
        max_depth=5,
        min_info_gain=0.01
    )
    segment_model.fit(X_seg, y_seg)

    # Add it to the partitioned model
    partitioned_model.add_partition(segment_model, segment)

# Predict with automatic segment routing
predictions = partitioned_model.predict(X_test)

Surrogate Models

Explain black-box models with transparent surrogates:

from xplainable.core.models import XSurrogateClassifier
from sklearn.ensemble import RandomForestClassifier

# Train black-box model
black_box = RandomForestClassifier()
black_box.fit(X_train, y_train)

# Create transparent surrogate
surrogate = XSurrogateClassifier(
    black_box_model=black_box,
    max_depth=5,
    min_info_gain=0.01
)

# Fit surrogate to explain black-box
surrogate.fit(X_train, y_train)

# Get explanations for black-box predictions
surrogate.explain(X_test)

Hyperparameter Optimization

Automatic Optimization

from xplainable.core.optimisation.bayesian import XParamOptimiser

# Set up optimizer
optimizer = XParamOptimiser(
    n_trials=200,
    n_folds=5,
    early_stopping=40,
    objective='roc_auc'  # or 'f1', 'precision', 'recall', 'accuracy'
)

# Find optimal parameters
best_params = optimizer.optimise(X_train, y_train)

# Train optimized model
model = XClassifier(**best_params)
model.fit(X_train, y_train)

Custom Search Spaces

from hyperopt import hp

# Define custom search space
search_space = {
    'max_depth': hp.choice('max_depth', [3, 4, 5, 6, 7]),
    'min_info_gain': hp.uniform('min_info_gain', 0.001, 0.1),
    'weight': hp.uniform('weight', 0.1, 0.9),
    'power_degree': hp.choice('power_degree', [1, 2, 3])
}

# Optimize with custom space
optimizer = XParamOptimiser(
    n_trials=100,
    search_space=search_space
)
best_params = optimizer.optimise(X_train, y_train)

Performance Metrics

Built-in Evaluation

# Accuracy score
accuracy = model.score(X_test, y_test)

# Detailed metrics
from xplainable.metrics import classification_metrics
metrics = classification_metrics(y_test, model.predict(X_test))

print(f"Accuracy: {metrics['accuracy']:.3f}")
print(f"Precision: {metrics['precision']:.3f}")
print(f"Recall: {metrics['recall']:.3f}")
print(f"F1-Score: {metrics['f1']:.3f}")
print(f"ROC-AUC: {metrics['roc_auc']:.3f}")

Custom Metrics

from sklearn.metrics import classification_report, confusion_matrix

# Predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# Detailed classification report
print(classification_report(y_test, y_pred))

# Confusion matrix
print(confusion_matrix(y_test, y_pred))

Visualization & Explainability

Feature Importance

# Global feature importance
importance = model.feature_importance()
print(importance.head())

# Plot feature importance
model.plot_feature_importance()

Decision Explanations

# Explain specific predictions
sample_explanation = model.explain(X_test.iloc[[0]])

# Waterfall plot showing decision breakdown
model.waterfall(X_test.iloc[[0]])

# Feature contribution analysis
contributions = model.feature_contributions(X_test)

Model Visualization

# Visualize decision trees for each feature
model.plot_trees()

# Show model architecture
model.plot_architecture()

# Performance curves
model.plot_performance_curves(X_test, y_test)

Integration Examples

Scikit-learn Pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xplainable.core.models import XClassifier

# Create a pipeline with an xplainable model
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', XClassifier())
])

# Fit pipeline
pipeline.fit(X_train, y_train)

# Predict with pipeline
predictions = pipeline.predict(X_test)

Cross-validation

from sklearn.model_selection import cross_val_score

# Cross-validation with XClassifier
scores = cross_val_score(
    XClassifier(),
    X_train,
    y_train,
    cv=5,
    scoring='roc_auc'
)

print(f"CV ROC-AUC: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")

Production Deployment

Model Persistence

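Trained models can also be kept locally. A minimal sketch, assuming the fitted model serialises cleanly with joblib (check the xplainable documentation for a native save/load utility):

import joblib

# Persist the trained model to disk and load it back later
joblib.dump(model, "xclassifier.joblib")
restored = joblib.load("xclassifier.joblib")
predictions = restored.predict(X_test)
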
Cloud Deployment

from xplainable_client import Client

# Initialize client
client = Client(api_key="your-api-key")

# Deploy to cloud
model_id, version_id = client.create_model(
    model=model,
    model_name="Binary Classification Model",
    model_description="Transparent binary classifier",
    x=X_train,
    y=y_train
)

# Deploy as API
deployment = client.deploy(
    model_id=model_id,
    version_id=version_id,
    deployment_name="binary-classifier-api"
)

Best Practices

Data Preparation

Data Quality
  • Handle missing values appropriately (XClassifier can handle NaN automatically)
  • Encode categorical variables in a preprocessing pipeline (see the sketch after this list)
  • Scale features if you combine the model with distance-based methods
  • Remove highly correlated features for better interpretability
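
A hedged sketch of such a preprocessing step, using scikit-learn transformers ahead of the classifier; the column names are placeholders, and xplainable's own preprocessing utilities can be used instead:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xplainable.core.models import XClassifier

# Placeholder column lists -- replace with the columns in your data
numeric_cols = ['age', 'income']
categorical_cols = ['region']

preprocess = ColumnTransformer([
    ('num', Pipeline([('impute', SimpleImputer(strategy='median')),
                      ('scale', StandardScaler())]), numeric_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols)
])

pipeline = Pipeline([
    ('prep', preprocess),
    ('classifier', XClassifier())
])
pipeline.fit(X_train, y_train)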

Model Configuration

# Recommended starting parameters
model = XClassifier(
    max_depth=5,           # Start conservative
    min_info_gain=0.01,    # Prevent overfitting
    min_leaf_size=10,      # Ensure statistical significance
    weight=0.5,            # Balanced activation
    map_calibration=True   # Better probability estimates
)

Performance Monitoring

# Monitor model performance over time
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

def monitor_model_performance(model, X_test, y_test):
    predictions = model.predict(X_test)
    probabilities = model.predict_proba(X_test)

    metrics = {
        'accuracy': accuracy_score(y_test, predictions),
        'roc_auc': roc_auc_score(y_test, probabilities[:, 1]),
        'f1': f1_score(y_test, predictions)
    }

    return metrics

# Regular performance checks
performance = monitor_model_performance(model, X_test, y_test)

Common Use Cases

🏦 Financial Services

  • Credit scoring and risk assessment
  • Fraud detection with explainable decisions
  • Regulatory compliance (Basel III, GDPR)

🏥 Healthcare

  • Clinical decision support
  • Patient risk stratification
  • Medical diagnosis assistance

🛒 E-commerce

  • Customer churn prediction
  • Product recommendation systems
  • Marketing campaign optimization

🏭 Manufacturing

  • Quality control and defect detection
  • Predictive maintenance
  • Process optimization

Troubleshooting

Common Issues

Model not fitting properly

Possible causes:

  • Insufficient data for the complexity
  • Highly imbalanced classes
  • Poor feature quality

Solutions:

  • Reduce max_depth or increase min_leaf_size
  • Use class weights or resampling (see the sketch after this list)
  • Improve feature engineering
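
For the class-imbalance case, a hedged sketch of the class-weight route, deriving per-sample weights with scikit-learn and passing them through the sample_weight argument shown earlier:

from sklearn.utils.class_weight import compute_sample_weight
from xplainable.core.models import XClassifier

# Give minority-class rows proportionally larger weights
weights = compute_sample_weight(class_weight='balanced', y=y_train)

model = XClassifier()
model.fit(X_train, y_train, sample_weight=weights)
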
Poor probability calibration

Solutions:

  • Ensure map_calibration=True
  • Use larger training dataset
  • Consider probability calibration post-processing (a quick check is sketched below)
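
Before adding post-processing, one hedged way to check how far off the probabilities are is scikit-learn's calibration_curve:

from sklearn.calibration import calibration_curve

proba = model.predict_proba(X_test)[:, 1]
prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)

# Well-calibrated probabilities keep predicted and observed rates close
for predicted, observed in zip(prob_pred, prob_true):
    print(f"predicted {predicted:.2f} -> observed {observed:.2f}")
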
Slow training performance

Solutions:

  • Reduce max_depth parameter
  • Increase min_info_gain threshold
  • Use feature selection to reduce dimensionality (see the sketch below)
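
A hedged sketch of the dimensionality-reduction route using univariate feature selection; the scoring function and k are illustrative choices:

from sklearn.feature_selection import SelectKBest, f_classif
from xplainable.core.models import XClassifier

# Keep the 10 most informative features (illustrative choice of k)
selector = SelectKBest(score_func=f_classif, k=10)
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

model = XClassifier(max_depth=4, min_info_gain=0.02)
model.fit(X_train_sel, y_train)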

Next Steps

Ready to Explore?