Custom Transformers
Custom transformers allow you to write sklearn-compatible transformer classes and use them within xplainable-preprocessing pipelines. This enables domain-specific preprocessing while leveraging the full pipeline compilation and validation system.
Overview
The xplainable-preprocessing package provides a declarative pipeline system where preprocessing steps are defined as StepSpec objects and compiled into executable pipelines. When no built-in transformer type fits your needs, you can write a custom transformer using type="custom".
Custom transformers are:
- Validated at compile time via AST-based code analysis to prevent unsafe operations
- Executed in a restricted namespace for security
- Fully compatible with the pipeline specification, column contracts, and mutation system
How Custom Transformers Work
Custom transformers use the StepSpec with type="custom". The params dictionary must contain:
- code (str): The Python source code defining the transformer class
- class_name (str): The name of the class to instantiate from the code
Any additional keys in params are passed as constructor keyword arguments to the class.
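The split between the two required keys and the constructor arguments can be sketched in plain Python. This is only an illustration of the behaviour described above, not the package's implementation: the ConstantAdder class, the RESERVED_KEYS name, and the use of a bare exec in place of the sandbox are all assumptions made for the example, and the transformer works on a list to keep the sketch dependency-free.

```python
# Hypothetical params for a step with type="custom". Only "code" and
# "class_name" are required; every other key becomes a constructor kwarg.
params = {
    "code": (
        "class ConstantAdder:\n"
        "    def __init__(self, amount=0.0):\n"
        "        self.amount = amount\n"
        "    def fit(self, X, y=None):\n"
        "        return self\n"
        "    def transform(self, X):\n"
        "        return [value + self.amount for value in X]\n"
    ),
    "class_name": "ConstantAdder",
    "amount": 2.5,  # extra key, forwarded to __init__
}

# Assumed name for the two keys that are not constructor arguments.
RESERVED_KEYS = {"code", "class_name"}
constructor_kwargs = {k: v for k, v in params.items() if k not in RESERVED_KEYS}

# A plain exec stands in for the package's restricted execution (illustration only).
namespace = {}
exec(params["code"], namespace)
transformer = namespace[params["class_name"]](**constructor_kwargs)

result = transformer.fit([1.0, 2.0]).transform([1.0, 2.0])
print(constructor_kwargs, result)
```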
Writing Custom Transformer Classes
Required Interface
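Custom transformer classes must implement the sklearn transformer interface: a fit(X, y=None) method that returns self, and a transform(X) method that returns the transformed data.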
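A minimal sketch of that interface, using scikit-learn's BaseEstimator and TransformerMixin base classes. The ColumnMeanCenterer class is a hypothetical example, not something shipped with the package; it learns column means in fit() and subtracts them in transform().

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnMeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical example: subtract each numeric column's training mean."""

    def __init__(self, columns=None):
        # Store constructor arguments unchanged (scikit-learn convention).
        self.columns = columns

    def fit(self, X, y=None):
        if self.columns is not None:
            cols = list(self.columns)
        else:
            cols = list(X.select_dtypes(include="number").columns)
        self.means_ = X[cols].mean()  # learned state, indexed by column name
        return self  # returning self allows fit(...).transform(...)

    def transform(self, X):
        X = X.copy()  # never modify the caller's DataFrame
        cols = list(self.means_.index)
        X[cols] = X[cols] - self.means_
        return X


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "label": ["x", "y", "z"]})
centered = ColumnMeanCenterer().fit_transform(df)
print(centered)
```

Inheriting from TransformerMixin supplies fit_transform() for free, and BaseEstimator supplies get_params() and set_params().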
Code Validation Rules
The xplainable-preprocessing sandbox validates custom code using AST analysis before execution. The following restrictions apply:
Forbidden imports -- these modules cannot be imported:
os, sys, subprocess, socket, shutil, pathlib, http, urllib, requests, importlib, ctypes, signal, multiprocessing, threading, asyncio, pickle, shelve, tempfile, glob, io, builtins, code, codeop
Forbidden function calls -- these functions cannot be called:
exec, eval, __import__, open, compile, globals, locals, breakpoint, exit, quit, getattr, setattr, delattr
Allowed imports include standard data science libraries:
numpy, pandas, sklearn, scipy, math, re, json, collections, itertools, functools, datetime, decimal, statistics, and others not in the forbidden list.
Available builtins in the restricted namespace include:
True, False, None, int, float, str, bool, list, dict, tuple, set, frozenset, range, enumerate, zip, map, filter, sorted, reversed, len, min, max, sum, abs, round, isinstance, issubclass, type, super, property, staticmethod, classmethod, print, repr, hasattr, any, all, and common exception types.
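To make the import and call rules concrete, here is a rough sketch of how an AST-based check of this kind can work, using only Python's standard ast module. It is not the package's validator: the find_violations function is a hypothetical name, and the two sets are abbreviated subsets of the lists above.

```python
import ast

# Abbreviated subsets of the forbidden lists, for illustration only.
FORBIDDEN_IMPORTS = {"os", "sys", "subprocess", "socket", "pickle"}
FORBIDDEN_CALLS = {"exec", "eval", "__import__", "open", "getattr"}


def find_violations(source):
    """Return messages for forbidden imports and direct forbidden calls."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" is judged by its top-level module, "os".
                if alias.name.split(".")[0] in FORBIDDEN_IMPORTS:
                    violations.append(f"forbidden import: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_IMPORTS:
                violations.append(f"forbidden import: {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only direct calls by name, such as open(...), are detected here.
            if node.func.id in FORBIDDEN_CALLS:
                violations.append(f"forbidden call: {node.func.id}")
    return violations


safe_code = "import numpy as np\nvalues = np.clip([1, 5, 9], 2, 8)\n"
unsafe_code = "import os.path\nhandle = open('data.csv')\n"

safe_result = find_violations(safe_code)
unsafe_result = find_violations(unsafe_code)
print(safe_result, unsafe_result)
```

Because the source is only parsed and never executed, a check like this can reject code before any of it runs.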
Examples
Outlier Capper
Cap values at specified percentiles to handle outliers:
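One possible implementation is sketched below. The class name, parameter names, and defaults are assumptions chosen for this example. Bounds are learned from the numeric columns seen in fit() and applied with clip() in transform().

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class OutlierCapper(BaseEstimator, TransformerMixin):
    """Clip numeric columns to percentile bounds learned during fit."""

    def __init__(self, lower_percentile=1.0, upper_percentile=99.0):
        self.lower_percentile = lower_percentile
        self.upper_percentile = upper_percentile

    def fit(self, X, y=None):
        numeric = X.select_dtypes(include="number")
        # DataFrame.quantile ignores NaN and returns one bound per column.
        self.lower_ = numeric.quantile(self.lower_percentile / 100.0)
        self.upper_ = numeric.quantile(self.upper_percentile / 100.0)
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.lower_.index:
            # clip leaves NaN values as NaN
            X[col] = X[col].clip(lower=self.lower_[col], upper=self.upper_[col])
        return X


df = pd.DataFrame({"income": [10.0, 20.0, 30.0, 40.0, 1000.0]})
capper = OutlierCapper(lower_percentile=0, upper_percentile=75).fit(df)
capped = capper.transform(df)
print(capped)
```

Learning the bounds in fit() rather than in transform() means new data is capped at the thresholds seen during training.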
Feature Interaction Creator
Create interaction features from existing columns:
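A sketch of such a transformer follows. The class name, the pairs parameter, and the "_x_" naming scheme for new columns are assumptions made for this example. Each pair of columns is multiplied and appended as a new column.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class InteractionCreator(BaseEstimator, TransformerMixin):
    """Append a product column for each requested pair of columns."""

    def __init__(self, pairs=None):
        # e.g. [["price", "quantity"]]; lists work so params stay JSON-friendly
        self.pairs = pairs

    def fit(self, X, y=None):
        # Nothing to learn; fail early if a requested column is absent.
        missing = [c for pair in (self.pairs or []) for c in pair if c not in X.columns]
        if missing:
            raise ValueError(f"columns not found: {missing}")
        return self

    def transform(self, X):
        X = X.copy()
        for left, right in (self.pairs or []):
            # NaN in either input propagates to the interaction feature.
            X[f"{left}_x_{right}"] = X[left] * X[right]
        return X


df = pd.DataFrame({"price": [2.0, 3.0], "quantity": [5, 4]})
out = InteractionCreator(pairs=[["price", "quantity"]]).fit_transform(df)
print(out)
```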
Cyclic Time Encoder
Encode cyclical time features (e.g., hour of day, day of week) using sine and cosine:
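The sketch below shows one way to write this encoder. The class name, parameter names, and the "_sin" / "_cos" column suffixes are assumptions for the example. The value is mapped to an angle on a circle of the given period, so the last value of a cycle ends up next to the first.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CyclicEncoder(BaseEstimator, TransformerMixin):
    """Add sine and cosine encodings of a periodic column."""

    def __init__(self, column="hour", period=24):
        self.column = column
        self.period = period

    def fit(self, X, y=None):
        return self  # stateless: the period is supplied by the caller

    def transform(self, X):
        X = X.copy()
        angle = 2.0 * np.pi * X[self.column] / self.period
        X[f"{self.column}_sin"] = np.sin(angle)
        X[f"{self.column}_cos"] = np.cos(angle)
        return X  # the original column is kept; drop it in a later step if unwanted


hours = pd.DataFrame({"hour": [0, 6, 12, 23]})
encoded = CyclicEncoder(column="hour", period=24).fit_transform(hours)
print(encoded.round(3))
```

Both components are needed: sine alone cannot distinguish hour 0 from hour 12, since both map to zero.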
Combining Custom and Built-in Steps
Custom transformers work alongside the built-in transformer registry. A pipeline can mix both:
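The package's own spec syntax and built-in type names are not reproduced here. As a loose analogy in plain scikit-learn, the sketch below chains a custom transformer with a library-provided one in a standard Pipeline. The LogTransformer class is a hypothetical example, and StandardScaler merely stands in for a built-in step.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class LogTransformer(BaseEstimator, TransformerMixin):
    """Custom step: apply log1p to the listed columns."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        for col in (self.columns or []):
            X[col] = np.log1p(X[col])
        return X


pipeline = Pipeline([
    ("log_income", LogTransformer(columns=["income"])),  # custom step
    ("scale", StandardScaler()),  # library-provided step
])

df = pd.DataFrame({"income": [0.0, 9.0, 99.0], "age": [20.0, 30.0, 40.0]})
scaled = pipeline.fit_transform(df)  # NumPy array, one column per input column
print(scaled)
```

Because the custom class follows the same fit/transform contract as the library's transformers, the pipeline treats both steps identically.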
Column Scoping
When columns is specified on a StepSpec, the transformer is wrapped in a DataFrameColumnTransformer that applies the transformation only to those columns, leaving others untouched:
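The idea of column scoping can be sketched with a small wrapper. ColumnScopedWrapper below is a hypothetical stand-in written for this example; it is not the package's DataFrameColumnTransformer and makes no claim about that class's constructor or behaviour beyond what is described above.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import MinMaxScaler


class ColumnScopedWrapper(BaseEstimator, TransformerMixin):
    """Apply an inner transformer to selected columns only."""

    def __init__(self, transformer, columns):
        self.transformer = transformer
        self.columns = columns

    def fit(self, X, y=None):
        # Fit a clone so the transformer passed in by the caller stays unfitted.
        self.transformer_ = clone(self.transformer).fit(X[self.columns], y)
        return self

    def transform(self, X):
        X = X.copy()
        # Only the scoped columns are overwritten; all others pass through.
        X[self.columns] = self.transformer_.transform(X[self.columns])
        return X


df = pd.DataFrame({"age": [20.0, 30.0, 40.0], "city": ["a", "b", "c"]})
scoped = ColumnScopedWrapper(MinMaxScaler(), columns=["age"]).fit(df)
out = scoped.transform(df)
print(out)
```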
Pipeline Validation
Before compilation, you can validate your pipeline specification:
Custom code is further validated via AST analysis during compile_spec() to ensure no forbidden imports or calls are present.
Best Practices
- Always return self from fit() to support method chaining and pipeline compatibility.
- Always copy the input DataFrame in transform() with X.copy() to avoid modifying the original data.
- Handle missing values -- check for NaN/None in your transformation logic.
- Keep code self-contained -- avoid relying on external state or files. Only import from allowed modules.
- Use constructor parameters for configuration rather than hardcoding values, so they can be adjusted via params.
- Test your transformer independently before embedding it in a pipeline specification.
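Several of these practices can be checked mechanically before the code is embedded in a spec. The helper below is a minimal sketch: check_transformer and MedianImputer are hypothetical names for this example. It verifies that fit() returns self, that transform() leaves its input untouched, and that NaN values are handled.

```python
import numpy as np
import pandas as pd


class MedianImputer:
    """Hypothetical transformer under test: fill NaN with training medians."""

    def fit(self, X, y=None):
        self.medians_ = X.select_dtypes(include="number").median()
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.medians_.index:
            X[col] = X[col].fillna(self.medians_[col])
        return X


def check_transformer(transformer, frame):
    """Run basic contract checks and return the transformed frame."""
    snapshot = frame.copy(deep=True)
    assert transformer.fit(frame) is transformer, "fit() must return self"
    result = transformer.transform(frame)
    # Raises AssertionError if transform() modified its input in place.
    pd.testing.assert_frame_equal(frame, snapshot)
    assert result is not frame, "transform() must return a new object"
    return result


sample = pd.DataFrame({"score": [1.0, np.nan, 3.0], "name": ["a", "b", "c"]})
checked = check_transformer(MedianImputer(), sample)
print(checked)
```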
Custom transformer code runs in a restricted sandbox. Do not attempt to use file I/O, network access, subprocess execution, or other system-level operations. The AST validator will reject code containing forbidden imports or function calls.
Next Steps
- Explore rapid refitting for real-time model updates
- Learn about partitioned models for segment-specific modeling
- Check out XEvolutionaryNetwork for advanced weight optimization