Preprocessing
Create and manage data preprocessing pipelines.
All methods are accessed via client.preprocessing.
create_preprocessor()
POST /v1/preprocessors/create
Create a new preprocessor from a PipelineSpec dict.
Parameters
name str Required
Name of the preprocessor
description str Required
Description of the preprocessor
spec dict Required
PipelineSpec dict ({"version": "2.0", "steps": [...]})
sample_df DataFrame default: None
Optional sample dataframe for fitting
Returns
tuple — Tuple of (preprocessor_id, version_id)
Example
result = client.preprocessing.create_preprocessor(
    name="My Resource",
    description="A description",
    spec={"version": "2.0", "steps": []},
    sample_df=df
)
create_from_pipeline()
POST /v1/preprocessors/create
Create a preprocessor from a fitted DataFramePipeline.
Parameters
name str Required
Name of the preprocessor
description str Required
Description of the preprocessor
pipeline DataFramePipeline Required
A fitted DataFramePipeline instance
df DataFrame Required
Sample dataframe (used for schema inference)
Returns
tuple — Tuple of (preprocessor_id, version_id)
Example
result = client.preprocessing.create_from_pipeline(
    name="My Resource",
    description="A description",
    pipeline=pipeline,
    df=df
)
add_version()
POST /v1/preprocessors/add
Add a new version to an existing preprocessor.
Parameters
preprocessor_id str Required
ID of the existing preprocessor
spec dict Required
PipelineSpec dict
sample_df DataFrame default: None
Optional sample dataframe for fitting
parent_version_id str default: None
Optional parent version for lineage tracking
Returns
str — The new version_id
Example
result = client.preprocessing.add_version(
    preprocessor_id="pp_abc123",
    spec={"version": "2.0", "steps": []},
    sample_df=df,
    parent_version_id="..."
)
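The parent_version_id parameter records which version a new version was derived from. As a rough illustration of what such lineage makes possible, the sketch below walks a chain of parent links held in a plain dict. The parents mapping, the lineage helper, and the version IDs are hypothetical stand-ins and not part of the client; how the service itself exposes lineage is not described here.

```python
from typing import Dict, List, Optional


def lineage(version_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return version_id followed by its ancestors, newest first."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = version_id
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)  # guard against accidental cycles in the mapping
        current = parents.get(current)
    return chain


# Hypothetical bookkeeping: each version maps to the parent_version_id
# it was created with (None for a root version).
parents = {
    "version_a": None,
    "version_b": "version_a",
    "version_c": "version_b",
}

print(lineage("version_c", parents))
```

Keeping such a mapping on the caller's side is only one option; it is shown here because it needs nothing beyond the IDs that add_version() accepts and returns.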
update_version()
POST /v1/preprocessors/update-version
Update an existing preprocessor version with a new spec.
Parameters
version_id str Required
ID of the version to update
spec dict Required
Updated PipelineSpec dict
sample_df DataFrame default: None
Optional sample dataframe for re-fitting
Returns
str — The updated version_id
Example
result = client.preprocessing.update_version(
    version_id="version_xyz789",
    spec={"version": "2.0", "steps": []},
    sample_df=df
)
get_version()
GET /v1/preprocessors/versions/{version_id}
Get metadata for a preprocessor version.
Parameters
version_id str Required
The version ID
Returns
dict — Version info dict with spec, schemas, etc.
Example
result = client.preprocessing.get_version(
    version_id="version_xyz789"
)
load_pipeline()
Load a fitted pipeline ready to .transform().
Parameters
version_id str Required
The version ID to load
Returns
A fitted DataFramePipeline instance
Example
result = client.preprocessing.load_pipeline(
    version_id="version_xyz789"
)
fit_version()
POST /v1/preprocessors/versions/{version_id}/fit
Fit a preprocessor version with sample data.
Parameters
version_id str Required
The version ID to fit
df DataFrame Required
Dataframe to fit on
Returns
dict — Fit result dict with schemas and status
Example
result = client.preprocessing.fit_version(
    version_id="version_xyz789",
    df=df
)
preview()
POST /v1/preprocessors/versions/{version_id}/preview
Preview pipeline transformation on sample data.
Parameters
version_id str Required
The version ID to preview
df DataFrame Required
Sample dataframe
Returns
dict — Preview dict with deltas, schemas, and samples
Example
result = client.preprocessing.preview(
    version_id="version_xyz789",
    df=df
)
list_preprocessors()
GET /v1/preprocessors/teams/{team_id}
List all preprocessors for a team.
Parameters
team_id str default: None
Optional team ID (uses session team_id if not provided)
Returns
list — List of preprocessor information
Example
result = client.preprocessing.list_preprocessors(
    team_id="team_abc123"
)
get_preprocessor()
GET /v1/preprocessors/{preprocessor_id}
Get detailed information about a preprocessor.
Parameters
preprocessor_id str Required
ID of the preprocessor
Returns
PreprocessorInfo — Preprocessor information
Example
result = client.preprocessing.get_preprocessor(
    preprocessor_id="pp_abc123"
)
check_signature()
POST /v1/preprocessors/check-signature
Check if a preprocessor version's output schema matches expected columns.
Parameters
version_id str Required
The version ID to check
columns list Required
Expected output column names
Returns
dict — Signature check result dict
Example
result = client.preprocessing.check_signature(
    version_id="version_xyz789",
    columns=["col1", "col2"]
)
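To make the idea of a signature check concrete, here is a minimal local sketch that compares a list of expected column names with the column names of an output schema. It is not the server's implementation, and the shape of the dict it returns (matches, missing, unexpected) is an assumption made for illustration; the actual result dict of check_signature() is not specified here. The sketch also ignores column order, which the real check may or may not do.

```python
from typing import Dict, List


def compare_columns(expected: List[str], actual: List[str]) -> Dict[str, object]:
    """Compare expected column names with an output schema's column names."""
    # Columns the caller expects but the schema does not provide.
    missing = [name for name in expected if name not in actual]
    # Columns the schema provides but the caller did not ask for.
    unexpected = [name for name in actual if name not in expected]
    return {
        "matches": not missing and not unexpected,
        "missing": missing,
        "unexpected": unexpected,
    }


# Hypothetical output schema with one column the caller did not expect.
report = compare_columns(["col1", "col2"], ["col1", "col3"])
print(report)
```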
delete_version()
DELETE /v1/preprocessors/versions/{version_id}
Delete a preprocessor version.
Parameters
version_id str Required
The version ID to delete
Returns
dict — Deletion result dict
Example
result = client.preprocessing.delete_version(
    version_id="version_xyz789"
)
create_preprocessor_from_spec()
POST /v1/preprocessors/create
Create a new preprocessor from a PipelineSpec dict. The spec should follow the PipelineSpec format:
{"version": "2.0", "steps": [{"id": "...", "type": "...", "columns": [...], "params": {...}}]}
Use list_available_transformers() to see the available transformer types and their parameters.
Parameters
name str Required
Name of the preprocessor
description str Required
Description of the preprocessor
spec dict Required
PipelineSpec dict
sample_data list default: None
Optional sample data as a list of row dicts (JSON records)
Returns
dict — Dict with preprocessor_id and version_id
Example
result = client.preprocessing.create_preprocessor_from_spec(
    name="My Resource",
    description="A description",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}]
)
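The example above passes an empty steps list. The following sketch shows one way a non-empty spec could be assembled and sanity-checked locally before it is sent, using only the step keys named in the PipelineSpec format (id, type, columns, params). The helpers make_step, make_spec, and missing_step_keys are hypothetical, and the transformer type "example_scaler" and its with_mean parameter are placeholders; the real type names and parameters come from list_available_transformers().

```python
from typing import Any, Dict, List, Optional

REQUIRED_STEP_KEYS = {"id", "type", "columns", "params"}


def make_step(step_id: str, step_type: str, columns: List[str],
              params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one step dict with the four keys of the PipelineSpec format."""
    return {
        "id": step_id,
        "type": step_type,
        "columns": list(columns),
        "params": dict(params or {}),
    }


def make_spec(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap steps in a version "2.0" PipelineSpec dict."""
    return {"version": "2.0", "steps": list(steps)}


def missing_step_keys(spec: Dict[str, Any]) -> List[str]:
    """Return one message per step that lacks a required key."""
    problems = []
    for index, step in enumerate(spec.get("steps", [])):
        absent = sorted(REQUIRED_STEP_KEYS - set(step))
        if absent:
            problems.append(f"step {index}: missing {', '.join(absent)}")
    return problems


# "example_scaler" and "with_mean" are placeholders, not real transformer names.
spec = make_spec([
    make_step("scale_numeric", "example_scaler", ["col1"], {"with_mean": True}),
])
print(spec)
print(missing_step_keys(spec))
```

A local check like this only catches structural slips such as a missing key; whether a step type or parameter is valid is still decided by the service.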
add_version_from_spec()
POST /v1/preprocessors/add
Add a new version to an existing preprocessor.
Parameters
preprocessor_id str Required
ID of the existing preprocessor
spec dict Required
PipelineSpec dict
sample_data list default: None
Optional sample data as a list of row dicts (JSON records)
parent_version_id str default: None
Optional parent version for lineage tracking
Returns
dict — Dict with version_id
Example
result = client.preprocessing.add_version_from_spec(
    preprocessor_id="pp_abc123",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}],
    parent_version_id="..."
)
update_version_from_spec()
POST /v1/preprocessors/update-version
Update an existing preprocessor version with a new spec.
Parameters
version_id str Required
ID of the version to update
spec dict Required
Updated PipelineSpec dict
sample_data list default: None
Optional sample data as a list of row dicts (JSON records)
Returns
dict — Dict with version_id
Example
result = client.preprocessing.update_version_from_spec(
    version_id="version_xyz789",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}]
)
fit_version_from_data()
POST /v1/preprocessors/versions/{version_id}/fit
Fit a preprocessor version with sample data.
Parameters
version_id str Required
The version ID to fit
sample_data list Required
Sample data as a list of row dicts (JSON records)
Returns
dict — Fit result dict with schemas and status
Example
result = client.preprocessing.fit_version_from_data(
    version_id="version_xyz789",
    sample_data=[{"col1": 1, "col2": "a"}]
)
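The sample_data parameter expects a list of row dicts (JSON records). When the data starts out as column names plus rows, it can be reshaped with the standard library alone, as in the sketch below. The to_records helper is hypothetical and not part of the client. If the data already lives in a pandas DataFrame, df.to_dict(orient="records") yields the same shape.

```python
import json
from typing import Any, Dict, List, Sequence


def to_records(columns: Sequence[str],
               rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn column names plus rows into a list of row dicts."""
    records = []
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(columns)}"
            )
        records.append(dict(zip(columns, row)))
    return records


sample_data = to_records(["col1", "col2"], [[1, "a"], [2, "b"]])

# json.dumps raises TypeError if any value cannot be serialized as JSON,
# which makes it a cheap check before sending the records.
json.dumps(sample_data)
print(sample_data)
```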
preview_from_data()
POST /v1/preprocessors/versions/{version_id}/preview
Preview pipeline transformation on sample data.
Parameters
version_id str Required
The version ID to preview
sample_data list Required
Sample data as a list of row dicts (JSON records)
Returns
dict — Preview dict with deltas, schemas, and samples
Example
result = client.preprocessing.preview_from_data(
    version_id="version_xyz789",
    sample_data=[{"col1": 1, "col2": "a"}]
)
list_available_transformers()
List all available preprocessing transformers with their parameters. Returns a catalog of transformer types that can be used in PipelineSpec steps, including their constructor parameters and descriptions.
Returns
str — Formatted string describing all available transformers
Example
result = client.preprocessing.list_available_transformers()
delete_preprocessor()
DELETE /v1/preprocessors/{preprocessor_id}
Delete a preprocessor and all its versions.
Parameters
preprocessor_id str Required
The preprocessor ID to delete
Returns
dict — Deletion result dict
Example
result = client.preprocessing.delete_preprocessor(
    preprocessor_id="pp_abc123"
)