Version: v1.4.1

Preprocessing

Create and manage data preprocessing pipelines.

All methods are accessed via client.preprocessing.

create_preprocessor()

POST /v1/preprocessors/create

Create a new preprocessor from a PipelineSpec dict.

Parameters

name (str, required): Name of the preprocessor
description (str, required): Description of the preprocessor
spec (dict, required): PipelineSpec dict ({"version": "2.0", "steps": [...]})
sample_df (DataFrame, default: None): Optional sample dataframe for fitting

Returns

tuple — Tuple of (preprocessor_id, version_id)

Example

result = client.preprocessing.create_preprocessor(
    name="My Resource",
    description="A description",
    spec={"version": "2.0", "steps": []},
    sample_df=df
)
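
The DataFrame-based creation methods return a tuple, while create_preprocessor_from_spec() is documented as returning a dict. The sketch below shows one way to normalize both shapes; extract_ids is a hypothetical helper, not part of the client, and the dict key names are assumed from the "Dict with preprocessor_id and version_id" description.

```python
from collections.abc import Mapping
from typing import Tuple


def extract_ids(result) -> Tuple[str, str]:
    """Return (preprocessor_id, version_id) from either documented return shape."""
    if isinstance(result, Mapping):
        # Assumption: the dict uses these two key names.
        return result["preprocessor_id"], result["version_id"]
    # Tuple form: (preprocessor_id, version_id)
    preprocessor_id, version_id = result
    return preprocessor_id, version_id


# Placeholder IDs in the style of this page's examples.
print(extract_ids(("pp_abc123", "version_xyz789")))
print(extract_ids({"preprocessor_id": "pp_abc123", "version_id": "version_xyz789"}))
```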

create_from_pipeline()

POST /v1/preprocessors/create

Create a preprocessor from a fitted DataFramePipeline.

Parameters

name (str, required): Name of the preprocessor
description (str, required): Description of the preprocessor
pipeline (required): A fitted DataFramePipeline instance
df (DataFrame, required): Sample dataframe (used for schema inference)

Returns

tuple — Tuple of (preprocessor_id, version_id)

Example

result = client.preprocessing.create_from_pipeline(
    name="My Resource",
    description="A description",
    pipeline=pipeline,
    df=df
)

add_version()

POST /v1/preprocessors/add

Add a new version to an existing preprocessor.

Parameters

preprocessor_id (str, required): ID of the existing preprocessor
spec (dict, required): PipelineSpec dict
sample_df (DataFrame, default: None): Optional sample dataframe for fitting
parent_version_id (str, default: None): Optional parent version for lineage tracking

Returns

str — The new version_id

Example

result = client.preprocessing.add_version(
    preprocessor_id="pp_abc123",
    spec={"version": "2.0", "steps": []},
    sample_df=df,
    parent_version_id="..."
)
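
To illustrate what parent_version_id makes possible, here is a minimal local sketch of walking a lineage chain. It assumes you keep your own mapping of version_id to parent_version_id. The lineage helper and the version IDs are hypothetical, and nothing here reflects how the service stores lineage.

```python
from typing import Dict, List, Optional


def lineage(version_id: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from version_id back to its root, newest first."""
    chain: List[str] = []
    current: Optional[str] = version_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle detected at {current}")
        chain.append(current)
        # A version with no recorded parent ends the chain.
        current = parents.get(current)
    return chain


# Hypothetical bookkeeping: each key is a version, each value its parent.
parents = {"version_c": "version_b", "version_b": "version_a", "version_a": None}
print(lineage("version_c", parents))
```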

update_version()

POST /v1/preprocessors/update-version

Update an existing preprocessor version with a new spec.

Parameters

version_id (str, required): ID of the version to update
spec (dict, required): Updated PipelineSpec dict
sample_df (DataFrame, default: None): Optional sample dataframe for re-fitting

Returns

str — The updated version_id

Example

result = client.preprocessing.update_version(
    version_id="version_xyz789",
    spec={"version": "2.0", "steps": []},
    sample_df=df
)

get_version()

GET /v1/preprocessors/versions/{version_id}

Get metadata for a preprocessor version.

Parameters

version_id (str, required): The version ID

Returns

dict — Version info dict with spec, schemas, etc.

Example

result = client.preprocessing.get_version(
    version_id="version_xyz789"
)

load_pipeline()

Load a fitted pipeline ready to .transform().

Parameters

version_id (str, required): The version ID to load

Returns

A fitted DataFramePipeline instance

Example

result = client.preprocessing.load_pipeline(
    version_id="version_xyz789"
)

fit_version()

POST /v1/preprocessors/versions/{version_id}/fit

Fit a preprocessor version with sample data.

Parameters

version_id (str, required): The version ID to fit
df (DataFrame, required): Dataframe to fit on

Returns

dict — Fit result dict with schemas and status

Example

result = client.preprocessing.fit_version(
    version_id="version_xyz789",
    df=df
)

preview()

POST /v1/preprocessors/versions/{version_id}/preview

Preview pipeline transformation on sample data.

Parameters

version_id (str, required): The version ID to preview
df (DataFrame, required): Sample dataframe

Returns

dict — Preview dict with deltas, schemas, and samples

Example

result = client.preprocessing.preview(
    version_id="version_xyz789",
    df=df
)

list_preprocessors()

GET /v1/preprocessors/teams/{team_id}

List all preprocessors for a team.

Parameters

team_id (str, default: None): Optional team ID (uses session team_id if not provided)

Returns

list — List of preprocessor information

Example

result = client.preprocessing.list_preprocessors(
    team_id="team_abc123"
)
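
The team_id fallback can be pictured as follows. This is only a sketch of the documented behavior (use the explicit team_id, otherwise the session's). teams_path and its session_team_id argument are hypothetical names, and the client's real internals may differ.

```python
from typing import Optional
from urllib.parse import quote


def teams_path(team_id: Optional[str], session_team_id: Optional[str]) -> str:
    """Build the request path, falling back to the session team when team_id is None."""
    resolved = team_id if team_id is not None else session_team_id
    if not resolved:
        raise ValueError("no team_id given and no session team_id available")
    # Percent-encode the ID so it is safe to place in a URL path segment.
    return f"/v1/preprocessors/teams/{quote(resolved, safe='')}"


print(teams_path(None, "team_abc123"))
```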

get_preprocessor()

GET /v1/preprocessors/{preprocessor_id}

Get detailed information about a preprocessor.

Parameters

preprocessor_id (str, required): ID of the preprocessor

Returns

PreprocessorInfo — Preprocessor information

Example

result = client.preprocessing.get_preprocessor(
    preprocessor_id="pp_abc123"
)

check_signature()

POST /v1/preprocessors/check-signature

Check if a preprocessor version's output schema matches expected columns.

Parameters

version_id (str, required): The version ID to check
columns (list, required): Expected output column names

Returns

dict — Signature check result dict

Example

result = client.preprocessing.check_signature(
    version_id="version_xyz789",
    columns=["col1", "col2"]
)
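
The structure of the signature check result is not specified here. As a purely local illustration of what comparing an output schema against expected columns can involve, the sketch below reports missing and unexpected names. compare_columns and its result keys are hypothetical and do not describe the server's response.

```python
from typing import Dict, List


def compare_columns(actual: List[str], expected: List[str]) -> Dict[str, object]:
    """Compare two column lists and report the differences."""
    actual_set, expected_set = set(actual), set(expected)
    return {
        "same_columns": actual_set == expected_set,       # ignores order
        "same_order": list(actual) == list(expected),     # exact match, order included
        "missing": [c for c in expected if c not in actual_set],
        "unexpected": [c for c in actual if c not in expected_set],
    }


report = compare_columns(["col1", "col3"], ["col1", "col2"])
print(report)
```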

delete_version()

DELETE /v1/preprocessors/versions/{version_id}

Delete a preprocessor version.

Parameters

version_id (str, required): The version ID to delete

Returns

dict — Deletion result dict

Example

result = client.preprocessing.delete_version(
    version_id="version_xyz789"
)

create_preprocessor_from_spec()

POST /v1/preprocessors/create

Create a new preprocessor from a PipelineSpec dict. The spec should follow the PipelineSpec format:

{"version": "2.0", "steps": [{"id": "...", "type": "...", "columns": [...], "params": {...}}]}

Use list_available_transformers() to see the available transformer types and their parameters.

Parameters

name (str, required): Name of the preprocessor
description (str, required): Description of the preprocessor
spec (dict, required): PipelineSpec dict
sample_data (list, default: None): Optional sample data as a list of row dicts (JSON records)

Returns

dict — Dict with preprocessor_id and version_id

Example

result = client.preprocessing.create_preprocessor_from_spec(
    name="My Resource",
    description="A description",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}]
)
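
The example above passes an empty steps list. A minimal sketch of assembling a non-empty spec in the documented shape might look like this. make_step and make_spec are hypothetical helpers, and the transformer type "standard_scaler" is a made-up placeholder; consult list_available_transformers() for the real type names.

```python
from typing import Any, Dict, List

REQUIRED_STEP_KEYS = ("id", "type", "columns", "params")


def make_step(step_id: str, step_type: str, columns: List[str],
              params: Dict[str, Any]) -> Dict[str, Any]:
    """Build one step dict with the four documented keys."""
    return {"id": step_id, "type": step_type,
            "columns": list(columns), "params": dict(params)}


def make_spec(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap steps in a PipelineSpec dict, checking each step has the expected keys."""
    for step in steps:
        missing = [key for key in REQUIRED_STEP_KEYS if key not in step]
        if missing:
            raise ValueError(f"step is missing keys: {missing}")
    return {"version": "2.0", "steps": list(steps)}


spec = make_spec([
    # "standard_scaler" is a placeholder type name, not a confirmed transformer.
    make_step("scale_age", "standard_scaler", ["age"], {}),
])
print(spec)
```

The resulting dict can then be passed as the spec argument.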

add_version_from_spec()

POST /v1/preprocessors/add

Add a new version to an existing preprocessor.

Parameters

preprocessor_id (str, required): ID of the existing preprocessor
spec (dict, required): PipelineSpec dict
sample_data (list, default: None): Optional sample data as a list of row dicts (JSON records)
parent_version_id (str, default: None): Optional parent version for lineage tracking

Returns

dict — Dict with version_id

Example

result = client.preprocessing.add_version_from_spec(
    preprocessor_id="pp_abc123",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}],
    parent_version_id="..."
)

update_version_from_spec()

POST /v1/preprocessors/update-version

Update an existing preprocessor version with a new spec.

Parameters

version_id (str, required): ID of the version to update
spec (dict, required): Updated PipelineSpec dict
sample_data (list, default: None): Optional sample data as a list of row dicts (JSON records)

Returns

dict — Dict with version_id

Example

result = client.preprocessing.update_version_from_spec(
    version_id="version_xyz789",
    spec={"version": "2.0", "steps": []},
    sample_data=[{"col1": 1, "col2": "a"}]
)

fit_version_from_data()

POST /v1/preprocessors/versions/{version_id}/fit

Fit a preprocessor version with sample data.

Parameters

version_id (str, required): The version ID to fit
sample_data (list, required): Sample data as a list of row dicts (JSON records)

Returns

dict — Fit result dict with schemas and status

Example

result = client.preprocessing.fit_version_from_data(
    version_id="version_xyz789",
    sample_data=[{"col1": 1, "col2": "a"}]
)
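
Because sample_data must be a list of row dicts in JSON-records form, it can help to check the payload locally before sending it. The following is a minimal sketch using only the standard library. validate_records is a hypothetical helper, and the service may apply different or stricter rules.

```python
import json
from typing import Any, List


def validate_records(rows: Any) -> List[str]:
    """Check rows is a JSON-serializable list of dicts; return column names in first-seen order."""
    if not isinstance(rows, list):
        raise TypeError("sample_data must be a list")
    columns: List[str] = []
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise TypeError(f"row {index} is not a dict")
        for key in row:
            if key not in columns:
                columns.append(key)
    # json.dumps raises TypeError if any value cannot be encoded as JSON.
    json.dumps(rows)
    return columns


print(validate_records([{"col1": 1, "col2": "a"}]))
```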

preview_from_data()

POST /v1/preprocessors/versions/{version_id}/preview

Preview pipeline transformation on sample data.

Parameters

version_id (str, required): The version ID to preview
sample_data (list, required): Sample data as a list of row dicts (JSON records)

Returns

dict — Preview dict with deltas, schemas, and samples

Example

result = client.preprocessing.preview_from_data(
    version_id="version_xyz789",
    sample_data=[{"col1": 1, "col2": "a"}]
)

list_available_transformers()

List all available preprocessing transformers with their parameters. Returns a catalog of transformer types that can be used in PipelineSpec steps, including their constructor parameters and descriptions.

Returns

str — Formatted string describing all available transformers

Example

result = client.preprocessing.list_available_transformers()

delete_preprocessor()

DELETE /v1/preprocessors/{preprocessor_id}

Delete a preprocessor and all its versions.

Parameters

preprocessor_id (str, required): The preprocessor ID to delete

Returns

dict — Deletion result dict

Example

result = client.preprocessing.delete_preprocessor(
    preprocessor_id="pp_abc123"
)