wsidata.dataset.FeaturesDatasetBuilder

class FeaturesDatasetBuilder(stores, splits=None, tile_key=None, feature_key=None, target_key=None, target_transform=None, skip_class=None, sampler='undersample', n_per_class=None, in_memory=True, seed=0, targets_mapping=None)

Bases: DatasetBuilder

Build train, validation, and test datasets from multiple slides.

The train/val/test split is made at the slide level, so no slide contributes tiles to more than one split.
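The idea of a slide-level split can be sketched in plain Python. This is only an illustration and not the library's implementation: the function name `split_slides`, the `fractions` argument, and the slide identifiers are hypothetical.

```python
import random


def split_slides(slide_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign every slide to exactly one of train/val/test (hypothetical helper)."""
    ids = sorted(slide_ids)            # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state
    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


splits = split_slides([f"slide_{i}" for i in range(20)])

# Tiles are drawn per split only after slides are assigned, so tiles from
# one slide can never end up in two different splits.
train, val, test = (set(splits[k]) for k in ("train", "val", "test"))
```

Because the partition is made over slide identifiers rather than over tiles, the three sets are disjoint by construction.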

Note

This API is still provisional and may change in the future without notice.

Parameters:
stores: list

A list of paths to the WSIData stores. Each path must point to a .zarr file.

tile_key: str

The key of the tile table.

feature_key: str

The key for the tile features.

target_key: str

The key for the Y variable in the observation table.

skip_class: list

The classes to skip.

sampler: {‘no-balance’, ‘undersample’}

How to sample the tile features for dataset construction. The no-balance option will return all the tiles for each class. The undersample option will ensure that each class has the same number of tiles.

n_per_class: int

The number of tiles to sample for each class.

in_memory: bool

If True, load the dataset into memory. If False, an IterableDataset will be returned.

targets_mapping: dict[str, int] | None

Optional explicit mapping from class label to integer code to use across train/val/test splits. If not provided, a deterministic mapping will be derived from the observed labels (sorted by label name).
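The difference between the two `sampler` options can be sketched as follows. This is a stand-alone illustration, not the library's code: the helper `sample_tiles` is hypothetical, and it assumes that `undersample` falls back to the size of the smallest class when `n_per_class` is not given and never draws more tiles than the smallest class has.

```python
import random
from collections import Counter, defaultdict


def sample_tiles(labels, sampler="undersample", n_per_class=None, seed=0):
    """Return the indices of the tiles kept by each strategy (hypothetical helper)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    if sampler == "no-balance":
        # Keep every tile of every class.
        return sorted(i for idxs in by_class.values() for i in idxs)
    if sampler != "undersample":
        raise ValueError(f"unknown sampler: {sampler!r}")

    # Assumption: the per-class count is capped by the smallest class.
    smallest = min(len(idxs) for idxs in by_class.values())
    n = smallest if n_per_class is None else min(n_per_class, smallest)

    rng = random.Random(seed)
    kept = []
    for label in sorted(by_class):  # sorted so the draw order is reproducible
        kept.extend(rng.sample(by_class[label], n))
    return sorted(kept)


labels = ["tumor"] * 5 + ["stroma"] * 2 + ["normal"] * 3
balanced = sample_tiles(labels, sampler="undersample")
everything = sample_tiles(labels, sampler="no-balance")
balanced_counts = Counter(labels[i] for i in balanced)
```

In this toy input the smallest class has two tiles, so the undersampled result keeps two tiles per class, while `no-balance` keeps all ten.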

property targets_mapping

Mapping from class label to integer code used in all datasets.
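A minimal sketch of how such a mapping could be derived and shared across splits is shown below. The helper `derive_targets_mapping` is hypothetical, and the sketch assumes that codes start at 0 and follow the sorted label order; the source only states that the default mapping is deterministic and sorted by label name.

```python
def derive_targets_mapping(labels, targets_mapping=None):
    """Return a label -> integer code mapping (hypothetical helper)."""
    if targets_mapping is not None:
        # An explicit mapping takes precedence and is used unchanged.
        return dict(targets_mapping)
    # Assumption: codes are assigned 0, 1, 2, ... in sorted label order.
    return {label: code for code, label in enumerate(sorted(set(labels)))}


train_labels = ["tumor", "stroma", "tumor"]
test_labels = ["normal", "tumor"]

# Deriving the mapping from all observed labels keeps the codes identical
# in every split, even when a split is missing some classes.
mapping = derive_targets_mapping(train_labels + test_labels)
encoded_train = [mapping[label] for label in train_labels]
encoded_test = [mapping[label] for label in test_labels]

# An explicit mapping overrides the derived one.
custom = derive_targets_mapping(train_labels, {"tumor": 1, "stroma": 0})
```

Sorting the labels before assigning codes makes the result independent of the order in which slides or tiles are read.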