wsidata.dataset.FeaturesDatasetBuilder
- class FeaturesDatasetBuilder(stores, splits=None, tile_key=None, feature_key=None, target_key=None, target_transform=None, skip_class=None, sampler='undersample', n_per_class=None, in_memory=True, seed=0, targets_mapping=None)
Bases: DatasetBuilder
Build train and test datasets from multiple slides.
The train/val/test split is made at the slide level, so no slide contributes tiles to more than one split.
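The slide-level guarantee can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the helper `split_slides`, its `fractions` argument, and the example file names are hypothetical and only show what a split without slide overlap looks like.

```python
import random


def split_slides(slide_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign every slide to exactly one of train/val/test (hypothetical helper)."""
    # Deduplicate and sort first so the shuffle is reproducible for a given seed.
    ids = sorted(set(slide_ids))
    random.Random(seed).shuffle(ids)

    n_train = int(len(ids) * fractions[0])
    n_val = int(len(ids) * fractions[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains goes to test
    }


# Made-up store names, for illustration only.
slides = [f"slide_{i:02d}.zarr" for i in range(20)]
splits = split_slides(slides, seed=0)

# No slide appears in more than one split.
assert not set(splits["train"]) & set(splits["val"])
assert not set(splits["train"]) & set(splits["test"])
assert not set(splits["val"]) & set(splits["test"])
```

Because the assignment is made per slide rather than per tile, tiles from the same slide cannot leak between the training and evaluation sets.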
Note
This API is still provisional and may change in the future without notice.
- Parameters:
- stores: list
A list of paths to the WSIData stores; each must be a .zarr file.
- tile_key: str
The key of the tile table.
- feature_key: str
The key for the tile features.
- target_key: str
The key for the Y variable in the observation table.
- skip_class: list
The classes to skip.
- sampler: {'no-balance', 'undersample'}
How to sample the tile features for dataset construction. The no-balance option will return all the tiles for each class. The undersample option will ensure that each class has the same number of tiles.
- n_per_class: int
The number of tiles to sample for each class.
- in_memory: bool
If True, load the dataset into memory. If False, an IterableDataset will be returned.
- targets_mapping: dict[str, int] | None
Optional explicit mapping from class label to integer code to use across train/val/test splits. If not provided, a deterministic mapping will be derived from the observed labels (sorted by label name).
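The difference between the two `sampler` options can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper `sample_tiles` is hypothetical, and capping `n_per_class` at the size of the smallest class is an assumption, since the reference does not say how the two parameters interact.

```python
import random
from collections import defaultdict


def sample_tiles(labels, sampler="undersample", n_per_class=None, seed=0):
    """Return the sorted tile indices kept by each strategy (hypothetical helper)."""
    if sampler == "no-balance":
        # Keep every tile of every class.
        return list(range(len(labels)))
    if sampler != "undersample":
        raise ValueError(f"unknown sampler: {sampler!r}")

    # Group tile indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Assumption: every class is reduced to the same size, which cannot
    # exceed the size of the smallest class.
    smallest = min(len(indices) for indices in by_class.values())
    n = smallest if n_per_class is None else min(n_per_class, smallest)

    rng = random.Random(seed)
    picked = []
    for label in sorted(by_class):  # fixed class order keeps the draw reproducible
        picked.extend(rng.sample(by_class[label], n))
    return sorted(picked)


# Made-up class labels with an imbalanced distribution.
labels = ["tumor"] * 50 + ["stroma"] * 20 + ["necrosis"] * 5

all_tiles = sample_tiles(labels, sampler="no-balance")
balanced = sample_tiles(labels, sampler="undersample")
capped = sample_tiles(labels, sampler="undersample", n_per_class=3)
```

With these labels, `all_tiles` keeps all 75 tiles, `balanced` keeps 5 tiles per class, and `capped` keeps 3 tiles per class.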
- property targets_mapping
Mapping from class label to integer code used in all datasets.
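A deterministic label-to-code mapping of the kind described for `targets_mapping` could be derived as in the sketch below. The helper `derive_targets_mapping` is hypothetical, and starting the codes at 0 is an assumption; the reference only states that the default mapping is derived from the observed labels sorted by name.

```python
def derive_targets_mapping(labels, targets_mapping=None):
    """Return a label -> integer code mapping (hypothetical helper)."""
    if targets_mapping is not None:
        # An explicit mapping is used as given.
        return dict(targets_mapping)
    # Sorting the unique labels makes the codes independent of the order
    # in which the labels were observed.
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Made-up labels for illustration.
observed = ["tumor", "stroma", "tumor", "necrosis"]
mapping = derive_targets_mapping(observed)
# mapping == {"necrosis": 0, "stroma": 1, "tumor": 2}

explicit = derive_targets_mapping(observed, {"tumor": 1, "stroma": 0, "necrosis": 2})
```

Using one mapping for every split keeps an integer code tied to the same class in the train, val, and test datasets, even if a class is missing from one of them.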