Accessors for WSIData#
An accessor is a design pattern that uses attributes to extend the capabilities of a class.
There are three built-in accessors in the WSIData class:
fetch: Fetch information about the WSI
iter: Iterate over the content of the WSI
ds: Create deep learning datasets from the WSI
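To make the accessor idea concrete, here is a minimal sketch of the pattern in plain Python. The `Slide` class and the `FetchAccessor` helper are hypothetical stand-ins, not the actual WSIData implementation; they only illustrate how an attribute such as `fetch` can expose a group of related methods bound to one object.

```python
from functools import cached_property


class FetchAccessor:
    """Hypothetical accessor: groups read-only queries about a slide."""

    def __init__(self, slide):
        # Keep a reference to the object being extended.
        self._slide = slide

    def dimensions(self):
        # Delegate to data stored on the parent object.
        return (self._slide.height, self._slide.width)


class Slide:
    """Hypothetical stand-in for a slide container."""

    def __init__(self, height, width):
        self.height = height
        self.width = width

    @cached_property
    def fetch(self):
        # The accessor is created on first use and bound to this instance.
        return FetchAccessor(self)


slide = Slide(height=19958, width=19919)
print(slide.fetch.dimensions())
```

Keeping the queries on a separate accessor object leaves the main class small while still allowing calls of the form `obj.fetch.something()`.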
Here, we will load a WSI that has already been processed with tissue detection, tissue tiling, and feature extraction.
For your own slides, you can run these steps with the LazySlide package.
from pathlib import Path
from zipfile import ZipFile
from huggingface_hub import hf_hub_download
slide = hf_hub_download(
"RendeiroLab/LazySlide-data", "GTEX-1117F-0526.svs", repo_type="dataset"
)
slide_zarr_zip = hf_hub_download(
"RendeiroLab/LazySlide-data", "GTEX-1117F-0526.zarr.zip", repo_type="dataset"
)
if not Path(slide_zarr_zip.replace(".zip", "")).exists():
    with ZipFile(slide_zarr_zip, "r") as zip_ref:
        zip_ref.extractall(Path(slide).parent)
from wsidata import open_wsi
wsi = open_wsi(slide)
wsi
Reader: openslide
Dimensions: 19958×19919 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)
SpatialData object, with associated Zarr store: /home/docs/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/d469afd4a763ad366861e8c49d4cf424bfad902c/GTEX-1117F-0526.zarr
├── Shapes
│ ├── 'annotations': GeoDataFrame shape: (14, 5) (2D shapes)
│ ├── 'dl-tissue': GeoDataFrame shape: (2, 2) (2D shapes)
│ ├── 'tiles': GeoDataFrame shape: (253, 3) (2D shapes)
│ └── 'tissues': GeoDataFrame shape: (2, 2) (2D shapes)
└── Tables
  ├── 'resnet50_tiles': AnnData (253, 2048)
  └── 'uni2_tiles': AnnData (253, 1536)
with coordinate systems:
▸ 'global', with elements:
annotations (Shapes), dl-tissue (Shapes), tiles (Shapes), tissues (Shapes)
Fetch accessor#
The fetch accessor lets you retrieve essential information from a WSIData object.
1. Pyramids information#
wsi.fetch.pyramids()
| level | height | width | downsample |
|---|---|---|---|
| 0 | 19958 | 19919 | 1.000000 |
| 1 | 4989 | 4979 | 4.000502 |
| 2 | 2494 | 2489 | 8.002609 |
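The downsample values are not exact powers of two because the level dimensions are rounded to whole pixels. As a quick check, the numbers in the table above are consistent with averaging the height ratio and the width ratio relative to level 0. This is an observation drawn from the table, not a statement about how the library computes the value internally; `mean_downsample` is a hypothetical helper written only for this check.

```python
# Level dimensions (height, width) copied from the table above.
levels = [(19958, 19919), (4989, 4979), (2494, 2489)]
base_h, base_w = levels[0]


def mean_downsample(h, w):
    # Average of the height ratio and the width ratio relative to level 0.
    return (base_h / h + base_w / w) / 2


downsamples = [round(mean_downsample(h, w), 6) for h, w in levels]
print(downsamples)
```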
2. Retrieve the features as AnnData#
wsi.fetch.features_anndata("resnet50")
AnnData object with n_obs × n_vars = 253 × 2048
obs: 'tile_id', 'tissue_id'
uns: 'tile_spec', 'slide_properties'
obsm: 'spatial'
Iter accessor#
As the name suggests, the iter accessor always returns an iterator, and each item the iterator yields is a data container.
The data container usually implements a plot method for inspection.
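The iterator-plus-container idea can be sketched in plain Python as follows. The `Region` dataclass and the `iter_regions` generator are hypothetical and use made-up coordinates; they are not part of wsidata, and only show why `next(...)` and a `for` loop both work on such an iterator.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Region:
    """Hypothetical data container holding one item's attributes."""

    region_id: int
    x: int
    y: int
    width: int
    height: int

    def area(self) -> int:
        return self.width * self.height


def iter_regions(records) -> Iterator[Region]:
    # A generator: containers are built lazily, one per iteration step.
    for i, (x, y, w, h) in enumerate(records):
        yield Region(i, x, y, w, h)


# Made-up bounding boxes (x, y, width, height), for illustration only.
records = [(0, 0, 100, 50), (200, 300, 40, 40)]

first = next(iter_regions(records))  # take only the first item
areas = [r.area() for r in iter_regions(records)]  # or loop over all items
print(first, areas)
```

Because items are produced lazily, taking only the first one with `next` avoids building the rest.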
1. Tissue contours#
d = next(wsi.iter.tissue_contours("tissues"))
d
Attributes:
tissue_id: 0
shape
contour
holes
x: 3793
y: 13804
width: 3441
height: 4161
It’s also possible to visualize what’s inside.
You can use a for loop to iterate over every tissue:
for d in wsi.iter.tissue_contours("tissues"):
    d.contour  # contour of the current tissue
2. Tissue images#
You can iterate through tissue images, with or without the background masked:
no_mask = next(wsi.iter.tissue_images("tissues"))
with_mask = next(wsi.iter.tissue_images("tissues", mask_bg=True))
3. Tile images#
You can also iterate over all tile images:
d = next(wsi.iter.tile_images("tiles"))
d
Attributes:
id: 0
x: 3793
y: 16394
width: 256
height: 256
downsample: 1.01171875
tissue_id: 0
image
annot_mask
annot_shapes
annot_labels
norm_annot_shapes
has_annot: False
You can also include pathology annotations, which is useful when preparing a dataset for training a segmentation model.
Dataset accessor#
dataset = wsi.ds.tile_images()
The dataset is a PyTorch dataset that can be used to train a deep learning model. You can wrap it in a DataLoader and iterate over batches during training.
from torch.utils.data import DataLoader
dl = DataLoader(dataset, batch_size=36, shuffle=True)
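To show how batches come out of a DataLoader with these settings, here is a small self-contained sketch. It uses a stand-in `TensorDataset` of random tensors rather than the real tile dataset, so the item structure (a single image tensor of shape 3×32×32) is an assumption made for illustration; only the tile count of 253 and the batch size of 36 are taken from this page.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 253 random "tiles" (3 channels, 32x32 pixels).
# Real tiles would come from wsi.ds.tile_images(); the shapes here are assumed.
fake_tiles = TensorDataset(torch.rand(253, 3, 32, 32))

loader = DataLoader(fake_tiles, batch_size=36, shuffle=True)

# Each batch is a list with one tensor because the dataset yields 1-tuples.
batch_sizes = [batch[0].shape[0] for batch in loader]
print(len(batch_sizes), batch_sizes[-1])
```

With 253 items and a batch size of 36, the last batch holds a single item. Passing `drop_last=True` to `DataLoader` discards that incomplete batch, which can matter for layers that depend on batch statistics.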