Accessors for WSIData#

An accessor is a design pattern that uses an attribute as a namespace to extend the capabilities of a class.

There are three built-in accessors in the WSIData class:

  • fetch: Fetch information about the WSI

  • iter: Iterate over the content of the WSI

  • ds: Create deep learning datasets from the WSI
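To make the accessor idea concrete, here is a minimal sketch of the pattern in plain Python. The names ToySlide, FetchAccessor, and n_levels are hypothetical and exist only for illustration; this is not how WSIData is implemented, just one common way to expose a namespace through an attribute.

```python
class FetchAccessor:
    """Hypothetical accessor: groups read-only queries under one namespace."""

    def __init__(self, obj):
        # Keep a reference to the object this accessor extends.
        self._obj = obj

    def n_levels(self):
        # Answer a question about the parent object without modifying it.
        return len(self._obj.level_shapes)


class ToySlide:
    """Hypothetical stand-in for a slide object; not the real WSIData."""

    def __init__(self, level_shapes):
        self.level_shapes = level_shapes

    @property
    def fetch(self):
        # Accessing the attribute returns an accessor bound to this instance.
        return FetchAccessor(self)


toy = ToySlide([(19958, 19919), (4989, 4979), (2494, 2489)])
n_levels = toy.fetch.n_levels()
print(n_levels)
```

The benefit of this design is that related methods live under a short, discoverable prefix (here, toy.fetch) instead of crowding the main class.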

Here, we will load a WSI that has already been processed with tissue detection, tissue tiling, and feature extraction.

For your own slides, you can run these steps with the LazySlide package.

from pathlib import Path
from zipfile import ZipFile

from huggingface_hub import hf_hub_download

slide = hf_hub_download(
    "RendeiroLab/LazySlide-data", "GTEX-1117F-0526.svs", repo_type="dataset"
)
slide_zarr_zip = hf_hub_download(
    "RendeiroLab/LazySlide-data", "GTEX-1117F-0526.zarr.zip", repo_type="dataset"
)
if not Path(slide_zarr_zip.replace(".zip", "")).exists():
    with ZipFile(slide_zarr_zip, "r") as zip_ref:
        zip_ref.extractall(Path(slide).parent)
from wsidata import open_wsi

wsi = open_wsi(slide)
wsi
WSI: /home/docs/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/d469afd4a763ad366861e8c49d4cf424bfad902c/GTEX-1117F-0526.svs
Reader: openslide
Dimensions: 19958×19919 (h×w), 3 Pyramids
Pixel physical size: 0.49 MPP (20X)
SpatialData object, with associated Zarr store: /home/docs/.cache/huggingface/hub/datasets--RendeiroLab--LazySlide-data/snapshots/d469afd4a763ad366861e8c49d4cf424bfad902c/GTEX-1117F-0526.zarr
├── Shapes
│     ├── 'annotations': GeoDataFrame shape: (14, 5) (2D shapes)
│     ├── 'dl-tissue': GeoDataFrame shape: (2, 2) (2D shapes)
│     ├── 'tiles': GeoDataFrame shape: (253, 3) (2D shapes)
│     └── 'tissues': GeoDataFrame shape: (2, 2) (2D shapes)
└── Tables
      ├── 'resnet50_tiles': AnnData (253, 2048)
      └── 'uni2_tiles': AnnData (253, 1536)
with coordinate systems:
    ▸ 'global', with elements:
        annotations (Shapes), dl-tissue (Shapes), tiles (Shapes), tissues (Shapes)

Fetch accessor#

The fetch accessor lets you retrieve essential information from a WSIData object.

1. Pyramids information#

wsi.fetch.pyramids()
       height  width  downsample
level
0       19958  19919    1.000000
1        4989   4979    4.000502
2        2494   2489    8.002609
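The downsample values are not exact powers of two because the level dimensions are rounded to whole pixels. As a worked check, the sketch below recomputes them from the heights and widths in the table, assuming the downsample is the mean of the height ratio and the width ratio. That assumption reproduces the values shown here, but it is an inference from the numbers, not a statement about how the reader computes them internally.

```python
# Level dimensions copied from the pyramid table: (height, width).
levels = [(19958, 19919), (4989, 4979), (2494, 2489)]

base_h, base_w = levels[0]
downsamples = []
for h, w in levels:
    # Assumption: average the height ratio and the width ratio.
    downsamples.append((base_h / h + base_w / w) / 2)

for level, ds in enumerate(downsamples):
    print(level, f"{ds:.6f}")
# 0 1.000000
# 1 4.000502
# 2 8.002609
```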

2. Retrieve the features as AnnData#

wsi.fetch.features_anndata("resnet50")
AnnData object with n_obs × n_vars = 253 × 2048
    obs: 'tile_id', 'tissue_id'
    uns: 'tile_spec', 'slide_properties'
    obsm: 'spatial'

Iter accessor#

As the name suggests, the iter accessor always returns an iterator, and every item the iterator yields is a data container.

The data container usually implements a plot method for inspection.
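The iterator-of-containers idea can be sketched in plain Python as a generator that yields small dataclass objects. ToyContour and iter_contours are hypothetical names used only for illustration; the real containers carry more fields and a plot method, which is left out here to avoid a plotting dependency.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class ToyContour:
    """Hypothetical data container; the real containers hold more fields."""

    tissue_id: int
    points: List[Tuple[int, int]]

    def bbox(self) -> Tuple[int, int, int, int]:
        # Return (x, y, width, height) of the bounding box of the points.
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def iter_contours(shapes) -> Iterator[ToyContour]:
    # A generator is lazy: each container is built only when requested.
    for tissue_id, points in enumerate(shapes):
        yield ToyContour(tissue_id, points)


shapes = [[(0, 0), (4, 0), (4, 3)], [(10, 10), (12, 15), (11, 11)]]
it = iter_contours(shapes)
first = next(it)  # take only the first item
rest = list(it)   # consume whatever is left
print(first.tissue_id, first.bbox(), len(rest))
```

Because the iterator is lazy, calling next on it is a cheap way to inspect a single item before looping over everything.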

1. Tissue contours#

d = next(wsi.iter.tissue_contours("tissues"))
d
TissueContour
Attributes:
  tissue_id: 0
  shape
  contour
  holes
  x: 3793
  y: 13804
  width: 3441
  height: 4161

It’s also possible to visualize what’s inside.

d.plot()
[Figure: output of d.plot() for the first tissue contour]

You can use a for loop to iterate over every tissue:

for d in wsi.iter.tissue_contours("tissues"):
    d.contour

2. Tissue images#

Iterate through the tissue images, with or without masking the background:

no_mask = next(wsi.iter.tissue_images("tissues"))
with_mask = next(wsi.iter.tissue_images("tissues", mask_bg=True))
import matplotlib.pyplot as plt

_, (ax1, ax2) = plt.subplots(ncols=2)

no_mask.plot(ax=ax1)
with_mask.plot(ax=ax2)
[Figure: the first tissue image without background masking (left) and with background masking (right)]

3. Tile images#

You can also iterate over all tile images:

d = next(wsi.iter.tile_images("tiles"))
d
TileImage
Attributes:
  id: 0
  x: 3793
  y: 16394
  width: 256
  height: 256
  downsample: 1.01171875
  tissue_id: 0
  image
  annot_mask
  annot_shapes
  annot_labels
  norm_annot_shapes
  has_annot: False

You can also include pathology annotations, which is useful when preparing a dataset for training a segmentation model.

for d in wsi.iter.tile_images(
    "tiles", annot_key="annotations", annot_names="name", annot_labels={"sclerosis": 1}
):
    if len(d.annot_shapes) > 1:
        d.plot()
        break

Dataset accessor#

dataset = wsi.ds.tile_images()

The dataset is a PyTorch Dataset that can be used to train a deep learning model. You can wrap it in a DataLoader to batch and shuffle the tiles during training.

from torch.utils.data import DataLoader

dl = DataLoader(dataset, batch_size=36, shuffle=True)
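As a quick sanity check on the batch size, the arithmetic below estimates how many batches one epoch would produce. It assumes the dataset yields one item per tile (253, the number of rows in the 'tiles' table shown earlier) and that the DataLoader keeps its default drop_last=False; both are assumptions about this particular setup.

```python
import math

n_tiles = 253    # assumed dataset length: one item per tile
batch_size = 36  # same value as in the DataLoader call

# With drop_last=False (the DataLoader default), a final partial batch is kept.
n_batches = math.ceil(n_tiles / batch_size)
last_batch = n_tiles - (n_batches - 1) * batch_size
print(n_batches, last_batch)
# 8 1
```

Under these assumptions the last batch holds a single tile. If a single-sample batch is a problem for your model, for example with batch normalization in training mode, passing drop_last=True to the DataLoader discards it.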