fix: bring idc_dataset notebook into compliance with contribution guidelines
- Fix license header: use MONAI Consortium copyright, correct the format with
  trailing double spaces and indentation, move it to the top of the first cell
- Move all imports (os, sys, itkwasm_dicom) into Setup imports cell; simplify
Setup environment cell to pip install only
- Add README.md entry for idc_dataset under Modules section
- Add idc_dataset to doesnt_contain_max_epochs and skip_run_papermill in runner.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Andrey Fedorov <andrey.fedorov@gmail.com>
This notebook shows the usage of several postprocessing transforms based on the model output of spleen segmentation task.
+##### [idc_dataset](./modules/idc_dataset.ipynb)
+This notebook shows how to query and download public cancer imaging data from NCI Imaging Data Commons (IDC) using `idc-index`, and how to load DICOM images and DICOM-SEG segmentations into MONAI for AI/ML preprocessing.
@@ -386,4 +388,4 @@ Example shows the use cases of using MONAI to evaluate the performance of a gene
#### [VISTA2D](./vista_2d)
This tutorial demonstrates how to train a cell segmentation model using the [MONAI](https://monai.io/) framework and the [Segment Anything Model (SAM)](https://github.com/facebookresearch/segment-anything) on the [Cellpose dataset](https://www.cellpose.org/).
modules/idc_dataset.ipynb (6 additions, 208 deletions)
@@ -5,58 +5,14 @@
"metadata": {
"id": "eFLP44iEFCpB"
},
-"source": [
-"[](https://colab.research.google.com/github/ImagingDataCommons/idc-monai/blob/main/monai_contribution/idc_dataset.ipynb)\n",
-"\n",
-"# Using NCI Imaging Data Commons with MONAI\n",
-"\n",
-"Copyright 2026 Imaging Data Commons\n",
-"\n",
-"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
-"you may not use this file except in compliance with the License.\n",
-"You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n",
-"\n",
-"---\n",
-"\n",
-"## What is IDC?\n",
-"\n",
-"[NCI Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov/) is a free, cloud-hosted repository of publicly available cancer imaging data maintained by the National Cancer Institute (NCI). It provides:\n",
-"\n",
-"- **~100 TB** of radiology (CT, MR, PET) and pathology images across 160+ cancer collections\n",
-"- **No sign-up or authentication required** — data is openly accessible\n",
-"- **Expert and AI-generated annotations** (e.g., organ segmentations) paired with images\n",
-"- **Standardized format** — all data uses DICOM, the medical imaging industry standard\n",
-"- **Cloud-native storage** — data lives in Google Cloud Storage (GCS) buckets, so downloads are fast\n",
-"- **Accompanying tools** - you can search, visualize, and subset the data\n",
-"\n",
-"## What is `idc-index`?\n",
-"\n",
-"[`idc-index`](https://github.com/ImagingDataCommons/idc-index) is a lightweight Python package that lets you search and download IDC data without any cloud account or special credentials. It ships with a local metadata index — a set of DuckDB tables describing every image series in IDC — so you can run SQL queries locally to find exactly the data you need before downloading anything.\n",
-"\n",
-"## What this tutorial covers\n",
-"\n",
-"This tutorial shows how to:\n",
-"1. Query IDC metadata with SQL to find cancer imaging data\n",
-"2. Download DICOM images and segmentations with one function call\n",
-"3. Load the data into MONAI for AI/ML preprocessing\n",
-"4. Work with DICOM Segmentation (DICOM-SEG) objects and their rich metadata\n",
-"\n",
-"> **Tip**: This tutorial was created using [idc-claude-skill](https://github.com/ImagingDataCommons/idc-claude-skill) — an AI assistant skill for navigating IDC data and the `idc-index` API."
-]
+"source": "Copyright (c) MONAI Consortium  \nLicensed under the Apache License, Version 2.0 (the \"License\");  \nyou may not use this file except in compliance with the License.  \nYou may obtain a copy of the License at  \n    http://www.apache.org/licenses/LICENSE-2.0  \nUnless required by applicable law or agreed to in writing, software  \ndistributed under the License is distributed on an \"AS IS\" BASIS,  \nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  \nSee the License for the specific language governing permissions and  \nlimitations under the License.\n\n[](https://colab.research.google.com/github/ImagingDataCommons/idc-monai/blob/main/monai_contribution/idc_dataset.ipynb)\n\n# Using NCI Imaging Data Commons with MONAI\n\n## What is IDC?\n\n[NCI Imaging Data Commons (IDC)](https://portal.imaging.datacommons.cancer.gov/) is a free, cloud-hosted repository of publicly available cancer imaging data maintained by the National Cancer Institute (NCI). It provides:\n\n- **~100 TB** of radiology (CT, MR, PET) and pathology images across 160+ cancer collections\n- **No sign-up or authentication required** — data is openly accessible\n- **Expert and AI-generated annotations** (e.g., organ segmentations) paired with images\n- **Standardized format** — all data uses DICOM, the medical imaging industry standard\n- **Cloud-native storage** — data lives in Google Cloud Storage (GCS) buckets, so downloads are fast\n- **Accompanying tools** - you can search, visualize, and subset the data\n\n## What is `idc-index`?\n\n[`idc-index`](https://github.com/ImagingDataCommons/idc-index) is a lightweight Python package that lets you search and download IDC data without any cloud account or special credentials. It ships with a local metadata index — a set of DuckDB tables describing every image series in IDC — so you can run SQL queries locally to find exactly the data you need before downloading anything.\n\n## What this tutorial covers\n\nThis tutorial shows how to:\n1. Query IDC metadata with SQL to find cancer imaging data\n2. Download DICOM images and segmentations with one function call\n3. Load the data into MONAI for AI/ML preprocessing\n4. Work with DICOM Segmentation (DICOM-SEG) objects and their rich metadata\n\n> **Tip**: This tutorial was created using [idc-claude-skill](https://github.com/ImagingDataCommons/idc-claude-skill) — an AI assistant skill for navigating IDC data and the `idc-index` API."
},
{
"cell_type": "markdown",
"metadata": {
"id": "iudwt5GoFCpC"
},
-"source": [
-"## Setup\n",
-"\n",
-"Install required packages:\n",
-"- `monai` — Medical Open Network for AI, the ML framework used here\n",
-"- `idc-index` — Query and download IDC data (includes local metadata index)\n",
-"- `itk` / `itkwasm-dicom` — ITK-based DICOM readers used by MONAI's `ITKReader` and our custom DICOM-SEG loader"
-]
+"source": "## Setup environment\n\nInstall required packages:\n- `monai` — Medical Open Network for AI, the ML framework used here\n- `idc-index` — Query and download IDC data (includes local metadata index)\n- `itk` / `itkwasm-dicom` — ITK-based DICOM readers used by MONAI's `ITKReader` and our custom DICOM-SEG loader\n\n> **Colab users**: After running the cell below, restart the runtime before continuing (Runtime → Restart runtime). ITK requires a fresh runtime to load correctly after installation."
-"print(f\"Unique labels: {torch.unique(seg_label)[:10].tolist()}...\") # First 10 labels"
-]
+"source": "class LoadDicomSegd(MapTransform):\n    \"\"\"Load DICOM Segmentation (DICOM-SEG) files using ITKWasm.\n\n    DICOM-SEG is an enhanced multiframe DICOM format that stores segmentation\n    masks with segment metadata including recommended display colors.\n\n    The affine matrix is derived directly from DICOM metadata (direction cosines,\n    spacing, origin) with LPS→RAS conversion applied to match MONAI's ITKReader\n    convention. No axis flipping is performed — orientation is fully encoded in\n    the affine via the direction cosine matrix.\n    \"\"\"\n\n    def __init__(self, keys: KeysCollection, allow_missing_keys: bool = False):\n        super().__init__(keys, allow_missing_keys)\n\n    def _find_dcm_file(self, path: Path) -> Path:\n        \"\"\"Find .dcm file in directory or return path if already a file.\"\"\"\n        if path.is_file():\n            return path\n        dcm_files = list(path.glob(\"*.dcm\"))\n        if not dcm_files:\n            raise FileNotFoundError(f\"No .dcm files found in {path}\")\n        return dcm_files[0]\n\n    def _build_affine(self, spacing, origin, direction) -> np.ndarray:\n        \"\"\"Build 4x4 affine matrix from DICOM spatial metadata.\n\n        Converts from ITK/DICOM LPS convention to MONAI's RAS-like convention\n        by negating X and Y world coordinates (LPS→RAS). No axis flips are\n        applied — orientation is fully encoded in the affine via the direction\n        cosine matrix.\n\n        Args:\n            spacing: Voxel spacing (X, Y, Z) as returned by itkwasm\n            origin: Physical coordinates of voxel [0,0,0] in LPS\n            direction: 3x3 direction cosine matrix D where D[i,j] is the\n                component of voxel-axis-j's unit vector along LPS\n                physical axis i. ITK affine formula:\n                world_lps = D @ diag(spacing) @ voxel + origin\n        \"\"\"\n        lps_to_ras = np.diag([-1.0, -1.0, 1.0])\n        affine = np.eye(4)\n        affine[:3, :3] = lps_to_ras @ direction @ np.diag(spacing)\n        affine[:3, 3] = lps_to_ras @ origin\n        return affine\n\n    def __call__(self, data: Mapping[Hashable, any]) -> dict[Hashable, any]:\n        d = dict(data)\n        for key in self.key_iterator(d):\n            path = Path(d[key])\n            dcm_file = self._find_dcm_file(path)\n\n            # Read using ITKWasm\n            seg_image, overlay_info = itkwasm_dicom.read_segmentation(dcm_file)\n\n            # ITKWasm returns array in (Z, Y, X) order but metadata in (X, Y, Z) order.\n            # Transpose to (X, Y, Z) to match metadata — this is a layout convention,\n            # not an orientation flip.\n            seg_array = np.asarray(seg_image.data).copy()\n            seg_array = np.transpose(seg_array, (2, 1, 0))\n\n            # Build affine from spatial metadata\n            spacing = np.array(seg_image.spacing)\n            origin = np.array(seg_image.origin)\n            direction = np.array(seg_image.direction).reshape(3, 3)\n\n            affine = self._build_affine(spacing, origin, direction)\n\n            # Make contiguous (array may be non-contiguous after transpose)\n            seg_array = np.ascontiguousarray(seg_array)\n\n            # Create MONAI MetaTensor with metadata\n            meta_tensor = MetaTensor(seg_array)\n            meta_tensor.affine = affine\n            meta_tensor.meta[\"filename_or_obj\"] = str(dcm_file)\n            meta_tensor.meta[\"overlay_info\"] = overlay_info\n            meta_tensor.meta[\"original_channel_dim\"] = \"no_channel\"\n\n            d[key] = meta_tensor\n            d[f\"{key}_meta_dict\"] = dict(meta_tensor.meta)\n\n        return d\n\n\n# Load CT with MONAI's ITKReader\nct_transforms = Compose(\n    [\n        LoadImaged(keys=[\"image\"], reader=ITKReader()),\n        EnsureChannelFirstd(keys=[\"image\"]),\n    ]\n)\n\n# Load SEG with our custom LoadDicomSegd\nseg_transforms = Compose(\n    [\n        LoadDicomSegd(keys=[\"label\"]),\n        EnsureChannelFirstd(keys=[\"label\"]),\n    ]\n)\n\n# Load both\nimage_path = os.path.join(seg_dir, demo_pair[\"image_uid\"])\nseg_path = os.path.join(seg_dir, demo_pair[\"seg_uid\"])\n\nct_data = ct_transforms({\"image\": image_path})\nseg_data = seg_transforms({\"label\": seg_path})\n\nct_image = ct_data[\"image\"]\nseg_label = seg_data[\"label\"]\n\nprint(f\"CT image shape: {ct_image.shape}\")\nprint(f\"Segmentation shape: {seg_label.shape}\")\nprint(f\"Unique labels: {torch.unique(seg_label)[:10].tolist()}...\") # First 10 labels"
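The (Z, Y, X) → (X, Y, Z) transpose-and-contiguity step that `LoadDicomSegd` performs can be checked in isolation. A minimal numpy-only sketch with made-up array sizes (no ITKWasm involved; the sizes are illustrative assumptions):

```python
import numpy as np

# Stand-in for an ITKWasm pixel buffer: indexed (Z, Y, X) with Z=2, Y=3, X=4,
# while the accompanying spatial metadata (spacing/origin) is ordered (X, Y, Z).
zyx = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Reindex to (X, Y, Z) so the array axes line up with the metadata.
# This is a memory-layout change only -- no orientation flip.
xyz = np.transpose(zyx, (2, 1, 0))
print(xyz.shape)                  # (4, 3, 2)
print(xyz.flags["C_CONTIGUOUS"])  # False: transpose returns a strided view

# Copy into C-contiguous memory, as the transform does before wrapping
# the array in a MetaTensor.
xyz = np.ascontiguousarray(xyz)
print(xyz.flags["C_CONTIGUOUS"])  # True

# The same voxel is addressed with reversed indices: zyx[z, y, x] == xyz[x, y, z]
assert zyx[1, 2, 3] == xyz[3, 2, 1]
```

This is why the docstring calls the transpose "a layout convention, not an orientation flip": no voxel values move relative to each other, only the index order changes.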
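The `_build_affine` formula in the diff (world_lps = D @ diag(spacing) @ voxel + origin, followed by negating the X and Y world axes for LPS→RAS) can also be exercised standalone. A sketch with an identity direction matrix and illustrative spacing/origin values (all numbers are assumptions for demonstration):

```python
import numpy as np

def build_affine(spacing, origin, direction):
    """Build a 4x4 affine from DICOM spatial metadata given in LPS,
    converted to RAS by negating the X and Y world coordinates."""
    lps_to_ras = np.diag([-1.0, -1.0, 1.0])
    affine = np.eye(4)
    affine[:3, :3] = lps_to_ras @ direction @ np.diag(spacing)
    affine[:3, 3] = lps_to_ras @ origin
    return affine

# Illustrative case: 1 x 1 x 3 mm voxels, identity orientation,
# volume origin at LPS coordinates (10, 20, 30).
affine = build_affine(
    spacing=np.array([1.0, 1.0, 3.0]),
    origin=np.array([10.0, 20.0, 30.0]),
    direction=np.eye(3),
)

# Voxel (0, 0, 0) lands at the LPS origin with X and Y negated,
# i.e. at (-10, -20, 30) in RAS.
print(affine @ np.array([0.0, 0.0, 0.0, 1.0]))
```

Because the direction cosines stay inside the 3x3 block, arbitrary patient orientations are handled by the same two lines; only axis-aligned volumes produce the diagonal matrix seen here.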