Image Data Submission Report
Generated on: 26 May 2026
| Project Accession: | IBIAP_1000000030 |
| Title: | IPD-Brain: An Indian histopathology dataset for glioma subtype classification |
| Representative Image: | |
| Description: | The effective management of brain tumors relies on precise typing, subtyping, and grading. We present the IPD-Brain Dataset, a crucial resource for the neuropathological community, comprising 547 high-resolution H&E-stained slides from 367 patients for the study of glioma subtypes and immunohistochemical biomarkers. Scanned at 40x magnification, the dataset is one of the largest in Asia and focuses specifically on the Indian demographic. It includes detailed clinical annotations, including patient age, sex, radiological findings, diagnosis, CNS WHO grade, and IHC biomarker status (IDH1R132H, ATRX, and TP53, along with the proliferation index, Ki67), providing a rich foundation for research. The dataset is open for public access and is designed for various applications, from machine learning model training to the exploration of regional and ethnic disease variations. Preliminary validations using Multiple Instance Learning for tasks such as glioma subtype classification and IHC biomarker identification underscore its potential to contribute significantly to global collaboration in brain tumor research, enhancing diagnostic precision and the understanding of glioma variability across different populations. |
| Publications: | https://doi.org/10.1038/s41597-024-04225-9 |
| Associated Codes (URL only): | N/A |
| Funding agency: | N/A |
| Grant Number: | N/A |
| Ethics Statement: | N/A |
| Any Other Information: | The original version of the dataset is available from "India Data" at https://india-data.org/dataset-details/170acc68-1288-499e-9a91-b951e569e70d |
| Additional File: | N/A |
| Acknowledgments: | We acknowledge IHub-Data, IIIT Hyderabad (H1-002), for financial assistance. We also thank Ms. Ramya Alugam and Mr. Akula Rajesh Goud for data digitization and organization. |
| Sr.No | First name | Last name | Email | Organization | Designation |
|---|---|---|---|---|---|
| 1 | Ekansh | Chauhan | ekansh.chauhan@research.iiit.ac.in | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator |
| 2 | Amit | Sharma | N/A | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Unspecified |
| 3 | Megha | Uppin | N/A | Department of Pathology, Nizam’s Institute Of Medical Sciences, Hyderabad, 500082, India | Unspecified |
| 4 | Manasa | Kondamadugu | N/A | IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator |
| 5 | C | Jawahar | jawahar@iiit.ac.in | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Unspecified |
| 6 | P | Vinod | vinod.pk@iiit.ac.in | Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator |
| Study Accession: | HISTOS_1000000035 |
| Title: | IPD-Brain, a histopathology dataset for precise typing, subtyping, and grading of glioma. |
| Imaging Type: | Histopathology (HISTO) |
| Imaging Sub-type: | Diagnostic Pathology |
| Summary: | Glial tumors, particularly astrocytoma, glioblastoma, and oligodendroglioma, are the most prevalent, with glioblastoma being the most aggressive, comprising about 57% of all gliomas. Despite aggressive treatment, median survival for glioblastoma is only 12–15 months, primarily due to its high recurrence and infiltrative nature. Astrocytoma and oligodendroglioma are characterized by IDH and ATRX mutations and 1p/19q co-deletion, respectively. Techniques like Hematoxylin and Eosin (H&E) staining and Immunohistochemistry (IHC) are vital for tissue analysis and cancer diagnosis. H&E staining highlights cellular and tissue structures, aiding in tumor subtype identification and grading, while IHC staining, which uses tagged antibodies to detect specific proteins, plays a crucial role in identifying molecular biomarkers. Integrating deep learning with histopathology offers the potential to predict various biomarkers using H&E-stained whole slide images (WSIs). This approach seeks to reduce intra- and inter-observer variability among pathologists and bridge the gap between tumor morphology and genetics, thereby assisting in diagnosis. In recent years, the field of histopathology has undergone a significant transformation, primarily due to the emergence of digital histopathology. Available datasets such as The Cancer Genome Atlas (TCGA), IvyGAP, Digital Brain Tumour Atlas (EBRAINS), and TCIA consist primarily of digitized fresh frozen tissue sections. These preparations often exhibit relatively poor tissue morphology compared to formalin-fixed and paraffin-embedded (FFPE) tissues, thus restricting the application of deep learning methods for analysis. The evolution of brain tumor research underscores the importance of incorporating broader demographic datasets, encompassing diverse populations beyond conventional cohorts. This comprehensive understanding, facilitated by datasets from various demographics, helps capture the diversity of cellular and morphological characteristics necessary for further developing deep learning algorithms and extending their applicability to the entire spectrum of brain tumor types. |
| Keywords: | Glioma; Astrocytoma; Glioblastoma; Oligodendroglioma; Hematoxylin and Eosin (H&E); Immunohistochemistry (IHC); Whole Slide Images (WSIs) |
| Additional / Any Other Information: | N/A |
| Release Date: | Jan. 27, 2026 |
| Access Licence Type: | Open Access |
| Sample Type ID | Organism | Taxon ID | Biological Entity | Laterality | Source Tissue | Source Cell/Cell-line | Cell Organelle |
|---|---|---|---|---|---|---|---|
| HISTOSMT_10000000066 | Homo sapiens | 9606 | Brain | Not Applicable | N/A | N/A | N/A |
| Experiment Type ID | Instrument Name | Instrument Type | Manufacturer | Model |
|---|---|---|---|---|
| HISTOET_10000000033 | Digital Slide Scanning System | Digital Slide Scanner | Morphle Labs | DigiPath 6 T |
| Experimental Design Summary (HISTOET_10000000033) |
|---|
| Glial tumors, particularly astrocytoma, glioblastoma, and oligodendroglioma, are the most prevalent, with glioblastoma being the most aggressive, comprising about 57% of all gliomas. Despite aggressive treatment, median survival for glioblastoma is only 12–15 months, primarily due to its high recurrence and infiltrative nature. Astrocytoma and oligodendroglioma are characterized by IDH and ATRX mutations and 1p/19q co-deletion, respectively. Techniques like Hematoxylin and Eosin (H&E) staining and Immunohistochemistry (IHC) are vital for tissue analysis and cancer diagnosis. H&E staining highlights cellular and tissue structures, aiding in tumor subtype identification and grading, while IHC staining, which uses tagged antibodies to detect specific proteins, plays a crucial role in identifying molecular biomarkers. Integrating deep learning with histopathology offers the potential to predict various biomarkers using H&E-stained whole slide images (WSIs). This approach seeks to reduce intra- and inter-observer variability among pathologists and bridge the gap between tumor morphology and genetics, thereby assisting in diagnosis. In recent years, the field of histopathology has undergone a significant transformation, primarily due to the emergence of digital histopathology. Available datasets such as The Cancer Genome Atlas (TCGA), IvyGAP, Digital Brain Tumour Atlas (EBRAINS), and TCIA consist primarily of digitized fresh frozen tissue sections. These preparations often exhibit relatively poor tissue morphology compared to formalin-fixed and paraffin-embedded (FFPE) tissues, thus restricting the application of deep learning methods for analysis. The evolution of brain tumor research underscores the importance of incorporating broader demographic datasets, encompassing diverse populations beyond conventional cohorts. This comprehensive understanding, facilitated by datasets from various demographics, helps capture the diversity of cellular and morphological characteristics necessary for further developing deep learning algorithms and extending their applicability to the entire spectrum of brain tumor types. |
| Acquired Images Annotation Description (HISTOET_10000000033) |
|---|
| The process begins by segmenting tissue regions within the digitized slides. After smoothing, the image is converted to the HSV color space, and a mask is created by thresholding the saturation channel. Morphological operations fill gaps and holes, and contours are filtered by area. We have included the optimal segmentation parameters in a .csv file in the dataset. Following segmentation, the algorithm crops non-overlapping 256 × 256 patches from within the segmented WSI regions at a user-specified magnification. Quality control was applied at multiple stages. Before digitization, each slide was examined under a light microscope to verify staining quality, ensuring that the digital scans accurately represent the histopathological features needed for precise analysis. The resulting scans were then reviewed, and suboptimal scans were rescanned where possible or otherwise excluded. We purposefully include a few WSIs with artifacts that reflect real-world slide imperfections. Clinical annotations, including age, sex, diagnosis, radiology findings, clinical features (C/F), grade, immunohistochemistry (IHC) biomarker statuses (IDH1R132H, ATRX, TP53), and the Ki67 index, were retrieved directly from patient records to ensure authenticity and accuracy. To validate the reliability of these annotations, a subset of samples was randomly selected for independent review. This cross-check compared the original annotations against the expert diagnosis (M.S.U.), confirming the correctness of the labels and diagnoses provided. Such validation reinforces the dataset’s value as a trustworthy resource for training diagnostic models. |
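
The saturation-threshold and patch-grid steps described above can be illustrated with a minimal NumPy sketch. It is not the submitters' pipeline: the function names and the `sat_thresh` and `min_tissue` values are hypothetical placeholders (the dataset's .csv file holds the actual segmentation parameters), and the smoothing, morphological, and contour-filtering steps are omitted here because they would normally rely on an image-processing library.

```python
import numpy as np

PATCH = 256  # patch edge in pixels, as stated in the annotation description


def saturation(rgb):
    """Return the HSV saturation channel in [0, 1] for a uint8 RGB array (H, W, 3)."""
    rgb = rgb.astype(np.float32) / 255.0
    mx = rgb.max(axis=2)
    mn = rgb.min(axis=2)
    # S = (max - min) / max, defined as 0 where max == 0 (pure black)
    return np.where(mx > 0, (mx - mn) / np.maximum(mx, 1e-8), 0.0)


def tissue_mask(rgb, sat_thresh=0.08):
    """Boolean tissue mask; sat_thresh is an assumed value, not the dataset's."""
    return saturation(rgb) > sat_thresh


def patch_coords(mask, patch=PATCH, min_tissue=0.5):
    """Top-left (row, col) of non-overlapping patches with enough tissue.

    min_tissue is a hypothetical acceptance fraction chosen for illustration.
    """
    coords = []
    h, w = mask.shape
    for r in range(0, h - patch + 1, patch):      # stride == patch size,
        for c in range(0, w - patch + 1, patch):  # so patches never overlap
            if mask[r:r + patch, c:c + patch].mean() >= min_tissue:
                coords.append((r, c))
    return coords


# Synthetic example: white background with one pink "tissue" block.
img = np.full((512, 768, 3), 255, dtype=np.uint8)
img[0:256, 256:512] = (200, 100, 150)
coords = patch_coords(tissue_mask(img))
print(coords)
```

On a real slide, the array would be read at the chosen magnification level with a WSI reader, and the mask would be cleaned (smoothing, hole filling, small-contour removal) before the grid is applied.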