IBIA: Indian Biological Images Archive

Image Data Submission Report

Generated on: 26 May 2026


Project Accession: IBIAP_1000000030
Title: IPD-Brain: An Indian histopathology dataset for glioma subtype classification
Representative Image:
Description: The effective management of brain tumors relies on precise typing, subtyping, and grading. We present the IPD-Brain Dataset, a resource for the neuropathological community comprising 547 high-resolution H&E-stained slides from 367 patients, for the study of glioma subtypes and immunohistochemical biomarkers. Scanned at 40x magnification, it is one of the largest such datasets in Asia and focuses specifically on the Indian population. It includes detailed clinical annotations: patient age, sex, radiological findings, diagnosis, CNS WHO grade, IHC biomarker status (IDH1 R132H, ATRX, and TP53), and the Ki67 proliferation index, providing a rich foundation for research. The dataset is publicly accessible and is designed for applications ranging from machine learning model training to the exploration of regional and ethnic variation in disease. Preliminary validations using Multiple Instance Learning for tasks such as glioma subtype classification and IHC biomarker identification underscore its potential to contribute to global collaboration in brain tumor research, improving diagnostic precision and the understanding of glioma variability across populations.
Publications: https://doi.org/10.1038/s41597-024-04225-9
Associated Codes (URL only): N/A
Funding agency: N/A
Grant Number: N/A
Ethics Statement: N/A
Any Other Information: The original version of the dataset is available from "India Data" at https://india-data.org/dataset-details/170acc68-1288-499e-9a91-b951e569e70d
Additional File: N/A
Acknowledgments: We acknowledge IHub-Data, IIIT Hyderabad (H1-002), for financial assistance. We also thank Ms. Ramya Alugam and Mr. Akula Rajesh Goud for data digitization and organization.

Sr. No. | First name | Last name | Email | Organization | Designation
1 | Ekansh | Chauhan | ekansh.chauhan@research.iiit.ac.in | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator
2 | Amit | Sharma | N/A | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Unspecified
3 | Megha | Uppin | N/A | Department of Pathology, Nizam’s Institute of Medical Sciences, Hyderabad, 500082, India | Unspecified
4 | Manasa | Kondamadugu | N/A | IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator
5 | C | Jawahar | jawahar@iiit.ac.in | Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India | Unspecified
6 | P | Vinod | vinod.pk@iiit.ac.in | Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India | Principal Investigator

Study Accession: HISTOS_1000000035
Title: IPD-Brain, a histopathology dataset for precise typing, subtyping, and grading of glioma.
Imaging Type: Histopathology (HISTO)
Imaging Sub-type: Diagnostic Pathology
Summary: Glial tumors, particularly astrocytoma, glioblastoma, and oligodendroglioma, are the most prevalent malignant primary brain tumors; glioblastoma is the most aggressive, accounting for about 57% of all gliomas. Despite aggressive treatment, median survival for glioblastoma is only 12–15 months, primarily because of its high recurrence rate and infiltrative nature. Astrocytoma is characterized by IDH and ATRX mutations, and oligodendroglioma by 1p/19q co-deletion (together with IDH mutation). Techniques such as Hematoxylin and Eosin (H&E) staining and Immunohistochemistry (IHC) are vital for tissue analysis and cancer diagnosis. H&E staining highlights cellular and tissue structures, aiding tumor subtype identification and grading, while IHC staining, which uses labeled antibodies to detect specific proteins, plays a crucial role in identifying molecular biomarkers. Integrating deep learning with histopathology offers the potential to predict various biomarkers from H&E-stained whole slide images (WSIs). This approach seeks to reduce intra- and inter-observer variability among pathologists and to bridge the gap between tumor morphology and genetics, thereby assisting diagnosis. In recent years, histopathology has undergone a significant transformation, driven primarily by the emergence of digital histopathology. Available datasets such as The Cancer Genome Atlas (TCGA), IvyGAP, the Digital Brain Tumour Atlas (EBRAINS), and TCIA consist primarily of digitized fresh frozen tissue sections. Such preparations often show poorer tissue morphology than formalin-fixed, paraffin-embedded (FFPE) tissue, which restricts the application of deep learning methods. The evolution of brain tumor research also underscores the importance of incorporating datasets from broader demographics, encompassing diverse populations beyond conventional cohorts.
Datasets drawn from diverse demographics help capture the variety of cellular and morphological characteristics needed to further develop deep learning algorithms and to extend their applicability across the full spectrum of brain tumor types.
Keywords: Glioma; Astrocytoma; Glioblastoma; Oligodendroglioma; Hematoxylin and Eosin (H&E); Immunohistochemistry (IHC); Whole Slide Images (WSIs)
Additional / Any Other Information: N/A
Release Date: Jan. 27, 2026
Access Licence Type: Open Access

Table 1. The sample types registered under this study are as follows:
Sample Type ID | Organism | Taxon ID | Biological Entity | Laterality | Source Tissue | Source Cell/Cell-line | Cell Organelle
HISTOSMT_10000000066 | Homo sapiens | 9606 | Brain | Not Applicable | N/A | N/A | N/A

The total number of samples registered under this study is: 547

Table 3. The experiment types registered under this study are as follows:
Experiment Type ID | Instrument Name | Instrument Type | Manufacturer | Model
HISTOET_10000000033 | Digital Slide Scanning System | Digital Slide Scanner | Morphle Labs | DigiPath 6 T



Acquired Images Annotation Description (HISTOET_10000000033)
The process begins by segmenting tissue regions within the digitized slides. After smoothing, the image is converted to the HSV color space, and a mask is created by thresholding the saturation channel. Morphological operations fill gaps and holes, and contours are filtered by area. The optimal segmentation parameters are included as a .csv file in the dataset. Following segmentation, the algorithm crops non-overlapping 256 × 256 patches from within the segmented WSI regions at a user-specified magnification.
Quality control was applied at several stages. Before digitization, each slide was examined under a light microscope to verify staining quality, ensuring that the digital scans accurately represent the histopathological features needed for precise analysis. The scans were then reviewed, and suboptimal scans were rescanned where possible or otherwise excluded. We purposefully included a few WSIs with artifacts that reflect real-world slide imperfections.
Clinical annotations, including age, sex, diagnosis, radiology findings, clinical features (C/F), grade, immunohistochemistry (IHC) biomarker statuses (IDH1 R132H, ATRX, TP53), and the Ki67 index, were retrieved directly from the patients’ records to ensure authenticity and accuracy. To validate these annotations, a randomly selected subset of samples was independently reviewed: the original annotations were cross-checked against the expert diagnosis (M.S.U.), confirming the correctness of the labels and diagnoses. This validation reinforces the dataset’s value as a trustworthy resource for training diagnostic models.
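The tissue segmentation steps described above (smoothing, HSV conversion, saturation thresholding, morphological gap and hole filling, area-based filtering) can be sketched in dependency-free Python. This is a minimal sketch, not the dataset's actual tooling: it assumes a small RGB image held as nested lists of (r, g, b) tuples, uses a 3 × 3 median filter as a stand-in for the unspecified smoothing, and uses the pixel count of 4-connected regions as a stand-in for contour area. All function names and default thresholds are hypothetical; the real parameters are those supplied in the dataset's .csv file.

```python
import colorsys
from collections import deque
from statistics import median

# Hypothetical sketch: `img` is a list of rows of (r, g, b) tuples with values 0-255.
# Default parameter values are illustrative only; the dataset ships its own
# segmentation parameters in a .csv file.

NEIGHBOURS_4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def median_smooth(img, k=3):
    """Smoothing step (assumed): k x k median filter per RGB channel, clamped at borders."""
    h, w, r = len(img), len(img[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            row.append(tuple(int(median(px[c] for px in window)) for c in range(3)))
        out.append(row)
    return out


def saturation_mask(img, sat_thresh):
    """Convert each pixel to HSV and keep it if saturation (0-255 scale) exceeds sat_thresh."""
    return [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255 > sat_thresh
             for (r, g, b) in row] for row in img]


def _morph(mask, reduce_fn):
    """Apply any() (dilation) or all() (erosion) over each in-bounds 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    return [[reduce_fn(mask[yy][xx]
                       for yy in range(max(y - 1, 0), min(y + 2, h))
                       for xx in range(max(x - 1, 0), min(x + 2, w)))
             for x in range(w)] for y in range(h)]


def close_gaps(mask):
    """Morphological closing (dilation followed by erosion) to bridge small gaps."""
    return _morph(_morph(mask, any), all)


def fill_holes(mask):
    """Background pixels unreachable from the image border are holes; mark them as tissue."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y][x]:
                outside[y][x] = True
                queue.append((y, x))
    while queue:  # flood-fill the background from the border
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS_4:
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w and not mask[yy][xx] and not outside[yy][xx]:
                outside[yy][xx] = True
                queue.append((yy, xx))
    return [[not outside[y][x] for x in range(w)] for y in range(h)]


def filter_by_area(mask, min_area):
    """Keep 4-connected tissue regions with at least min_area pixels
    (a simple stand-in for filtering contours by area)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            region, queue = [], deque([(sy, sx)])
            while queue:  # breadth-first search over one connected region
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in NEIGHBOURS_4:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and mask[yy][xx] and not seen[yy][xx]:
                        seen[yy][xx] = True
                        queue.append((yy, xx))
            if len(region) >= min_area:
                for y, x in region:
                    out[y][x] = True
    return out


def segment_tissue(img, sat_thresh=8, min_area=4):
    """Smooth -> HSV saturation threshold -> close gaps -> fill holes -> area filter."""
    mask = saturation_mask(median_smooth(img), sat_thresh)
    return filter_by_area(fill_holes(close_gaps(mask)), min_area)
```

On a synthetic 12 × 12 white image containing a 6 × 6 pink block and a single isolated pink pixel, the median filter removes the isolated pixel while the block survives as tissue. On real slides, equivalent steps would typically run with an image-processing library on a downsampled slide thumbnail rather than on full-resolution nested lists.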

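The patching step crops non-overlapping 256 × 256 patches at a user-specified magnification. Because the slides were scanned at 40x, a patch taken at a lower magnification covers a proportionally larger area of the full-resolution (level-0) scan, so the stride in level-0 pixels is 256 × 40 / target magnification. The sketch below computes such a grid under stated assumptions: the function name `patch_grid`, the 20x default target, and the rule that a patch is kept when its centre falls on tissue in a downsampled boolean mask are all hypothetical illustrations, not the dataset's documented procedure.

```python
def patch_grid(slide_w, slide_h, tissue_mask, mask_downsample,
               base_mag=40.0, target_mag=20.0, patch_px=256):
    """Hypothetical sketch: return (coords, step), where coords are level-0
    (x, y) top-left corners of non-overlapping patches.

    A patch_px x patch_px patch at target_mag spans patch_px * base_mag / target_mag
    level-0 pixels; using that span as the stride means patches never overlap.
    A patch is kept only if its centre falls on a tissue pixel of tissue_mask,
    a boolean thumbnail mask downsampled by mask_downsample (assumed criterion).
    """
    if not 0 < target_mag <= base_mag:
        raise ValueError("target_mag must be positive and no larger than base_mag")
    step = round(patch_px * base_mag / target_mag)  # level-0 size == stride
    mask_h, mask_w = len(tissue_mask), len(tissue_mask[0])
    coords = []
    # Only whole patches that fit inside the slide are generated.
    for y in range(0, slide_h - step + 1, step):
        for x in range(0, slide_w - step + 1, step):
            # Map the patch centre into thumbnail-mask coordinates.
            cy = min(int((y + step / 2) / mask_downsample), mask_h - 1)
            cx = min(int((x + step / 2) / mask_downsample), mask_w - 1)
            if tissue_mask[cy][cx]:
                coords.append((x, y))
    return coords, step
```

For example, with a 2048 × 1024 slide and a thumbnail mask downsampled 256-fold whose left half is tissue, requesting 20x patches gives a 512-pixel level-0 stride and keeps the four patches whose centres lie in the left half. Checking only the centre is a cheap choice; a stricter variant could require a minimum tissue fraction inside each patch.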
The total number of experiments registered under this study is: 547

The total number of images registered under this study is: 547