IBIA: Indian Biological Images Archive

Image Data Submission Report

Generated on: 25 May 2026

Project Accession: IBIAP_1000000003
Title: An Opportunistic screening mammography dataset from a screening-naive population
Representative Image:
Description: Mammographic image dataset from an Indian population containing 1869 full-field digital mammography (FFDM) images and 1708 synthetic mammography (SM) images. It provides breast-level imaging data (BIRADS category and breast density) along with ground-truth labels based on histopathology for cancers and follow-up scans for non-cancers.
Publications: N/A
Associated Codes (URL only): N/A
Funding agency: Ministry of Education, Government of India, India
Grant Number: T-316
Ethics Statement: Download
Any Other Information: N/A
Additional File: N/A
Acknowledgments: Dr. Radhika Rajeev provided assistance in the early stages of curating this mammography dataset.

Sr. No. | First name | Last name | Email | Organization | Designation
1 | Amit | Gupta | amit.aiims2014@gmail.com | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Research Consultant
2 | Kshitiz | Jain | kshitiz16051@iiitd.ac.in | School of Artificial Intelligence, Indian Institute of Technology (IIT), Delhi, India | Student
3 | Mayank | Bhardwaj | buddybhardwaj@gmail.com | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Radiographer
4 | Hema | Malhotra | hemamalhotraaiims.2020@gmail.com | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Program Manager
5 | Sanjay | Thulkar | thulkar@hotmail.com | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Professor
6 | Smriti | Hari | drsmritihari@gmail.com | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Professor
7 | Chetan | Arora | chetan@cse.iitd.ac.in | Department of Computer Science and Engineering, Indian Institute of Technology (IIT), Delhi, India | Professor
8 | Krithika | Rangarajan | krithikarangarajan@aiims.edu | Dr. B.R.A.IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi, India | Principal Investigator

Study Accession: MAMOS_1000000004
Title: An Opportunistic screening mammography dataset from a screening-naive population
Imaging Type: Mammography (MAMO)
Imaging Sub-type: Diagnostic Radiology
Summary: Mammographic dataset from an Indian population containing 1869 FFDM images and 1708 SM images. It provides breast-level imaging data (BIRADS category and breast density) along with ground-truth labels based on histopathology for cancers and follow-up scans for non-cancers.
Keywords: Breast cancer; Mammography; Screening
Additional / Any Other Information: N/A
Release Date: Sept. 10, 2024
Access Licence Type: Open Access

Table 1. The sample types registered under this study are as follows:
Sample Type ID | Organism | Taxon ID | Biological Entity | Laterality | Source Tissue | Source Cell/Cell-line | Cell Organelle
MAMOSMT_10000000007 | Homo sapiens | 9606 | Breast | Both | N/A | N/A | N/A

The total number of samples registered under this study is: 3577

Table 3. The experiment types registered under this study are as follows:
Experiment Type ID | Instrument Name | Instrument Type | Manufacturer | Model
MAMOET_10000000003 | Mammography Machine | Digital mammography | Hologic | Selenia


Experimental Design Summary (MAMOET_10000000003)
The mammograms in this dataset were acquired on the Hologic Selenia Dimensions system at our institute. At the time of acquisition, the standard mammographic examination for each breast comprised two 2D views, cranio-caudal (CC) and medio-lateral oblique (MLO), along with DBT and reconstructed SM images, all acquired in a single compression (combo mode). When a mammogram showed suspicious findings, or when dense breasts reduced the sensitivity of mammography, a correlative breast ultrasound (US) was performed in the same sitting, and the radiological report was then based on the findings of both the mammogram and the US. Mammography reports strictly adhered to the fifth edition of BIRADS, and a single BIRADS category was assigned to each breast based on the most suspicious finding on either the mammogram or the US. Standard double reading was followed for mammography reports, with the first reading by a resident radiologist (year 3-6 of training) and the second reading by a specialist breast radiologist (with more than 10 years of experience). The second reader had access to the first reader's opinion, and the second reader's opinion was taken as final in case of discrepancy.
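The breast-level reporting rule described above can be illustrated with a minimal Python sketch. It is not code shipped with the dataset: the function names are hypothetical, and it assumes that only the final assessment categories 1-5 are used (BIRADS 0, 6, and the 4A/4B/4C subdivisions are not modeled), so that a higher number means a more suspicious finding.

```python
from typing import Iterable

# Assumption: only final assessment categories 1-5 appear in the reports.
VALID_CATEGORIES = {1, 2, 3, 4, 5}


def breast_level_birads(mammo: Iterable[int], ultrasound: Iterable[int] = ()) -> int:
    """Return the most suspicious BIRADS category among all findings for one breast."""
    categories = list(mammo) + list(ultrasound)
    if not categories:
        raise ValueError("at least one assessed category is required")
    for category in categories:
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unsupported BIRADS category: {category}")
    # Within categories 1-5, a higher number means a more suspicious finding.
    return max(categories)


def resolve_double_reading(first_reader: int, second_reader: int) -> int:
    """Return the reported category; the second reader prevails on discrepancy."""
    if first_reader == second_reader:
        return first_reader
    return second_reader


# Example: the mammogram alone suggests BIRADS 2; the readers differ on the US finding.
reported = resolve_double_reading(
    first_reader=breast_level_birads([2], [3]),
    second_reader=breast_level_birads([2], [4]),
)
print(reported)  # 4
```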

Acquired Images Annotation Description (MAMOET_10000000003)
A composite gold standard was used: the source of ground truth depended on the BIRADS category assigned to a patient. Ground truth for BIRADS 4 and 5 lesions was based on histopathologic findings, typically from image-guided biopsy. The pathology reports were extracted manually from the hospital's electronic medical record, and the results were divided into ‘cancer’ and ‘non-cancer’. The standard protocol for patients with BIRADS 3 lesions is 6-monthly follow-up mammograms for 2 years to demonstrate stability; patients with a BIRADS 1 or 2 assessment are referred back to their primary speciality for further advice. Since the country has no standard screening program, routine screening is typically not advised, though patients considered at high risk (ovarian cancer or previously treated breast cancer) are advised annual or biennial screening mammograms by the referring physician. Thus, wherever available, follow-up mammograms were reviewed for patients with BIRADS 1-3 assessment categories. If a lesion had grown (BIRADS 3 category) or a new lesion had appeared (suggesting a possible interval cancer), histopathology results were sought for these patients. If no such finding was seen, these patients were considered ‘non-cancer’. However, a significant number of our patients in all three categories (BIRADS 1-3) do not return for follow-up mammograms. Such patients were not excluded from the dataset and were categorized as ‘non-cancer’, with a note recording the absence of a follow-up study. The resulting ground-truth labels are stored with the unique patient identifiers in a CSV file available with the dataset.
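The composite labeling rule above can be summarized as a small decision function. This is a hedged sketch rather than the curators' actual procedure: the `BreastRecord` fields and the `ground_truth` function are hypothetical names chosen for illustration, and they do not reflect the column names of the CSV file. The report does not say how a case is handled when histopathology was sought but is unavailable, so the sketch raises an error there instead of guessing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BreastRecord:
    """Hypothetical per-breast record; field names are illustrative only."""
    birads: int
    histopathology_malignant: Optional[bool] = None  # None = no pathology result
    follow_up_available: bool = False
    suspicious_change_on_follow_up: bool = False  # lesion growth or new lesion


def _label_from_histopathology(record: BreastRecord) -> str:
    if record.histopathology_malignant is None:
        # Assumption: the report does not cover this case, so do not guess.
        raise ValueError("histopathology result required but missing")
    return "cancer" if record.histopathology_malignant else "non-cancer"


def ground_truth(record: BreastRecord) -> Tuple[str, bool]:
    """Return (label, follow_up_missing) following the composite gold standard."""
    if record.birads in (4, 5):
        # BIRADS 4 and 5: label comes from histopathology.
        return _label_from_histopathology(record), False
    if record.birads in (1, 2, 3):
        if not record.follow_up_available:
            # Kept in the dataset as non-cancer, with the missing follow-up noted.
            return "non-cancer", True
        if record.suspicious_change_on_follow_up:
            # Growth or a new lesion: histopathology decides.
            return _label_from_histopathology(record), False
        return "non-cancer", False
    raise ValueError(f"unsupported BIRADS category: {record.birads}")


print(ground_truth(BreastRecord(birads=5, histopathology_malignant=True)))
print(ground_truth(BreastRecord(birads=2)))
```

Returning the missing-follow-up flag alongside the label mirrors the note the curators kept for BIRADS 1-3 patients who did not return, so that users can filter out those weaker 'non-cancer' labels if they wish.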

The total number of experiments registered under this study is: 3577

The total number of images registered under this study is: 3577
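As a quick consistency check, the registered totals can be compared with the image counts given in the project description. The variable names below are illustrative; the numbers are taken from this report.

```python
# Image counts stated in the project description.
ffdm_images = 1869
sm_images = 1708

# Totals registered under the study (samples, experiments, and images).
registered_samples = 3577
registered_experiments = 3577
registered_images = 3577

total_images = ffdm_images + sm_images
assert total_images == registered_images
assert registered_samples == registered_experiments == registered_images
print(total_images)  # 3577
```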