Retrieval2 Benchmark

The Retrieval2 Benchmark is a continuously running benchmark for similar-case retrieval in the medical domain, based on 3D images and text information. A case is data about a specific patient (used in anonymised form), such as medical records, radiology images and radiology reports, or a case described in the literature or in teaching files.

VISCERAL Retrieval2 dataset

The Retrieval2 dataset consists of 2311 volumes originating from various modalities (CT, MRT1, MRT2). These scans were acquired during daily clinical routine work and come from three different data providers. For a subset of these volumes, we provide anatomy-pathology (A-P) terms extracted from the volume's radiological report, in the form of csv files. The following table gives an overview of the dataset on which participants perform the retrieval task.

Modality  Body Region       Volumes  Available A-P term files
CT        Abdomen               336                       213
CT        Thorax                971                       699
CT        Thorax + Abdomen       86                        86
CT        Unknown               211                       211
CT        Whole body            410                       410
MRT1      Abdomen               167                       114
MRT1      Unknown                24                        24
MRT2      Abdomen                68                        18
MRT2      Unknown                38                        38
TOTAL                          2311                      1813

The anatomy-pathology term files list the pathological terms that occur in the report of a volume, together with the associated anatomy. Both entities are given as text and, additionally, with their corresponding RadLex ID (RID). RadLex is a unified language of radiology terms that can be used for standardized indexing and retrieval of radiology information resources. Each term file lists both the pathologies that occur and the pathologies that are explicitly negated in the report.
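As an illustration of how such term files could be used for text-based matching, the following minimal sketch reads one file and compares two cases by the overlap of their non-negated (pathology RID, anatomy RID) pairs. The column names (`pathology_rid`, `anatomy_rid`, `negated`), the RID values and the Jaccard comparison are all assumptions made for this example; the actual csv layout is the one delivered with the dataset and may differ.

```python
import csv
import io

# Hypothetical file content: column names and RID values are placeholders,
# not the real layout or real RadLex identifiers.
SAMPLE = """pathology,pathology_rid,anatomy,anatomy_rid,negated
nodule,RID-P1,lung,RID-A1,0
effusion,RID-P2,pleura,RID-A2,1
"""


def load_terms(handle):
    """Return (present, negated) sets of (pathology_rid, anatomy_rid) pairs."""
    present, negated = set(), set()
    for row in csv.DictReader(handle):
        pair = (row["pathology_rid"], row["anatomy_rid"])
        if row["negated"] == "1":
            negated.add(pair)
        else:
            present.add(pair)
    return present, negated


def jaccard(a, b):
    """Jaccard overlap of two term sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


query_present, query_negated = load_terms(io.StringIO(SAMPLE))
candidate_present = {("RID-P1", "RID-A1"), ("RID-P3", "RID-A3")}
similarity = jaccard(query_present, candidate_present)
```

Keeping negated findings in a separate set avoids treating "no effusion" as evidence for effusion when two cases are compared.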

Content-based medical image retrieval

The benchmark serves the following scenario: a user is assessing a query case in a clinical setting, e.g., a CT volume, and is searching for cases that are relevant to this assessment. The algorithm has to find the relevant cases in a large database of cases. For each topic (query case) there is:

  • the patient 3D imaging data (CT, MRI)
  • 3D bounding box region of interest containing the radiological signs of the pathology
  • binary mask of the main organ affected
  • anatomy-pathology terms extracted from the radiological report, in the form of csv files.

Participants have to develop an algorithm that, given a query case (imaging data only, or imaging and text data), finds clinically relevant cases, i.e. cases that are related or useful for differential diagnosis. The algorithm is expected to find relevant cases, not necessarily the final diagnosis.

Medical experts have performed relevance assessments to judge the quality of retrieval. The evaluation metrics used are:

  • mean average precision (MAP);
  • geometric mean average precision (GM-MAP);
  • binary preference (bpref);
  • precision after 10 cases retrieved (P10);
  • precision after 30 cases retrieved (P30).
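To make the metrics above concrete, here is a minimal Python sketch of how they are commonly defined for a single ranked list. It is meant for understanding and for quick sanity checks only: the function names are illustrative, and the floor of 1e-5 in the geometric mean and the bpref variant shown are assumptions about the usual conventions rather than a specification of this benchmark.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the first k retrieved cases that are relevant (P10: k=10)."""
    return sum(1 for case in ranked[:k] if case in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant case found."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, case in enumerate(ranked, start=1):
        if case in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(aps):
    """MAP: arithmetic mean of per-topic average precision."""
    return sum(aps) / len(aps)


def geometric_map(aps, floor=1e-5):
    """GM-MAP: geometric mean of per-topic AP; the floor avoids log(0)."""
    return math.exp(sum(math.log(max(ap, floor)) for ap in aps) / len(aps))


def bpref(ranked, relevant, nonrelevant):
    """Binary preference: penalise judged non-relevant cases ranked above
    relevant ones; unjudged cases are ignored."""
    r, n = len(relevant), len(nonrelevant)
    if r == 0:
        return 0.0
    nonrel_seen, score = 0, 0.0
    for case in ranked:
        if case in relevant:
            if nonrel_seen == 0:
                score += 1.0
            else:
                score += 1.0 - min(nonrel_seen, r) / min(r, n)
        elif case in nonrelevant:
            nonrel_seen += 1
    return score / r


# Toy topic: c7 is unjudged, c3 is judged non-relevant.
ranked = ["c3", "c1", "c7", "c2"]
relevant = {"c1", "c2"}
nonrelevant = {"c3"}
ap = average_precision(ranked, relevant)
```

In this toy topic the two relevant cases sit at ranks 2 and 4, so the average precision is (1/2 + 2/4) / 2 = 0.5.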

The relevance assessments for the queries are available, and metrics should be calculated using trec_eval.
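trec_eval conventionally reads a results ("run") file with one line per retrieved item in the form `topic_id Q0 case_id rank score run_tag`. The sketch below writes retrieval scores in that layout. The helper name `format_run`, the case identifiers and the run tag are hypothetical, and the Guidelines for Participation take precedence if they prescribe a different format.

```python
def format_run(topic_id, scored_cases, run_tag="myrun"):
    """Turn {case_id: score} into run-file lines, best score first."""
    ranked = sorted(scored_cases.items(), key=lambda kv: kv[1], reverse=True)
    return [
        f"{topic_id} Q0 {case_id} {rank} {score:.4f} {run_tag}"
        for rank, (case_id, score) in enumerate(ranked, start=1)
    ]


# Hypothetical scores for one topic.
lines = format_run("1", {"caseA": 0.2, "caseB": 0.9}, run_tag="demo")
run_text = "\n".join(lines) + "\n"
# The resulting file would then typically be scored with:
#   trec_eval <relevance assessments file> <run file>
```

Sorting by descending score before assigning ranks keeps the rank and score columns consistent with each other.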


Register for a benchmark account at the VISCERAL registration website. Choose "Retrieval2" as the Benchmark for which to register (Virtual Machines are not used in this Benchmark, so the VM selection can be left as is). As a next step, a participation agreement must be signed and uploaded. Once the agreement has been uploaded and accepted by the organisers, access to the participant dashboard is granted.

Through the participant dashboard, participants have access to instructions for downloading the data via FTP.

The Guidelines for Participation are available (v1.0 of 20150619).