ISBI BodyMaps24: 3D Atlas of Human Body

Variations in organ sizes and shapes can indicate a range of medical conditions, from benign anomalies to life-threatening diseases. Precise organ volume measurement is fundamental for effective patient care, but manual organ contouring is extremely time-consuming and exhibits considerable variability among expert radiologists. Artificial Intelligence (AI) holds the promise of improving volume measurement accuracy and reducing manual contouring effort. We formulate our challenge as a semantic segmentation task that automatically identifies and delineates the boundaries of anatomical structures essential for numerous downstream applications, such as disease diagnosis and treatment planning. Our primary goal is to promote the development of advanced AI algorithms and to benchmark the state of the art in this field.


The BodyMaps challenge particularly focuses on assessing and improving the generalizability and efficiency of AI algorithms for medical segmentation across diverse clinical settings and patient demographics. In light of this, the innovations of our BodyMaps challenge include (1) large-scale, diverse datasets for training and evaluating AI algorithms, (2) novel evaluation metrics that emphasize the accuracy of hard-to-segment anatomical structures, and (3) penalties for algorithms with extended inference times. Specifically, this challenge involves two unique datasets. First, AbdomenAtlas, the largest annotated dataset [Qu et al., 2023, Li et al., 2023], contains a total of 10,142 three-dimensional computed tomography (CT) volumes. In each CT volume, 25 anatomical structures are annotated at the voxel level. AbdomenAtlas is a multi-domain dataset of pre-, portal-, arterial-, and delayed-phase CT volumes collected from 88 global hospitals in 9 countries, diversified in age, pathological condition, body part, and racial background. The AbdomenAtlas dataset will be released to the public in stages for AI development; in each stage we will release 1,000 annotated CT volumes. Second, JHH-1K [Park et al., 2020] is a proprietary collection of 1,150 dual-phase CT volumes from Johns Hopkins Hospital (JHH), in which 22 anatomical structures are annotated at the voxel level. The CT volumes and annotations of JHH-1K will not be disclosed to the public and are exclusively reserved for external validation of AI algorithms. The final scoring is not limited to average segmentation performance; it also prioritizes performance on hard-to-segment structures and considers the inference speed of the algorithm. We hope our BodyMaps challenge can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community.

Timeline (all dates in 2024)

Jan 10 Challenge website running and registration open
Jan 16 Release of the dataset and starter code
April 15 Submission deadline
April 20 Release of final results (decisions)
May 27 - May 30 Challenge days (ISBI main conference)

How to Participate

+ Join the Challenge

The challenge submission is based on Docker containers, so participants should demonstrate basic segmentation skills and the ability to encapsulate their methods in Docker. We provide a playground for participants to practice. Participants should:

  • Develop any segmentation method (e.g., U-Net) based on the playground training dataset and encapsulate the method in Docker.
  • Use Docker to predict the testing set and record 5-10 minutes of the prediction process as a video (mp4 format).
  • Submit the segmentation results here and upload your Docker image to DockerHub. The submission should include:
    (1) the Docker Hub link;
    (2) a download link to the recorded inference mp4 video;
    (3) a screenshot of your playground leaderboard results (Mean DSC > 0.8).
After reviewing your submission, we will get back to you with an Entry Number; then you can join the challenge. We also provide a step-by-step tutorial if you are not familiar with 3D image segmentation.
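If you have not containerized a model before, a minimal layout might look like the sketch below. The base image, file names, and pip packages are illustrative assumptions rather than official requirements; the only contract (see the evaluation command in the testing-submission section) is that running predict.sh reads CT volumes from /workspace/inputs/ and writes masks to /workspace/outputs/.

# Dockerfile (illustrative)
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
WORKDIR /workspace
COPY . /workspace
RUN pip install nibabel numpy

# predict.sh (illustrative)
python predict.py --input_dir /workspace/inputs --output_dir /workspace/outputs

You can then build and package the container with "docker build -t teamname ." followed by "docker save teamname:latest | gzip > teamname.tar.gz".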
+ Download the data

There are three datasets used in our challenge. First, the training dataset, ImageNetCT-9K, is provided on Google Drive, where you can download the ground-truth masks and follow the corresponding instructions to download the CT scans. Note that you can use external datasets to train your model for better performance. Second, the validation dataset, TotalSegmentator, is a public dataset used in a continuously open submission-and-validation leaderboard, so you can evaluate your model repeatedly. Last, the private testing dataset, JHH-1K, reflects a real-world, diverse patient population that encompasses a broad spectrum of pathological conditions, age groups, and demographic backgrounds.

After you finish the registration in the first step, the download link to ImageNetCT-9K will be sent to you along with the Entry Number. For the validation dataset, please follow the instructions to download TotalSegmentator.

+ Create your model

The task is to develop a model that can predict high-quality segmentations for abdominal organs. The training data consists of several thousand examples on which models can be trained and validated.

Teams are allowed to use data other than the official training set to construct their models; however, that data must have been publicly available as of 1/16/2024. The same applies to pre-trained weights -- they must have been publicly available before the ImageNetCT-9K dataset was released on 1/16. This is to prevent unfair advantages for teams that may have amassed large private datasets. All external data use must be described in detail in each team's accompanying paper (described in the following section).

Based on the performance we achieved by directly training the model on ImageNetCT-9K and evaluating it on TotalSegmentator (without post-processing), your model's performance should be higher than the results in the following table:

The table will be available soon.

Wondering where to start? Some useful tutorials and previous methods are given below:

Strategies to improve the segmentation performance:
  • Preprocessing: intensity normalization, resampling...
  • Extensive data augmentation;
  • Coarse-to-fine (two-stage or cascaded) frameworks;
  • Postprocessing: connected component analysis (a sketch follows this list).
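As an illustration of the postprocessing item above, a common step is to keep only the largest connected component of each predicted organ, removing small spurious islands. A minimal sketch with SciPy; the function name keep_largest_component is ours, not part of any official challenge code:

import numpy as np
from scipy import ndimage

def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest connected component of a binary mask."""
    labeled, num = ndimage.label(mask)
    if num == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, index=range(1, num + 1))
    largest = int(np.argmax(sizes)) + 1
    return (labeled == largest).astype(mask.dtype)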
Strategies to improve the computational efficiency:
  • Whole-volume-based input;
  • Lightweight network modules: residual blocks with bottlenecks, separable convolutions, and pyramid pooling;
  • Accelerated inference with ONNX Runtime, TensorRT... (a sketch follows this list).
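As an illustration of the last item, a PyTorch model can be exported to ONNX and run with ONNX Runtime. This is only a sketch under our own assumptions: the single 3D convolution stands in for your trained network, and the 96x96x96 single-channel patch shape is illustrative:

import torch
import torch.nn as nn
import onnxruntime as ort

# Stand-in for a trained network; one 3D conv keeps the sketch runnable.
model = nn.Conv3d(1, 25, kernel_size=1).eval()

dummy = torch.randn(1, 1, 96, 96, 96)  # single-channel 3D patch (assumed shape)
torch.onnx.export(model, dummy, "model.onnx", input_names=["ct"], output_names=["seg"])

# Run inference with ONNX Runtime, preferring the GPU provider when available.
session = ort.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
seg = session.run(None, {"ct": dummy.numpy()})[0]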
Training details and techniques:
  • Using a larger patch size;
  • Utilizing a model pre-trained on the TotalSegmentator dataset;
  • Incorporating other publicly available data for training the same task.
+ Describe your approach with a short paper

The primary goal of challenges like BodyMaps is to objectively assess the performance of competing methods. This is only possible if teams provide a complete description of the methods they use. Teams should follow the provided template [Overleaf, Google Docs] and provide satisfactory answers to every field. Papers should otherwise follow the ISBI main conference guidelines for paper formatting. Drafts of these papers must be submitted by 04/15/2024.

+ Submit your predictions (validation leaderboard)

Because of the short development period, we accept both Docker testing submissions on JHH-1K and segmentation-result validation submissions on TotalSegmentator from the beginning. TotalSegmentator results will be presented on the Results page; JHH-1K results will be evaluated locally with the Docker container and presented on the Testing Results page.

All teams can directly submit the segmentation results on the challenge page to get the segmentation accuracy metrics. Please compress your results by running "zip XXX.zip ./*.nii.gz" and upload the zip file.

+ Submit your algorithm container (testing leaderboard)

The submission should include: (Email subject: YourTeamName-TeamLeaderName-Testing Submission)

(1) a download link to your Docker container (teamname.tar.gz). If the Docker container does not work, we will return the error information to the participants. Participants with technical failures are allowed to resubmit their algorithms one extra time. When the evaluation is finished, we will return the evaluation metrics via email. All valid submission results will be reported on the leaderboard.

(2) a sanity test video record (download example: Google Drive, Baidu Netdisk). Please test your Docker container on the validation cases XXX.nii.gz, XXX.nii.gz, and XXX.nii.gz and record the prediction process. For each case, the running time should be within 60s.

(3) a methodology paper (template). Please carefully read the template and the common issues before writing the manuscript. The evaluation process mainly focuses on the paper's completeness, so don't worry about a low wDSC/wNSD. Since this segmentation task is very challenging, all attempts are worth sharing with readers. We will not reject papers because of low wDSC/wNSD.

The submitted Docker container will be evaluated with the following commands. If the Docker container does not work or the paper does not include all the necessary information to reproduce the method, we will return the error information and review comments to the participants.

docker load -i teamname.tar.gz
docker container run --gpus "device=1" -m 28G --name teamname --rm -v $PWD/IMseg2024_Test/:/workspace/inputs/ -v $PWD/teamname_outputs/:/workspace/outputs/ teamname:latest /bin/bash -c "sh predict.sh"
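For reference, a minimal predict.py consistent with the mounts in the command above might look like the following sketch. The thresholding in segment() is a placeholder for your model's inference, and the file layout is our assumption, not an official specification:

import os
import numpy as np
import nibabel as nib

INPUT_DIR, OUTPUT_DIR = "/workspace/inputs", "/workspace/outputs"

def segment(volume: np.ndarray) -> np.ndarray:
    # Placeholder: replace with your trained model's forward pass.
    return (volume > 0).astype(np.uint8)

for name in sorted(os.listdir(INPUT_DIR)):
    if not name.endswith(".nii.gz"):
        continue
    image = nib.load(os.path.join(INPUT_DIR, name))
    mask = segment(image.get_fdata())
    nib.save(nib.Nifti1Image(mask, image.affine),
             os.path.join(OUTPUT_DIR, name))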

Award

We will provide cash prizes for the top 5 teams (Amazon gift cards worth 500/300/200/100/100 CAD, respectively). A certificate will be awarded to the top 10 teams. The top 10 performing methods (teams) will be announced publicly and invited to give oral presentations during the ISBI 2024 conference. All participating teams have the opportunity to publish their results in the ISBI 2024 proceedings and other vision conference proceedings.

Evaluation


Evaluation Metrics

The segmentation accuracy metrics:

Weighted Dice Similarity Coefficient (wDSC). This metric evaluates the overlap between the algorithm output and the ground truth, with a weighting factor that reflects the segmentation difficulty of each structure. The weight for each structure's DSC is estimated based on the per-class segmentation performance reported in the existing literature (e.g., [Liu et al., ICCV 2023]). Some structures are inherently more difficult to segment than others due to blurry boundaries, small size, or tubular shape. Our weighted metric is novel compared to the common practice in segmentation challenges, where only the average DSC is calculated uniformly across all classes.

Weighted Normalized Surface Distance (wNSD): The wNSD emphasizes the accuracy of the boundary delineation between the predicted segmentation and the ground truth. This is particularly important for precise organ volume measurement and subsequent surgical planning.
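In code, both weighted metrics reduce to a weighted average of per-class scores. A brief sketch; the class names, weights, and DSC values below are made-up placeholders, not the official challenge weights:

def weighted_score(per_class_scores: dict, weights: dict) -> float:
    """Weighted average of per-class DSC (or NSD) scores."""
    total = sum(weights[c] * per_class_scores[c] for c in per_class_scores)
    return total / sum(weights[c] for c in per_class_scores)

# Illustrative only: harder structures receive larger weights.
dsc = {"liver": 0.96, "pancreas": 0.82, "celiac_artery": 0.61}
w = {"liver": 1.0, "pancreas": 2.0, "celiac_artery": 3.0}
print(weighted_score(dsc, w))  # wDSC; the same function applies to wNSD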

The segmentation efficiency metrics:

Running Time: A maximum inference time of 60 seconds is allowed for each case. Note that we exclude the Docker starting time for an accurate measurement. Cases exceeding this limit will be deemed failures, with their DSC and NSD scores set to zero.

Area Under the GPU Memory-Time Curve (MB) [Ma et al., FLARE 2023]: The algorithm's memory efficiency is measured over time, reflecting the computational resources it uses, as captured by the GPU memory-time curve. We offer a tolerance of 16 GB for GPU memory consumption, which aligns with the affordability and availability of such GPUs in most medical centers.
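For intuition, the memory-time area can be approximated by sampling GPU memory at fixed intervals while the algorithm runs and summing memory times interval. A rough sketch using nvidia-smi; the 0.1 s sampling interval is our assumption, not the official evaluation script:

import subprocess, time

def gpu_memory_mb(device: int = 0) -> float:
    """Current GPU memory usage in MB, queried via nvidia-smi."""
    out = subprocess.check_output([
        "nvidia-smi", "--query-gpu=memory.used",
        "--format=csv,noheader,nounits", "-i", str(device)])
    return float(out.decode().strip())

def memory_time_auc(duration_s: float, interval_s: float = 0.1) -> float:
    """Approximate area under the GPU memory-time curve (MB*s)."""
    auc, start = 0.0, time.time()
    while time.time() - start < duration_s:
        auc += gpu_memory_mb() * interval_s
        time.sleep(interval_s)
    return auc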


Evaluation Platform

The submitted Docker containers will be evaluated on an Ubuntu 18.04 server. Detailed information is listed as follows:

CPU: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz x 16
GPU: NVIDIA TITAN RTX (24G)
RAM: 252G
Driver Version: 525.116.04
CUDA Version: 12.0
Docker Version: 20.10.21

Terms and Conditions

  • All participants should register for this challenge with their real names, affiliations (including department, full name of university/institute/company, country), and affiliation E-mails.
  • Incomplete and redundant registrations will be removed without notice. Each team can have at most ten people.
  • Participants are not allowed to register multiple teams and accounts. Participants from the same research group are also not allowed to register multiple teams. The IMSeg organizers reserve the right to disqualify such participants.
  • All participants must submit a complete solution to this challenge for testing. A complete solution includes a Docker container (tar file) and a qualified methodology paper (at least 2 pages, LNCS format).
  • All participants should agree that the submitted short papers can be made publicly available to the community on the challenge website, and that organizers can use the information provided by the participants, including scores, predicted labels, and papers.

Organizers


Lead Organizers:

Zongwei Zhou (Johns Hopkins University)
Wenxuan Li (Johns Hopkins University)
Yu-Cheng Chou (Johns Hopkins University)
Jieneng Chen (Johns Hopkins University)
Alan Yuille (Johns Hopkins University)

Coordinators:

Yixiong Chen (Johns Hopkins University)
Angtian Wang (Johns Hopkins University)
Yaoyao Liu (Johns Hopkins University)
Qi Chen (University of Science and Technology of China)
Xiaoxi Chen (Shanghai Jiao Tong University)
Yuxiang Lai (Southeast University)

Datasets

The data download link to ImageNetCT-9K will be sent to approved teams via email. Please make sure that you can download large files (XXG CT Scans) from Google Drive or Baidu Netdisk and have enough space and computing resources to process them.

Additional data and pre-trained models are allowed!


Dataset Description

The challenge data is acquired from patients represented in the ImageNetCT-9K [Qu et al., 2023, Li et al., 2023] and JHH-1K [Park et al., 2020] datasets, encompassing a broad spectrum of pathological conditions, age groups, and demographic backgrounds. This ensures that the challenge reflects a real-world, diverse patient population. Detailed statistics can be found in the corresponding publications.

For ImageNetCT-9K, we will provide 296K masks and 3.7M annotated images taken from 68 hospitals worldwide, spanning four distinct phases: pre, portal, arterial, and delayed.

For JHH-1K, a total of 1,150 dual-phase contrast-enhanced CT volumes from 575 subjects were acquired from 2005 to 2009. There were 229 men and 346 women (mean age: 45 ± 12 years; range: 18-79 years).

Q&A


Registration Related issues:

Q: How long does it take for a participation request to be approved after sending the signed challenge rules?

A: The request will be approved within 2-4 working days if the signed challenge rules document is filled out correctly.

Q: I'm only interested in the challenge dataset but I do not want to join the challenge. Can I download the dataset without joining the challenge?

A: Thanks for your interest. To ensure enough submissions, the dataset is only available to participants during the challenge.

Q: How many people can form a team?

A: Each team can have at most 10 people. The authors of your paper should match the team member list.

Q: I have joined the challenge and downloaded the dataset. Can I quit the challenge?

A: No! Please respect the signed agreement. If registered participants do not make a successful submission, all team members will be placed on a dishonesty list.


Algorithm-related issues:

Q: Can we use other datasets or pre-trained models to develop the segmentation algorithms?

A: Yes.

Q: Does the validation submission affect the final ranking?

A: No.

Q: During the testing phase, can I modify the methods and the paper?

A: Yes, you can make modifications before the testing submission; once the testing submission is made, no further modifications are allowed.