Exoplanet Imaging Data Challenge (source detection)

Community-wide data challenge for high-contrast imaging data processing. Its main goal is the fair comparison of image processing algorithms for exoplanet direct detection.


Welcome to The Exoplanet Imaging Data Challenge, a community-wide effort to compare algorithms for detecting exoplanets with ground-based high-contrast imaging. The challenge is organized in collaboration with several members of the high-contrast imaging community and has direct support from the Grenoble Alpes Data Institute (Université Grenoble Alpes, France).

In the spirit of openness, anyone can contribute at any stage of the challenge by sending a pull request to this repository. If you have questions, suggestions or issues when submitting, please open an issue on this repository or contact any of the organizers (see the Core team section). All the benchmark datasets, metrics and results will be publicly available at the end of the competition.

Scientific context and objectives

Direct imaging is the next big step in the hunt for extrasolar planets. But observing exoplanets using ground-based telescopes is a very challenging task! The main difficulties are the huge difference in brightness between the host star and its potential companions, the small angular separation between them, and the image degradation caused by the Earth’s turbulent atmosphere. Therefore, ground-based high-contrast imaging (HCI) relies on the use of adaptive optics for wavefront correction and coronagraphy for the suppression of light coming from the star. The following video, prepared by the NASA Exoplanet Exploration Program, explains in simple terms the role of coronagraphy and adaptive optics with deformable mirrors in HCI. Please check it out:

The two remaining components of high-contrast imaging are the data acquisition techniques, which introduce some diversity into the data, and the post-processing techniques, which later exploit that diversity (see the sub-challenges section below). This last step of algorithmic speckle noise subtraction is what ultimately pushes the sensitivity of the exposures and the detection limits of HCI instruments and surveys.

Objectives: A multitude of image processing algorithms and pipelines for processing high-contrast imaging data have been developed in the past thirteen years. Pueyo (2018) offers an ample discussion of the exoplanet detection algorithms proposed in the literature. The goal of this challenge is not only to compare, in a fair and robust way, existing post-processing algorithms but also to spur the design of new techniques, spark new collaborations and ideas, and share knowledge. Ultimately, this will allow the community to maximize the scientific return of existing and future near-infrared HCI instruments.

The computer science and machine learning fields have a long tradition of conducting data challenges and competitions. Repositories of benchmark (curated) datasets are an integral part of the field of machine learning. We want to bring these practices into the field of high-contrast imaging. Once the community adopts the standard metrics (with their open-source implementations) and the benchmark library resulting from this challenge, the process of testing new algorithms will be much more straightforward and robust.

We plan to launch the data challenge in mid-April 2019, as soon as we finish ironing out the last details. In particular, we are considering the organization of a workshop on image processing for exoplanet direct imaging that could at the same time host a hackathon session for the participants of the data challenge. Please contact the organizers if you have questions or suggestions, or if you are willing to provide support of any kind.


With the aim of creating a manageable competition, we will focus exclusively on the detection of point-like sources (exoplanets). Other tasks, such as the characterization of companions, the detection of extended sources, reference star differential imaging and the usage of metadata/telemetry, will be the subject of future editions of the challenge.

This first competition will consider two sub-challenges focused on the two most widely used observing techniques: pupil tracking (angular differential imaging, ADI) and multi-spectral imaging combined with pupil tracking (multi-channel spectral differential imaging, ADI+mSDI).

ADI: In pupil-stabilized or pupil-tracking observations the telescope pupil is stabilized on the detector and the field of view rotates with the parallactic angle. This generates an apparent motion of the companions along a circular trajectory around the center of the image, where the star is located. This motion helps to disentangle the exoplanet signal from the speckle field, an effect that is enhanced by the post-processing techniques. ADI datasets are composed of a 3D cube of images taken during an observing run (see Fig. 1), the corresponding parallactic angles and a non-saturated point-spread function (PSF).
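
The circular apparent motion described above can be illustrated with a minimal sketch. The function name and the rotation convention (counterclockwise-positive, measured relative to the first frame) are assumptions made for this example; the actual sign convention depends on the instrument and the pipeline.

```python
import math

def companion_track(sep_px, theta0_deg, parallactic_angles_deg, center=(0.0, 0.0)):
    """Return the (x, y) pixel position of a companion in each frame.

    Assumption (hypothetical convention): the field rotates by the change in
    parallactic angle relative to the first frame, counterclockwise-positive.
    """
    cx, cy = center
    pa0 = parallactic_angles_deg[0]
    track = []
    for pa in parallactic_angles_deg:
        # Position angle of the companion in this frame.
        theta = math.radians(theta0_deg + (pa - pa0))
        track.append((cx + sep_px * math.cos(theta),
                      cy + sep_px * math.sin(theta)))
    return track

# A companion at 10 px from the star, observed over 90 degrees of rotation.
for x, y in companion_track(10.0, 0.0, [0.0, 30.0, 60.0, 90.0]):
    print(f"x = {x:6.2f} px, y = {y:6.2f} px")
```

The separation from the star stays constant while the position angle changes, whereas the speckle field stays (quasi-)fixed with the pupil.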

ADI+mSDI: In the case of multi-spectral imaging, an integral field spectrograph is used to disperse the light, providing a data cube of several monochromatic images. The resolution and band coverage vary depending on the instrument used. Since the position of the speckles scales with wavelength, we can rescale the images to align the speckles, which creates an apparent motion of the companions in the radial direction (this diversity is also exploited by the post-processing algorithms). Multi-spectral imaging is usually combined with ADI in modern instruments. In this case, the datasets are composed of a 4D cube of images (see Fig. 1), a vector of parallactic angles, a vector of wavelengths and a PSF.
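
As a rough illustration of the rescaling idea, the sketch below computes one scale factor per spectral channel and the resulting apparent separation of a companion. The function names and the choice of the longest wavelength as the reference are assumptions for this example, not the procedure of any particular pipeline.

```python
def sdi_scale_factors(wavelengths, ref=None):
    """Scale factor per channel that aligns the speckles to a reference wavelength.

    Assumption: speckle positions grow linearly with wavelength, so scaling
    channel i by ref / wavelengths[i] brings its speckles to the same radius.
    """
    ref = max(wavelengths) if ref is None else ref
    return [ref / wl for wl in wavelengths]

def rescaled_companion_separation(sep_px, wavelengths, ref=None):
    """Apparent separation of a companion in each channel after rescaling."""
    return [sep_px * s for s in sdi_scale_factors(wavelengths, ref)]

# Illustrative wavelengths in microns (not taken from a specific instrument).
wls = [1.0, 1.25, 1.6, 2.0]
print(sdi_scale_factors(wls))
print(rescaled_companion_separation(10.0, wls))
```

After rescaling, the speckles line up across channels while a companion at a fixed physical separation lands at a different radius in each channel, which is the radial diversity mentioned above.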

Figure 1. Dimensionality of the cubes used in this challenge, depending on the observing technique: the left panel shows a single ADI data cube and the right panel shows an ADI + multi-channel SDI data cube.

While Reference star Differential Imaging (RDI) is not a separate sub-challenge, we welcome submissions that make use of reference datasets or libraries (whether the cubes provided within the challenge or external ones). We only require participants to state clearly if they use RDI instead of pure ADI or ADI+mSDI post-processing (please read the sub-section named “Codalab”).

In Fig. 2 we show a schematic representation of the HCI blob detection pipeline. In the context of this data challenge, we will not use datasets with known companions. Instead, in order to measure the detection capability of different algorithms, we will inject synthetic planets or companions. Each challenge cube contains from zero to five synthetic point sources, injected using a standard process (without accounting for smearing or variable photometry). For spectrally dispersed data we will use three template spectra when injecting the fake companions.

Figure 2. Schematic representation of the high-contrast imaging data processing pipeline, for the case of an LBTI/LMIRCam HR8799 data cube. Notice how from a data cube (or image sequence) we obtain one view of the star’s vicinity and an associated detection map in which we can look for potential point-like sources.


A group of datasets has been compiled for the purpose of this challenge. The datasets come from a few different instruments: SPHERE (both IRDIS and IFS), GPI, NIRC2 and LMIRCam. This ensures that the challenge library contains a diverse set of datasets coming from instruments with different characteristics: slow/high speed cameras, broadband/filtered sequences, small/large total rotation, presence of a coronagraph, etc.

In order to reduce the need for domain knowledge (e.g. the expertise related to high-contrast imaging or to a specific instrument), the datasets were calibrated/pre-processed using the standard pipelines of each instrument. Then we applied a few pre-processing procedures to each cube to make sure that the library is homogeneous. The characteristics of each dataset and the pre-processing procedures applied to them are recorded in a public Google spreadsheet.

Please keep in mind that we use these contributed datasets to inject synthetic companions and create the data challenge library. Once the data challenge is finished, the datasets without synthetic companions will constitute the benchmark HCI library.

For the sub-challenge on ADI post-processing, each dataset will be composed of:

  • instrument_cube_id.fits (3d array),
  • instrument_pa_id.fits (1d array, vector of parallactic angles),
  • instrument_pxscale_id.fits (float value, the pixel scale value in arcsec/px),
  • instrument_psf_id.fits (2d array, the associated PSF template),

where id is a positive integer and instrument is one of the following: nirc2, lmircam or sphere_irdis. For the second sub-challenge, on spectrally dispersed data (instruments: sphere_ifs or gpi), a 4D cube will be provided along with a vector of wavelengths (instrument_wls_id.fits).
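
The naming scheme above can be summarized with a small sketch that builds the expected file names for one dataset. The helper name is hypothetical, and we assume here that the spectrally dispersed datasets ship the same auxiliary files as the ADI ones plus the wavelength vector.

```python
ADI_INSTRUMENTS = ("nirc2", "lmircam", "sphere_irdis")
MSDI_INSTRUMENTS = ("sphere_ifs", "gpi")

def dataset_filenames(instrument, dataset_id):
    """Return the expected FITS file names for one challenge dataset.

    Assumption: ADI+mSDI datasets contain the ADI files plus the vector
    of wavelengths (instrument_wls_id.fits).
    """
    if dataset_id < 1:
        raise ValueError("id must be a positive integer")
    if instrument in ADI_INSTRUMENTS:
        kinds = ["cube", "pa", "pxscale", "psf"]
    elif instrument in MSDI_INSTRUMENTS:
        kinds = ["cube", "pa", "pxscale", "psf", "wls"]
    else:
        raise ValueError(f"unknown instrument: {instrument}")
    return [f"{instrument}_{kind}_{dataset_id}.fits" for kind in kinds]

print(dataset_filenames("nirc2", 1))
print(dataset_filenames("gpi", 3))
```

In Python, each of these files could then be read with a FITS reader such as `astropy.io.fits.getdata`.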

The cubes will be cropped to focus on the innermost 20 lambda/D. The challenge files are saved as FITS files, a format with a long tradition in astronomy. FITS files can be easily opened in any programming language or environment (Python, Matlab, IDL, R, C, etc.). The most convenient software for quick visualization of the cubes is the SAOImageDS9 viewer, which can be downloaded here. If you are into Python and you use JupyterLab, then consider using the HCIplot open source library for visualizing the cubes.
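
To give a feeling for what a 20 lambda/D crop means in pixels, here is a minimal sketch. It assumes a pixel scale in arcsec/px and a circular aperture of diameter D; the numbers in the usage example are illustrative and not taken from a specific challenge dataset.

```python
import math

# Conversion factor from radians to arcseconds.
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0

def crop_radius_px(wavelength_m, diameter_m, pxscale_arcsec, n_lambda_over_d=20):
    """Radius in pixels corresponding to n_lambda_over_d resolution elements.

    lambda/D is computed in radians, converted to arcseconds and divided by
    the pixel scale (assumed to be given in arcsec/px).
    """
    lambda_over_d_arcsec = wavelength_m / diameter_m * RAD_TO_ARCSEC
    return n_lambda_over_d * lambda_over_d_arcsec / pxscale_arcsec

# Illustrative values: 1.6 micron observations on an 8 m telescope,
# sampled at 0.01225 arcsec/px.
print(round(crop_radius_px(1.6e-6, 8.0, 0.01225), 1), "px")
```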

It is mandatory that the submitted datasets remain secret for the duration of the challenge. After the data challenge is finished, the contributed datasets (without injected companions) will constitute the HCI benchmark library that will be made available for the community. This benchmark library will be stored on Zenodo, ensuring the long term preservation of data, and will serve the next generation of researchers who will be able to re-use the benchmark datasets for quick validation of novel algorithms.

Metrics and scoreboard

This challenge focuses on the task of exoplanet direct detection. In order to measure the detection capability of different algorithms, we will rely on the injection of fake companions and the computation of several relevant metrics, such as the true positive rate or the number of false positives.


This challenge will consist of two consecutive phases (while the sub-challenges run in parallel), each one with its own type of submission and metrics. The table below summarizes the metrics used for each phase:

  Phase   Metric (sub-challenge 1: ADI)   Metric (sub-challenge 2: ADI+mSDI)
  1       F1, TPR and FDR                 F1, TPR and FDR
  2       ROC space                       ROC space

Phase 1: In the first phase, the expected output from an algorithm, i.e. the submission from a given participant, consists of a list of detection maps (nine for the first sub-challenge and ten for the second), a detection threshold (the accepted threshold at which a detection is claimed) and the FWHM values for each dataset. For example, a submission to the second sub-challenge (ADI+mSDI) must include the following files: gpi_detmap_1.fits, gpi_detmap_2.fits, gpi_detmap_3.fits, gpi_detmap_4.fits, gpi_detmap_5.fits, sphere_ifs_detmap_1.fits, sphere_ifs_detmap_2.fits, sphere_ifs_detmap_3.fits, sphere_ifs_detmap_4.fits, sphere_ifs_detmap_5.fits, gpi_fwhm_1.fits, gpi_fwhm_2.fits, gpi_fwhm_3.fits, gpi_fwhm_4.fits, gpi_fwhm_5.fits, sphere_ifs_fwhm_1.fits, sphere_ifs_fwhm_2.fits, sphere_ifs_fwhm_3.fits, sphere_ifs_fwhm_4.fits, sphere_ifs_fwhm_5.fits, detection_threshold.fits.
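
The list of files in a Phase 1 submission can be generated programmatically. The sketch below is only a convenience helper with a hypothetical name; it assumes that every file carries the .fits extension and that the number of datasets per instrument is known to the participant.

```python
def submission_filenames(datasets_per_instrument):
    """Build the list of files expected in a Phase 1 submission.

    datasets_per_instrument maps an instrument name to its number of datasets,
    e.g. {"gpi": 5, "sphere_ifs": 5} for the ADI+mSDI sub-challenge.
    Assumption: all files use the .fits extension.
    """
    names = []
    for kind in ("detmap", "fwhm"):
        for instrument, n_datasets in datasets_per_instrument.items():
            names += [f"{instrument}_{kind}_{i}.fits"
                      for i in range(1, n_datasets + 1)]
    # A single file holds the detection threshold of the submission.
    names.append("detection_threshold.fits")
    return names

files = submission_filenames({"gpi": 5, "sphere_ifs": 5})
print(len(files), "files")
print(files[:3], "...", files[-1])
```

Checking the output of such a helper against the contents of your submission archive is a cheap way to catch a missing or misnamed file before uploading.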

By thresholding each detection map and counting the true and false positives, we define several metrics:

  • the true positive rate (TPR) also known as sensitivity or recall: TPR = TPs / Ninj,
  • the false discovery/detection rate (FDR): FDR = FPs / Ndet,
  • the precision or positive predictive value (PPV): PPV = TPs / Ndet,
  • the F1-score or harmonic mean of TPR and the precision: F1 = 2 * PPV * TPR / (PPV + TPR).

where TPs is the number of true positives (true detections), FPs is the total number of false positives (Type I errors), Ndet is the total number of detections (TPs + FPs) and Ninj is the total number of injections (across all the datasets of a given sub-challenge). The TPs, FPs and Ndet are counted for a given participant/algorithm over all the datasets of a given sub-challenge. This blob counting procedure is implemented in the Vortex Image Processing package, specifically in the compute_binary_map function found here. Read below about the Data Challenge starting kit, which contains detailed explanations about the blob counting procedure.
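
The definitions above translate directly into a few lines of code. This is a minimal sketch and not the official scoring routine; in particular, returning 0 when a denominator is zero is an assumption made for this example.

```python
def detection_metrics(tps, fps, n_inj):
    """Compute TPR, FDR, PPV and F1 from detection counts.

    tps and fps are counted over all datasets of a sub-challenge and n_inj is
    the total number of injected companions. Assumption: a metric with a zero
    denominator is reported as 0.
    """
    n_det = tps + fps
    tpr = tps / n_inj if n_inj else 0.0
    ppv = tps / n_det if n_det else 0.0
    fdr = fps / n_det if n_det else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return {"TPR": tpr, "FDR": fdr, "PPV": ppv, "F1": f1}

# Example: 8 of 10 injected companions recovered, with 2 false positives.
print(detection_metrics(tps=8, fps=2, n_inj=10))
```

Note that FDR = 1 - PPV whenever at least one detection is claimed, so the F1-score penalizes both missed companions and false positives.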

Two scoreboards will be computed, one for the sub-challenge on ADI data (3D cubes) and one for the sub-challenge on ADI+mSDI cubes (4D cubes). The F1-score serves our goal of assessing the performance of detection algorithms as binary classifiers well, so we will use it to rank the entries on each scoreboard.

Note: Each submission must correspond to the results of applying the same algorithm to all the datasets. If your algorithm works for both 3D and 4D datasets then you need to make two submissions (to have your score on each scoreboard).

The contrast (brightness) value for injecting each synthetic companion will be estimated with respect to a baseline algorithm. First, the S/N of a population of injected companions will be measured on the final residual frames (processed with the baseline algorithm). Then, the interval of fluxes as a function of the separation from the star will be defined by checking which contrasts correspond to S/Ns in a given interval (e.g. 1 to 4). This procedure is implemented here.
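
The idea of mapping an S/N interval to a contrast interval can be sketched as follows, for a single separation from the star. This is not the implementation used by the organizers: the function names are hypothetical, the measurements are made up, and we assume that the measured S/N increases monotonically with the injected flux so that linear interpolation is meaningful.

```python
def contrast_for_snr(contrasts, snrs, target_snr):
    """Linearly interpolate the contrast at which the S/N equals target_snr.

    Assumption: S/N increases monotonically with the injected contrast.
    """
    pairs = sorted(zip(snrs, contrasts))
    for (s0, c0), (s1, c1) in zip(pairs, pairs[1:]):
        if s0 <= target_snr <= s1:
            if s1 == s0:
                return c0
            return c0 + (c1 - c0) * (target_snr - s0) / (s1 - s0)
    raise ValueError("target S/N outside the measured range")

def contrast_interval(contrasts, snrs, snr_range=(1.0, 4.0)):
    """Contrast interval matching an S/N interval at one separation."""
    low, high = snr_range
    return (contrast_for_snr(contrasts, snrs, low),
            contrast_for_snr(contrasts, snrs, high))

# Made-up measurements at one separation, after the baseline algorithm.
contrasts = [1e-6, 2e-6, 4e-6]
snrs = [0.5, 1.5, 5.5]
print(contrast_interval(contrasts, snrs))
```

Repeating this for several separations yields the interval of fluxes as a function of the distance from the star.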

Phase 2: The community is converging on the usage of receiver operating characteristic (ROC) curves for the performance assessment of high-contrast imaging post-processing algorithms (see Jensen-Clem et al. 2017). Fig. 3 shows a compilation of some ROC curves from the high-contrast imaging literature.

Figure 3. ROC curves in the high-contrast imaging literature. Top-left from Gomez Gonzalez et al. 2016, top-right from Ruffio et al. 2017, bottom-left from Gomez Gonzalez et al. 2018 and bottom-right from Pueyo 2018.

In this phase, we will focus on the computation of ROC curves for comparing the trade-off between the TPR and the number of FPs for different algorithms as a function of the detection threshold. The ROC curve computation boils down to repeating the above procedure of injecting companions in the empty challenge datasets, computing detection maps, thresholding them and counting sources, i.e. the detection state and the number of false positives for different detection criteria. This expensive procedure will be performed locally by the organizers; therefore the source code of the algorithm (implemented in an open source language such as Python or R), an executable file or a Docker image will be required from the participants. Additionally, the participant must submit the detection thresholds for the thresholding and blob counting procedure.
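
The threshold sweep behind a ROC curve can be sketched in a simplified form. Here we assume that each candidate blob is already summarized by its peak value in the detection map and by a flag telling whether it coincides with an injected companion; in the real procedure the blobs are obtained by thresholding the detection maps themselves. The function name and the data are hypothetical.

```python
def roc_points(blobs, n_inj, thresholds):
    """Compute (threshold, TPR, number of FPs) for each detection threshold.

    blobs is a list of (peak_value, is_injected) tuples, a simplified stand-in
    for the blobs found in the detection maps of all datasets.
    """
    points = []
    for thr in thresholds:
        tps = sum(1 for value, is_injected in blobs
                  if is_injected and value >= thr)
        fps = sum(1 for value, is_injected in blobs
                  if not is_injected and value >= thr)
        points.append((thr, tps / n_inj, fps))
    return points

# Made-up example: two injected companions and two spurious blobs.
blobs = [(5.0, True), (3.0, True), (4.0, False), (2.0, False)]
for thr, tpr, fps in roc_points(blobs, n_inj=2, thresholds=[2.0, 3.5, 6.0]):
    print(f"threshold = {thr}: TPR = {tpr}, FPs = {fps}")
```

Lowering the threshold recovers more injected companions but also lets more false positives through, which is the trade-off a ROC curve displays.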

Important: only the participants who agree to submit their code will be included in the second phase of this data challenge.

Starting kit

The implementations of the planet injection, source counting and ROC curve generation algorithms are included in the open source Vortex Image Processing package for the sake of transparency and fairness. Other code related to the data challenge is available at the Data Challenge Extras repository. Here we also share a Python script (DC1_codalab_starting_kit.py) that illustrates how to create the outputs to be submitted for participating in the data challenge.

The extras repository also contains a data challenge starting kit, in the form of a Jupyter notebook (DC1_starting_kit.ipynb), which explains in detail the source counting procedure on detection maps (to get the true and false positives). The starting kit can also be executed on the cloud thanks to Binder. This implementation shall serve as a reference for future studies on image processing algorithms for direct imaging of exoplanets.


The challenge is implemented on Codalab, a framework for accelerating reproducible computational research. The participants will enter the competition by creating their own account on Codalab and following the link to the challenge. The scoreboards will be updated automatically after each submission to the Codalab interface. The challenge interface will offer the option to include a short description of the algorithm you used for each submission. Please don’t forget to fill in this information (especially if the algorithm uses RDI). Participants may submit as many times as they want. The scoring routines that compute the metrics can be found in the Data Challenge Extras repository.

Organization team and collaborators

Core team


Below we list those who have participated in discussions or provided data for a given high-contrast imaging instrument:

  • Olivier Absil
  • David Mouillet
  • Dimitri Mawet
  • Jean-Baptiste Ruffio
  • Michael Bottom
  • Jordan Stone

Links to other astronomical data challenges

A non-exhaustive list of past and on-going astronomical data challenges can be found below.