Artifact Evaluation
Reproducibility of experimental results is crucial to foster an atmosphere of trustworthy, open, and reusable research. To improve and reward reproducibility, QEST+FORMATS 2024 includes a dedicated Artifact Evaluation (AE). An artifact is any additional material (software, data sets, machine-checkable proofs, etc.) that supports the claims made in the paper and, in the ideal case, makes them fully replicable. In the case of a tool, a typical artifact consists of the binary or source code of the tool, its documentation, the input files (e.g., models analyzed or input data) used for the tool evaluation in the paper, and a configuration file or document describing the parameters used to obtain the results.
Submission of an artifact is mandatory for tool papers, and optional – but encouraged – for research papers if it can support the results presented in the paper. Artifacts will be reviewed concurrently with the corresponding papers. The results of the artifact evaluation will be taken into consideration in the paper reviewing discussion. However, the primary goal of the artifact evaluation is to give positive feedback to authors and to encourage and reward replicable research.
Benefits for Authors: By providing an artifact supporting experimental claims, authors increase the confidence of readers in their contribution. Accepted papers with a successfully evaluated artifact will receive a badge to be included on the paper’s title page. Finally, artifacts that significantly exceed expectations may receive an Outstanding Artifact Award.
Important Dates
All dates are AoE
- Artifact submission deadline: April 22, 2024
- Review phase I: April 22-26, 2024
- Author response period: April 26 – May 1, 2024
- Review phase II: May 1-18, 2024
- Author notification (both paper and artifact): June 4, 2024 (updated from May 31, 2024)
Phases are explained below.
Evaluation Criteria
The goal of this initiative is to encourage research that is openly accessible and remains reproducible in the future (time-proof). The Artifact Evaluation Committee (AEC) will assign a score to the artifact based on the notion of reproducibility as detailed in the ACM badging policy. The AEC will focus on both “Functional” and “Available” aspects by evaluating:
- consistency with the results in the paper, and their replicability (are the observations consistent with the paper?),
- completeness (which proportion of the experiments in the paper can be replicated?),
- quality of documentation and ease of use (can non-experts produce and interpret the results?),
- availability (can the artifact be publicly accessed?),
- future-proofness (is it reasonable to assume that the results can still be reproduced in five years’ time?).
For example, artifacts that need to download third-party material from a private sharing link (e.g., a Dropbox link or a private webpage) will not be considered future-proof.
Evaluation Process
The artifact evaluation is single blind. This in particular means that the artifact does not need to be anonymized.
The evaluation consists of two phases: the smoke-test phase (Phase I) and the full-review phase (Phase II), which proceed as follows.
- Phase I: Reviewers will download artifacts, read the instructions, and attempt to run some minimal experiments. They will not try to verify any claims of the paper, but will merely check for technical issues. Any issues that arise will be communicated to the authors, and authors may update their submission to fix these problems.
- Phase II: Submissions are closed for modification by the authors and the actual reviewing process begins. In case of unexpected technical problems, authors might be contacted by the AEC chair.
Submission
An artifact submission consists of
- an abstract summarizing the artifact and its relation to the paper (in particular, which experiments can be reproduced, and why the others cannot be reproduced, if any),
- additional information, including
- the platform on which the artifact has been prepared and tested,
- how much time the overall evaluation takes (roughly),
- special hardware requirements (if any),
- whether network access is required by the artifact, and
- any further information deemed relevant,
- the .pdf of the submitted paper, and
- a link to the actual artifact to be reviewed.
Submissions shall be created through EasyChair.
The Artifact Itself
In the spirit of reproducibility and future-proofness, some requirements are imposed on the actual artifact. In case any of these points cannot be implemented, e.g., due to the use of licensed software that cannot be distributed, please contact the AEC chairs as soon as possible to discuss specific arrangements.
The artifact must contain the following.
- A README file, describing in clear and simple steps how to install and use the artifact, and how to replicate the results in the paper.
- If applicable, the README file should provide a “toy example” to easily check the setup during Phase I.
- In case network access is required by the artifact, an explanation of when and why it is required should be provided.
- A LICENCE file, which at the very least allows the AEC to download and execute the artifact.
- The concrete binaries as either a Docker image (preferred) or a VM image, containing everything that is needed to run the artifact.
  - For Docker: Include the complete image saved with docker save (potentially compressed with, e.g., gzip); a command sketch is given below.
  - For VM: Use VirtualBox and save the VM as an Open Virtual Appliance (OVA) file.
  Including instructions and sources to build the tool is strongly encouraged, but does not replace providing the complete image.
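For illustration only, the packaging steps might look roughly as follows; the image name, VM name, and file names are placeholders, not prescribed by the AEC.

    # Hypothetical packaging sketch; "mytool:v1.0" and all file names are placeholders.
    # Export the complete Docker image and compress it:
    docker save mytool:v1.0 | gzip > mytool-v1.0.tar.gz
    # Reviewers can restore it with:
    gunzip -c mytool-v1.0.tar.gz | docker load
    # For a VirtualBox VM, export the machine as an OVA file instead:
    VBoxManage export "MyToolVM" --output mytool-v1.0.ova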
The artifact must be made available through archive-quality storage (Zenodo, Figshare, etc.) that provides a citable DOI.
The artifact must not require any financial cost to be evaluated (e.g., by requiring execution on a paid cloud service).
It is recommended (but not required)
- to provide the logs and other files used to obtain the results of the paper,
- to include push button scripts to simplify the artifact evaluation process,
- that the artifact is self-contained and does not require network access,
- to run independent tests before the submission, and
- for Docker-based submissions, to include concrete bind mounts (e.g., -v $(pwd)/result:/results:rw) in the README instructions to simplify extracting all results (a sketch of such an invocation follows this list).
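As a sketch only, with the image name, script, and paths purely hypothetical, such a run command could look like this:

    # Hypothetical invocation; "mytool:v1.0" and "run_all.sh" are placeholders.
    # Everything the container writes to /results appears in ./result on the host.
    mkdir -p result
    docker run --rm -v "$(pwd)/result:/results:rw" mytool:v1.0 ./run_all.sh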
In general, it should be as simple as possible for reviewers to conclude reproducibility.
Sources and Reusability
Authors are also encouraged to include all sources, dependencies, and instructions needed to modify and/or (re-)build the artifact (e.g., through tarballs and a Dockerfile). This may, of course, rely on network access. We recommend striving to be as self-contained as possible. In particular, the artifact should contain as many of its dependencies as reasonably possible, and any downloads should only refer to stable sources (e.g., use a standard Debian Docker image as base) and precise versions (e.g., a concrete commit or tag in a GitHub repository or a Docker image version instead of an unspecific “latest”). This maximizes the chances that the tool can still be modified and built in several years.
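A minimal Dockerfile sketch along these lines, with all names, versions, and the commit hash purely hypothetical, might look as follows:

    # Hypothetical Dockerfile sketch: pin the base image and sources to exact versions.
    FROM debian:12.5
    RUN apt-get update && apt-get install -y --no-install-recommends \
        git build-essential && rm -rf /var/lib/apt/lists/*
    # Check out a concrete commit instead of an unspecific "latest".
    RUN git clone https://github.com/example/mytool.git /opt/mytool \
     && cd /opt/mytool \
     && git checkout 1a2b3c4d5e6f \
     && make
    WORKDIR /opt/mytool

Pinning the base image tag and the commit hash keeps the build reproducible even if upstream repositories move on.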
Badges
As indicated, papers with a successfully evaluated artifact will place a badge on their title page in the camera-ready version. Note that the paper must clearly indicate the exact artifact version used for the evaluation, i.e., the DOI used for the submission. Of course, the paper may additionally link to the most recent version and/or source code repositories.
Sample LaTeX code to place the badge will be provided.
Outstanding Artifact Award
AEC members will nominate artifacts that significantly exceed expectations for an Outstanding Artifact Award. The AEC chairs will consider these nominations and may grant the award to no artifact, to one, or to several. Awardees will receive a certificate during the social event.