Tools for Reproducible Research in Python

As a community, we should treat reproducibility as a core tenet of any research we do. In the machine learning community, it is becoming increasingly common for authors to release code alongside their paper. Doing so not only makes research more transparent, but also helps the wider field move faster by reducing the friction of benchmarking prospective methods against existing approaches. In this spirit, this article outlines a series of tools and processes that take much of the effort out of reproducible research.

GitHub: Hosting code in a Git repository enables version control and a fully transparent workflow. The code for any published research should be hosted on GitHub, together with a README file and one or more example files.

Environment files: In your GitHub repository, be sure to supply either a requirements.txt file or, for Conda, an environment.yml file that lists every dependency and, where appropriate, its version number. This allows users to create a virtual environment with a tool such as virtualenv or Conda and run your code without breaking their existing installation.
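As a rough illustration of pinning versions, the sketch below builds `name==version` lines for a hand-picked list of dependencies using only the standard library. The function name `pinned_requirements` and the package names passed to it are assumptions made for this example; running `pip freeze` is the usual way to dump every package in an environment.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return one requirements.txt line per package, pinned where possible."""
    lines = []
    for name in sorted(package_names):
        try:
            # Look up the version installed in the current environment.
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here, so the name is left unpinned.
            lines.append(name)
    return lines


# Hypothetical dependency list; replace with the packages your code imports.
print("\n".join(pinned_requirements(["numpy", "pip"])))
```

Listing only the packages you import directly, rather than the whole environment, tends to keep the file short and easier for others to install.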

Sphinx: Documenting code is a little like tidying your bedroom as a child: you know you should do it, but it’s tedious and seemingly irrelevant to your own happiness. Using Sphinx will transform your docstrings into beautiful HTML documentation that will make it easy for prospective users of your code to navigate your codebase. The GPJax documentation is one example of this.

Docstring Generator: For VSCode users, the Docstring Generator plugin will generate template docstrings using the scope of the respective object or function.
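To give a sense of what a filled-in template might look like, here is a minimal sketch of a NumPy-style docstring, which Sphinx can render through its napoleon extension. The function `rbf_kernel` is a hypothetical example and is not taken from GPJax.

```python
import math


def rbf_kernel(x: float, y: float, lengthscale: float = 1.0) -> float:
    """Evaluate a squared-exponential (RBF) kernel on two scalars.

    Parameters
    ----------
    x : float
        First input.
    y : float
        Second input.
    lengthscale : float, optional
        Positive lengthscale of the kernel. Defaults to 1.0.

    Returns
    -------
    float
        The kernel value, which lies in the interval (0, 1].

    Raises
    ------
    ValueError
        If ``lengthscale`` is not strictly positive.
    """
    if lengthscale <= 0:
        raise ValueError("lengthscale must be strictly positive")
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)
```

The section headings (Parameters, Returns, Raises) are what the documentation tooling parses, so keeping them consistent across the codebase matters more than which docstring style you pick.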

PyTest: Writing unit tests will make your code more robust to bugs and also provide prospective users with primitive usage examples. PyTest gives a convenient framework for writing clear unit tests with minimal boilerplate.
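The sketch below shows how little boilerplate such tests need: PyTest collects any function whose name starts with `test_` from files named `test_*.py` and reports each failing `assert`. The function `standardise` is a hypothetical piece of research code invented for this example.

```python
import math


def standardise(values):
    """Rescale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        raise ValueError("cannot standardise constant input")
    return [(v - mean) / std for v in values]


def test_standardise_has_zero_mean():
    result = standardise([1.0, 2.0, 3.0, 4.0])
    assert math.isclose(sum(result) / len(result), 0.0, abs_tol=1e-12)


def test_standardise_has_unit_variance():
    result = standardise([1.0, 2.0, 3.0, 4.0])
    variance = sum(v ** 2 for v in result) / len(result)
    assert math.isclose(variance, 1.0)


def test_standardise_rejects_constant_input():
    # With PyTest installed, `pytest.raises(ValueError)` is the idiomatic form;
    # plain try/except keeps this sketch free of third-party imports.
    try:
        standardise([2.0, 2.0, 2.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant input")
```

In a real repository the function would live in the package and the tests in a separate file, so that the tests double as usage examples.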

CodeCov: Once you have some unit tests, CodeCov will automatically calculate what percentage of your code is covered by unit tests and highlight any lines that are not covered. Bonus: the entire process of running unit tests and uploading the results to CodeCov can be automated using GitHub Actions.

Data: If your dataset is small (under roughly 1MB), include the raw file in the repository. For larger, third-party datasets, provide a link to the exact copy of the data you used; for large datasets that you have curated yourself, host them on a platform such as Git LFS. In your experimental code, create a file named that transforms the raw dataset into the preprocessed version that you have used in your experiments.
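A preprocessing script of this kind can be very small. The following is a minimal sketch, assuming the raw data is a CSV file with a header row and that the only preprocessing step is dropping incomplete rows; the function name `preprocess` and both file paths are hypothetical.

```python
import csv
from pathlib import Path


def preprocess(raw_path, processed_path):
    """Copy a raw CSV to a processed CSV, dropping incomplete rows.

    Returns the number of data rows written.
    """
    with open(raw_path, newline="") as source:
        rows = list(csv.reader(source))
    if not rows:
        raise ValueError(f"{raw_path} is empty")

    header, records = rows[0], rows[1:]
    # Keep only rows with a non-blank value in every column.
    complete = [
        row for row in records
        if len(row) == len(header) and all(cell.strip() for cell in row)
    ]

    # Create the output directory if it does not exist yet.
    Path(processed_path).parent.mkdir(parents=True, exist_ok=True)
    with open(processed_path, "w", newline="") as target:
        writer = csv.writer(target)
        writer.writerow(header)
        writer.writerows(complete)
    return len(complete)
```

Because the raw file is never modified, anyone can rerun the script and check that they obtain the same processed dataset.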

Naming conventions: Simple naming conventions will massively reduce the barrier to entry for anyone using your code for the first time. Some helpful tips are:

  1. Use full words for object, function and variable names
  2. Create one experiment file per figure in your paper and name accordingly e.g., will generate Figure 1 in your paper.
  3. For multiple experiment files, number the files according to the level of involvement. For example, a simple experiment that shows a small, simple introductory example of your work you would name the file, whilst for the final (of five), most involved example that works on real data the filename would be used.

Watermark: Adding a watermark to the end of each notebook explicitly records the version of each imported library. The convention used in GPJax is the command %watermark -n -u -v -iv -w -a '<AUTHOR_NAME>', which prints the date, the notebook’s last update date, the Python version, the versions of all imported modules, the watermark version and the author’s name.
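Outside a notebook, a similar record can be approximated with the standard library. The sketch below is not the watermark package itself; `imported_versions` and `environment_report` are hypothetical helpers, and the lookup assumes each module's import name matches its distribution name, which does not hold for every package.

```python
import platform
import types
from importlib import metadata


def imported_versions(namespace):
    """Map top-level module names found in a namespace to installed versions."""
    versions = {}
    for value in list(namespace.values()):
        if not isinstance(value, types.ModuleType):
            continue
        top_level = value.__name__.split(".")[0]
        try:
            versions[top_level] = metadata.version(top_level)
        except metadata.PackageNotFoundError:
            # Standard-library modules and mismatched names are skipped.
            continue
    return versions


def environment_report(namespace):
    """Return a short text report of the Python and package versions."""
    lines = [f"Python {platform.python_version()}"]
    for name, version in sorted(imported_versions(namespace).items()):
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


# Report on whatever modules the current script has imported.
print(environment_report(globals()))
```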

Installation files: You need not upload your repository to PyPI, but including a working file in your repository so that users can locally install your code is helpful.