Reproducibility: Single-cell transcriptome profiling of the human developing spinal cord reveals a conserved genetic programme with human-specific features

Reproducibility
Authors

Richard J. Acton

Jamie McLeod

Published

November 20, 2023

Modified

December 7, 2023

paper doi: 10.1242/dev.199711

Note that due to limited time and resources we are only covering a portion of this paper. We focused on the data contributing to Figure 1, plus some general observations from reading the rest of the paper.

Figure 1 - Single-cell RNA-seq from the human developing spinal cord


Overview

We had an overall positive impression of this paper’s efforts towards reproducibility, but we do have some suggestions for ways in which it could be improved above and beyond current common practice and journal guidelines.

Done Well

  • FAIR data
    • Identifier for deposited raw sequencing data GSE171892
    • Provision of analysis code human_single_cell
    • Making the processed data accessible for exploratory visualisation via a Shiny application was very nice to see neuraltube
  • Openness
    • Open Peer Review History
    • Manuscript is Openly Licensed CC BY 4.0
    • Use of CRediT in author contributions
    • Use of largely open source software in data analysis

Opportunities for Improvement

  • Analysis code repository
    • Documentation and organisation The project README file would benefit from some additional details: a description of the project structure, where to find the code, where to find the data, the order in which to run the scripts, any external data dependencies or assumptions about the environment in which the code is to be run, and which code produced which outputs in the final paper. Adding figure numbers to the code that produced or contributed to each figure, once the figures are finalised, helps people navigate the contents of your repo, but sections should also have intelligible names. It is also useful to note the approximate computational resources (CPU, RAM, storage space, etc.) required to run the analysis.
    • Avoid use of absolute paths A number of the scripts in this paper’s code repository contain absolute paths, which are a common source of errors when attempting to run others’ analysis code (Trisovic et al. 2022). Wherever possible it is best to use relative paths to materials within the project directory that are shared along with the rest of the repo. If this is not possible it is best to parameterise the absolute path and clearly document what is expected to be at that location, ideally with a minimal example file or a reference to one.
    • Provide version and dependency information This project has a mix of R and Python code. R provides the sessionInfo() function; minimally, it is helpful to include the output of this function as a file in an R project. Preferably, manage your R packages and environment with {renv} and include an renv.lock file which details the versions of all the R packages used and their dependencies. For Python, providing a virtual environment and using pip freeze > requirements.txt to specify the required packages is helpful. Other options include providing a container description file such as a Dockerfile, a conda environment, a Nix flake, or a GUIX manifest. These latter options also allow system dependencies to be specified as well as language-specific ones.
    • Appropriately licensing the code The project repository does not contain a suitable license. By default under copyright law this means that the authors reserve all rights to its contents, and it cannot be redistributed, with or without modification, without the authors’ permission. The Turing Way Section on Licensing is a good place to start for researchers wanting to know how and why to license their code and other research outputs.
    • A note on parallel processing in R It is generally preferable to use only the futureverse API for parallelism in R, for tasks like interacting with an HPC scheduler such as SLURM via future.batchtools, as this increases portability by allowing people to substitute their own plan() for a different parallel back-end.
    • Making code citable with a DOI A GitHub repository link is potentially subject to link rot. When sharing a reference to code it is advisable to generate a DOI for that code and create a permanent snapshot of it using a service like Zenodo. Fortunately, Zenodo has easy integrations with GitHub; see how to use a CITATION.cff file in your repo and the GitHub documentation for Zenodo integration. For advanced metadata, and for how to do this with GitLab, see here.
  • Depositing raw imaging data, for example following the REMBI metadata recommendations (Sarkans et al. 2021) and the community-developed checklists for publishing images and image analyses (Schmied et al. 2023)
  • Protocol level methodological detail
    • Methodological detail in the methods of this paper for the single-cell transcriptomics, on which we focused, and of the other paper cited for methods (Delile et al. 2019), seemed to us perfectly reasonable, though none of us had first-hand experience of using closely related methods at the bench. Providing and/or citing protocol-level detail about the methods used, through venues such as protocols.io, Protocol Exchange, or indeed any repository that will mint a DOI for your protocol, adds significantly to the reproducibility of your work. One of the most alarming findings of the Reproducibility Project: Cancer Biology was the lack of sufficient methodological detail to reproduce work without the assistance of the original authors; the ambiguities necessary to summarise an experimental protocol in a methods section add up.
    • Whilst the Fiji macro used is cited, a version is not stated; this, however, appears to be because these scripts are not versioned by their distributors. When version information is not available, this is worth noting. On the positive side, the script authors do provide example data. A very general statement is made about the plotting (‘data were plotted in R’); as with the other analyses, versions, dependencies, and scripts would ideally be included.
  • Where to download processed data
    • Whilst the processed single-cell data are easily viewed via the Shiny application, where to download a copy of them was not obvious. Including these data objects in the GEO submission, or placing them in another repository, would be helpful.
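To accompany the Zenodo suggestion above, a minimal CITATION.cff might look like the following. All fields here are placeholders, not this paper’s actual metadata:

```yaml
# Hypothetical example; replace every field with the real project metadata.
cff-version: 1.2.0
message: "If you use this code, please cite it as below."
title: "human_single_cell: analysis code"
authors:
  - family-names: "Doe"
    given-names: "Jane"
version: "0.1.0"
doi: "10.5281/zenodo.0000000"
```

With this file in the repository root, GitHub displays a “Cite this repository” button and the Zenodo integration can pick up the metadata when archiving a release.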
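As a minimal sketch of the “avoid absolute paths” suggestion above (the SC_DATA_DIR variable and counts.csv file name are hypothetical, not from this paper’s repository), paths can be anchored on the project root and any external location parameterised:

```python
import os
from pathlib import Path

# Anchor everything on a single project root rather than hard-coding a
# machine-specific absolute path. In a real script this would typically be
# Path(__file__).resolve().parent; Path.cwd() keeps the sketch self-contained.
PROJECT_ROOT = Path.cwd()

# External data that cannot ship with the repo: parameterise the location via
# an environment variable, with a documented default inside the project.
DATA_DIR = Path(os.environ.get("SC_DATA_DIR", str(PROJECT_ROOT / "data")))

counts_file = DATA_DIR / "counts.csv"  # placeholder file name
print(counts_file.name)  # prints counts.csv
```

The README would then document SC_DATA_DIR and describe (or include a minimal example of) what is expected at that location.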

The analysis code repository section has the largest number of specific suggestions for improvement, as this is still a relatively young area and best standards and practices are yet to be commonly taught and adopted for sharing materials of this type. This focus also reflects our areas of expertise. It is an area which benefits from familiarity with additional tooling for things like version control, as well as reproducible package and environment management. These must be learned and integrated into project workflows, and thus present additional hurdles to adoption.
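For the Python side of a mixed analysis like this one, the kind of environment record discussed above can even be generated from within the language itself. This is a sketch of a pure-Python analogue of `pip freeze > requirements.txt`, recording the interpreter version alongside pinned package versions:

```python
import sys
from importlib.metadata import distributions

# Record the interpreter version plus name==version for every installed
# package, in the same format that `pip freeze` produces.
lines = [f"# python {sys.version.split()[0]}"]
lines += sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
report = "\n".join(lines)

print(report.splitlines()[0])  # e.g. "# python 3.11.4"
```

Writing `report` to a file committed with the results plays the same role that a saved sessionInfo() output or an renv.lock file (via renv::snapshot()) plays on the R side.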

Pleasingly, almost all analysis in this paper was performed using open source software tools. Note, however, that ‘Cell Ranger’ is a piece of proprietary software from 10x Genomics; it is what is known as ‘source available’ (Community 2022, see the license), meaning its source code is available for inspection. This is preferable to closed-source proprietary tools but less desirable than fully open ones.

Slides

Slides from the introductory session.

References

Community, The Turing Way. 2022. The Turing Way: A Handbook for Reproducible, Ethical and Collaborative Research. Zenodo. https://doi.org/10.5281/ZENODO.3233853.
Delile, Julien, Teresa Rayon, Manuela Melchionda, Amelia Edwards, James Briscoe, and Andreas Sagner. 2019. “Single Cell Transcriptomics Reveals Spatial and Temporal Dynamics of Gene Expression in the Developing Mouse Spinal Cord.” Development, January. https://doi.org/10.1242/dev.173807.
Sarkans, Ugis, Wah Chiu, Lucy Collinson, Michele C. Darrow, Jan Ellenberg, David Grunwald, Jean-Karim Hériché, et al. 2021. “REMBI: Recommended Metadata for Biological Images: Enabling Reuse of Microscopy Data in Biology.” Nature Methods 18 (12): 1418–22. https://doi.org/10.1038/s41592-021-01166-8.
Schmied, Christopher, Michael S. Nelson, Sergiy Avilov, Gert-Jan Bakker, Cristina Bertocchi, Johanna Bischof, Ulrike Boehm, et al. 2023. “Community-Developed Checklists for Publishing Images and Image Analyses.” Nature Methods, September. https://doi.org/10.1038/s41592-023-01987-9.
Trisovic, Ana, Matthew K. Lau, Thomas Pasquier, and Mercè Crosas. 2022. “A Large-Scale Study on Research Code Quality and Execution.” Scientific Data 9 (1). https://doi.org/10.1038/s41597-022-01143-6.