Reproducibility: Natural killer cells act as an extrinsic barrier for in vivo reprogramming
paper doi: 10.1242/dev.200361
Overview
This paper’s (Melendez et al. 2022) efforts towards reproducibility are in line with current norms and its conclusions seem reasonable. In this first part of our coverage of the paper we offer some thoughts on how the statistics could potentially be improved, primarily the clarity and completeness with which they are communicated.
As usual, our critique identifies places where it would be possible to go above and beyond current norms in ways that improve the openness and reproducibility of the work. As is also our practice, we speculate about broader recommendations to publishers and other institutional actors on policies that might help improve adoption of some of these practices by authors.
Open & FAIR materials processes & outputs
Pros
Cons
- Unlike sequencing data, imaging and flow cytometry data are not commonly deposited in public archives, though suitable archives do exist:
- Imaging – the BioImage Archive (BIA) for anything published, or the Image Data Resource (IDR) if they want to curate your dataset
- Flow cytometry – FlowRepository
- Most of the flow cytometry analysis was performed with proprietary software. This too is a common issue: most analysis software for flow cytometry is proprietary.
- They also have the very common problem of relying on proprietary antibodies, the sequences of which are not publicly available.
- Lack of protocol-level methodological detail – another common deficiency in the literature
Statistics
Reporting results of stats tests
Figure 2D and Figure 3D are examples where the authors state that an ANOVA was used, but the asterisks on the plots indicate that multiple p-values are being reported. This suggests that post-hoc tests were run on the ANOVA, yet neither the specific post-hoc test nor the overall ANOVA statistics are reported. Which post-hoc test was used, and how (and whether) correction for multiple comparisons was performed, should also be stated.
The authors use asterisks without providing the specific p-values, either in the legends or in supplementary materials.
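As a rough illustration of the kind of reporting we have in mind (in Python, with invented numbers rather than the paper’s data), one could report the overall ANOVA statistics and then the named post-hoc test with its exact, adjusted p-values:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements for three conditions (not the paper's data)
ctrl   = [0.12, 0.15, 0.11, 0.14]
cond_a = [0.21, 0.24, 0.19, 0.22]
cond_b = [0.13, 0.16, 0.12, 0.15]

# Report the overall ANOVA statistics first
f_stat, p_overall = stats.f_oneway(ctrl, cond_a, cond_b)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p_overall:.4f}")

# Then name the post-hoc test and report its exact, adjusted p-values
values = np.concatenate([ctrl, cond_a, cond_b])
groups = ["ctrl"] * 4 + ["cond_a"] * 4 + ["cond_b"] * 4
print(pairwise_tukeyhsd(values, groups))  # Tukey-adjusted pairwise p-values and CIs
```

Tukey’s HSD is only one of several defensible post-hoc tests; the point is that whichever is used should be named and its exact p-values reported.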
The authors state that they computed a correlation coefficient in Figure 3D, but there are a number of different correlation coefficients: Pearson’s r, Spearman’s rho, etc. Which one was used?
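To illustrate why the distinction matters (again with invented values), the two most common coefficients answer different questions and can give different answers:

```python
from scipy import stats

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical paired measurements
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r, p_r     = stats.pearsonr(x, y)    # linear association
rho, p_rho = stats.spearmanr(x, y)   # monotonic, rank-based association
print(f"Pearson r = {r:.3f} (p = {p_r:.3g}); Spearman rho = {rho:.3f} (p = {p_rho:.3g})")
```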
Multiple testing correction
Figure 1C states in the legend that unpaired two-tailed Student’s t-tests were used, but multiple tests are being performed here, so it would be appropriate to perform multiple testing correction, and this is not mentioned. The use of asterisks without the exact p-values makes it impossible to tell whether these comparisons would still have reached significance after correction.
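A minimal sketch of what such a correction could look like, using hypothetical raw p-values rather than the authors’ actual values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values, one per cell-type comparison (not the paper's values)
raw_p = [0.012, 0.034, 0.21, 0.0008, 0.047]

# Holm controls the family-wise error rate; method='fdr_bh' would control the FDR instead
reject, corrected_p, _, _ = multipletests(raw_p, alpha=0.05, method='holm')
for p, q, sig in zip(raw_p, corrected_p, reject):
    print(f"raw p = {p:.4f} -> corrected p = {q:.4f} ({'significant' if sig else 'n.s.'})")
```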
Opportunity to benefit from Pre-registration
It is quite clear from their introduction that, based on previous findings, they are expecting immune infiltration of the reprogramming tissues. So they are expecting an increase in the number of immune cells and are interested in which subtypes are increasing; for any particular cell type, the question is whether there is no increase or some increase. This being the case, if they had registered this prediction they could likely have justified using a directional (one-tailed) t-test. Reviewers tend to question the use of one-tailed tests, especially if they make the difference between crossing a significance threshold or not, as they are easy to abuse when performing an analysis after the fact. Registering a prediction in advance assuages any concerns about HARKing (hypothesising after the results are known) and p-hacking. You do not necessarily need a formal pre-registration process to do this, just the ability to provide a timestamped record of your prediction and planned analysis before you have performed your experiment. This can be done with a number of tools: Zenodo, OSF, Octopus, ResearchEquals, etc.
However, if they were interested in the possibility that some immune cell types actually decreased in reprogramming tissues, then the non-directional (two-tailed) test remains the more appropriate choice.
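For what it is worth, the practical difference between the two choices is a single argument in most statistics packages; a sketch with invented values:

```python
from scipy import stats

# Hypothetical immune-cell measurements (not the paper's data)
reprogramming = [14.2, 18.5, 16.1, 19.3, 17.0]
control       = [10.1, 11.4,  9.8, 12.0, 10.7]

# Two-tailed: sensitive to a difference in either direction
t2, p_two = stats.ttest_ind(reprogramming, control)

# One-tailed: only defensible if the directional prediction (an increase) was registered in advance
t1, p_one = stats.ttest_ind(reprogramming, control, alternative='greater')

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```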
Pairing in pooled results
Figure 1C & Figure 2B mention that the results are pooled from two independent experiments. This mixes technical and biological sources of variation in the data. Assuming each experiment includes one or more samples from each group, it is better to use pairing/matching to account for this source of technical structure in the data, which is not relevant to the biological question of interest.
The legend of Figure 7B states: “each dot represents one organoid from three biological replicates; ten images were measured per sample”. This is a little ambiguous, but it would seem to suggest that results are also being pooled here, resulting in a mix of biological and technical replicates.
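One way of accounting for this structure, sketched here with an invented dataset and hypothetical column names, is to include the experiment as a blocking factor (or, with more experiments, as a random effect):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data pooled from two independent experiments
df = pd.DataFrame({
    "value":      [12.1, 13.4, 18.2, 19.0, 11.5, 12.9, 17.1, 18.8],
    "group":      ["ctrl", "ctrl", "treated", "treated"] * 2,
    "experiment": ["exp1"] * 4 + ["exp2"] * 4,
})

# Including experiment as a blocking factor absorbs between-experiment (technical)
# variation, so the group effect is estimated within experiments
fit = smf.ols("value ~ group + C(experiment)", data=df).fit()
print(fit.summary().tables[1])

# With more experiments, or per-organoid measurements nested within replicates, a mixed
# model such as smf.mixedlm("value ~ group", df, groups=df["experiment"]) is the more
# general way to keep biological and technical replication separate.
```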
Appropriateness of tests
Consider Figure 1C, which looks at differences in the fraction of cells of a given type in the pancreas under different conditions. Naively, one would expect the underlying data to have a binomial distribution, since we are asking ‘is this cell of type X, yes or no?’ once for each of the cells sampled. In particular, the data are therefore unlikely to meet the homogeneity-of-variance assumption of the t-test, because in binomially distributed data proportions close to 0 have low variance and proportions close to 0.5 have high variance.
The ‘propeller’ method was developed to address this particular question of testing for differences in cell type proportions in single-cell data. It was published after the paper we are covering here, so the authors could not have made use of this particular approach, but we include it for the future reference of anyone posing a similar question.
A far simpler method, likely an improvement over a Student’s t-test even if not as good as the purpose-built ‘propeller’ method, would be Welch’s t-test (a.k.a. Welch’s correction), which accounts for differences in variance between conditions.
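In scipy the Welch variant is a single argument away from the default Student’s test; a hedged sketch with made-up proportions rather than the paper’s data:

```python
from scipy import stats

# Hypothetical per-animal cell-type fractions (not the paper's data); fractions near 0
# have much lower variance than fractions near 0.5
ctrl   = [0.02, 0.03, 0.01, 0.02, 0.03]
reprog = [0.35, 0.48, 0.29, 0.41, 0.52]

t_student, p_student = stats.ttest_ind(reprog, ctrl)                   # assumes equal variances
t_welch,   p_welch   = stats.ttest_ind(reprog, ctrl, equal_var=False)  # Welch's t-test
print(f"Student's p = {p_student:.4g}; Welch's p = {p_welch:.4g}")
```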
Other results based on cell and colony count data may have similar issues meeting the assumptions of the Student’s t-test.
Consistency
Mean ± SEM is used in Figure 1 and mean ± SD in all other figures; it is unclear why, and we cannot think of a good reason why one would be more appropriate than the other in these contexts. All else being equal, picking one of these indications of spread and using it consistently would be preferable.
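For readers weighing the choice, the two quantities answer different questions – the SD describes the spread of the individual measurements, while the SEM describes the uncertainty of their mean – as a quick illustration with invented numbers shows:

```python
import numpy as np
from scipy import stats

values = np.array([12.1, 13.4, 11.8, 14.0, 12.7])  # hypothetical replicate measurements

sd  = values.std(ddof=1)   # spread of the individual measurements
sem = stats.sem(values)    # uncertainty of the mean; equals sd / sqrt(n)
print(f"sd = {sd:.2f}, sem = {sem:.2f} (n = {values.size})")
```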
Recommendations for publishers
Require Plot Source Data
Policies like this one from Nature Communications are a reasonable example:
Source data
For relevant manuscripts, we may request a source data file in Microsoft Excel format or a zipped folder. The source data file should, as a minimum, contain the raw data underlying any graphs and charts, and uncropped versions of any gels or blots presented in the figures. Within the source data file, each figure or table (in the main manuscript and in the Supplementary Information) containing relevant data should be represented by a single sheet in an Excel document, or a single .txt file or other file type in a zipped folder. Blot and gel images should be pasted in and labelled with the relevant panel and identifying information such as the antibody used. We also encourage authors to include any other types of raw data that may be appropriate. An example source data file is available demonstrating the correct format.
It would be better if they were to:
- Require this of all submissions
- Clarify “raw data underlying any graphs and charts”: this may be ‘secondary data’, i.e. the result of processing and analysis of the raw data as gathered from an experiment, but it is ‘raw’ for the purposes of plotting, i.e. the data points which were actually plotted, be they primary or the output of a longer upstream analysis process, so that others could re-plot the graphs (a sketch of what this could look like follows below).
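As a rough sketch of how an author might assemble such a file (assuming pandas with an Excel backend such as openpyxl; the file and sheet names here are hypothetical), with one sheet per figure panel holding exactly the plotted points:

```python
import pandas as pd

# Hypothetical per-panel tables holding exactly the data points that were plotted
fig1c = pd.DataFrame({"condition": ["ctrl", "ctrl", "reprog", "reprog"],
                      "cell_type": ["NK", "T", "NK", "T"],
                      "fraction":  [0.02, 0.10, 0.41, 0.12]})
fig2b = pd.DataFrame({"condition": ["ctrl", "reprog"], "count": [5, 23]})

# One sheet per figure or panel, as the quoted policy describes
with pd.ExcelWriter("source_data.xlsx") as writer:
    fig1c.to_excel(writer, sheet_name="Figure 1C", index=False)
    fig2b.to_excel(writer, sheet_name="Figure 2B", index=False)
```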
Require alt text for figures
The visualisation post that accompanies this one lacks alt text for its figures – mostly because this is quite time consuming to do well for academic figures, we have limited time, and none of the original figures we started with (so far) came with any alt text that we could adapt. Attempting to write the alt text for a figure, though, is an excellent exercise for bringing clarity to what you are trying to illustrate with that figure and the order in which it makes most sense to present that information. So, despite the hypocrisy of not having provided alt text here, we would suggest that journals require alt text for all figures. This by no means addresses the fairly abysmal accessibility of the literature to screen readers, but it is a step in the right direction, and a much-needed one given the results of this succinctly damning 2023 study: “Figure accessibility in journals: analysis of alt-text in 2021–23” (https://doi.org/10.1016/S0140-6736(23)02348-6).
Note: it would also be good to use image metadata standards to embed alt text into the image files of figures themselves when sharing them online, to increase portability (see: https://www.iptc.org/standards/photo-metadata/iptc-standard/).