Visualisation: Natural killer cells act as an extrinsic barrier for in vivo reprogramming

Authors
Affiliations

Richard J. Acton

Hayley Carr

Babraham Institute

Published

January 26, 2024

Modified

December 10, 2024

paper doi: 10.1242/dev.200361

In our previous post covering this paper (Melendez et al. 2022) we talked about openness and reproducibility. In this post we talk about data visualisation.

Done well

Semantically well-chosen colour palette in Figure 1 C. That is to say, the choice of colours generally reflects the meaning they are intended to convey.

How?

  • Grey for controls is nice: it indicates that this is the baseline against which comparisons are being made.
  • The colours they use for the effects that they expect to be stronger are darker and/or more saturated. Figure 1 C is a good example here:
    • Dark grey for the p53-null, which was shown previously to be more sensitive to reprogramming, and a darker, more saturated green for the induced reprogramming factors in combination with the p53-null.
    • Nitpick: there are some colour shade consistency issues; Figure 3 A, for example, is a different green.
  • The consistency of the meaning of green as i4F does fall down in later figures though. In figure 2, for example, green is re-used for NK cells and transwells. If possible, we advise avoiding re-cycling colours to mean different things within the same context. That context might be the whole paper but, depending on specifics, might be limited to a given multi-panel figure.
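
One way to guard against colours drifting in meaning is to define a single named palette and look colours up by what they represent. The sketch below is a minimal base R illustration: the hex values are the fills used for the recreation of figure 1 C later in this post, while `condition_palette` and `palette_for()` are hypothetical names introduced here, not something from the paper.

```r
# One named palette, defined once and reused for every figure in a project
condition_palette <- c(
    "WT"           = "#D4D4D4",
    "p53-null"     = "#808080",
    "i4F"          = "#C1E0C0",
    "i4F;p53-null" = "#7FAD7F"
)

# Hypothetical helper: look colours up by meaning and fail loudly
# if a level has no colour assigned, rather than silently re-using one
palette_for <- function(levels, palette = condition_palette) {
    unmapped <- setdiff(levels, names(palette))
    if (length(unmapped) > 0) {
        stop("No colour assigned to: ", paste(unmapped, collapse = ", "))
    }
    palette[levels]
}

# Colours for a figure that only shows two of the conditions
palette_for(c("WT", "i4F"))
```

The named vector returned by `palette_for()` can be passed straight to `scale_fill_manual(values = ...)`. Asking for a level that is not in the palette, such as NK cells, stops with an error instead of quietly borrowing the i4F green.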

Re-working a figure

If you must use multi-panel figures outside of contexts where there is a meaningful grouping between small multiples, which unfortunately is generally the case, there are steps that can be taken to make them much less onerous to absorb. We decided to focus on figure 2 for this.

Note that there was a minor off-by-one error in the text: figure 2 D was referred to a second time where, inferring from context, figure 2 E was intended, and this error was carried forward to F) and G). As a result, the text about E) in the body was actually about F), and the text about F) was about G).

  • Sub-headings and blocking to delineate semantic groups. There are at least 4 distinct sections of this multi-panel figure. This is partially illustrated through rows in the original, but it is not obvious at a glance, partly because A) violates this rule. Using lines and subheadings to clearly denote these separate groups makes this unambiguous.
    A) is a schematic of the experimental system common to all of the variants in the next three sections, but without some text from the body and the legend it is a bit hard to interpret. So some text is brought in here to give enough context that the reader does not need to refer back to the body/legend for basic interpretation.
    B) is a quantification of the data, a subset of which is shown in C), so these should be swapped so that they appear in the right conceptual order. We’ve also added a wedge under the series of increasing NK:MEF ratios to provide a visual cue for this increase, mirroring the one that we’ve added to C).
  • The use of ‘E:T’ in B) requires an unnecessary extra lookup in the text; just use ‘NK:MEF’ as was done in C).
    C) splits the series of images with increasing proportions of NK to MEF over two lines and treats the 0:1 NK:MEF +Dox condition as the beginning of this sequence, when it is perhaps better thought of as a control, MEF +Dox. The reworked version groups the three controls and the three serial increases in NK by row. It also uses a wedge to visually hint at the increasing NKs.
    D) fails to group ligands with their corresponding receptors as noted in the text. This is useful context for interpreting subsequent experiments. So in the re-work we’ve swapped ICAM1 and RAE1 so that all NKG2D ligands are together, and we’ve added labels indicating the receptors. This much more naturally leads the reader to ask, after looking at the next section, whether blocking LFA also partially disrupts the effects of NK cells, as blocking NKG2D does, and whether blocking both is closer to control levels.
  • E), F), & G) have no consistent entry point: it’s not clear where to start when you are reading them. The key information needed to know what each panel shows is not prominent but mixed in with other information. The red highlights on the original show the ‘entry point’ for each panel; they don’t stand out. To address this, that information has been added to a subheading for each panel, along with what these interventions are intended to test. This way the reader is prompted to recall, for example, that ConA is blocking cytotoxic granules, and anti-NKG2D an NK activating receptor ligand interaction. They then don’t need to refer back to the text for context if these are not relationships they are already familiar with.
  • We chose to go with the question being posed in each experiment as the section subheadings rather than the result, and also did not take the approach of asserting the conclusion as is typically done in the body of the text. This is a stylistic choice. Some might argue that this is ‘leading the witness’ but the reader is still perfectly free to dispute that the experiment addressed the question that the author has posed and suggests that it addresses. The reader is invited to infer the answer to the question from the data, representing a bit of a compromise with those who dislike declarative titles on plots. Indeed having the author state what question they think an experiment answers makes framing a critique that questions this premise all the more obvious.

Full size reworked version:

A good portion of this advice is to include more text in your figure panels to provide enough context for your graphics to be interpretable without extensive reference to both the body of the text and the legend. You may be able to take some text out of the legend and place it directly onto your figures to improve legibility – it can take quite a lot of hunting through a long figure legend to find the context you need for a given panel so just put it on the panel. Follow a principle of co-localising the minimal set of information necessary to the interpretation of a plot with the plot itself. This makes plots much easier to scan. For some interesting empirical work related to this point see: “Striking a Balance: Reader Takeaways and Preferences when Integrating Text and Charts” (Stokes et al. 2022)
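
As a minimal sketch of co-localising context with a plot in ggplot2, the example below carries the key information in the title, subtitle, and caption rather than leaving it to a separate legend. The CD45 values are the eye-estimated ones used for the figure 1 C recreation in this post; the wording of the title, subtitle, and caption is illustrative only and is not taken from the paper.

```r
library(ggplot2)

# CD45 values for two conditions, copied from the eye-estimated data in this post
cd45 <- data.frame(
    condition = factor(
        rep(c("WT", "i4F;p53-null"), times = c(9, 4)),
        levels = c("WT", "i4F;p53-null")
    ),
    percent_cells_in_pancreas = c(
        0.5, 1.0, 1.2, 1.9, 2.0, 2.3, 3.0, 9.5, 9.5,
        13, 30, 38, 40
    )
)

p_context <- ggplot(cd45, aes(x = condition, y = percent_cells_in_pancreas)) +
    geom_point(shape = 4) +
    labs(
        # what the panel shows, stated where the reader looks first
        title = "CD45 cells in the pancreas",
        # how to read the marks, so the legend text is not needed for this
        subtitle = "Each cross is one value estimated by eye from figure 1 C",
        # abbreviations defined on the panel itself
        caption = "WT: wild type; i4F: induced reprogramming factors",
        x = NULL,
        y = "% cells in pancreas"
    ) +
    theme_minimal()

p_context
```

Here the conditions are labelled directly on the x axis, so no colour legend is needed at all; for denser panels `annotate("text", ...)` can place a label next to the marks it describes.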

If journal guidelines prevent better graphics practice in your multi-panel figures, complain and/or consider publishing somewhere else with better policies. The ability to have clarity of communication of your work when more people may be finding your paper by keyword search than finding it in the latest edition of a ‘high impact’ publication should not be given short shrift. Alternatively, do it right in the pre-print, treat that as your canonical version, and consider the published version as a marketing exercise.

Note that the Inkscape .SVG file for this edited version of the figure is available in the source repository of this website, so you can see how this edit was made and experiment with it for yourself.

Figure 1 C

suppressPackageStartupMessages({
    library(dplyr)
    library(tidyr)
    library(ggplot2)
    library(cowplot)
    library(colorspace)
    library(colorblindr)
    library(gt)
    library(ggbeeswarm)
    library(ggforce)
    library(ggridges)
})

sem <- function(x) sd(x) / sqrt(length(x))
## ! values estimated by eye !
f1c_conditions <- c("WT","p53-null","i4F","i4F;p53-null")
f1c_cell_types <- c("CD45","B220","CD3","CD4","CD8","F4/80","CD11b","Gr1","NK","NKT")
f1c_data <- tibble::tibble(
    condition = rep(factor(f1c_conditions, levels = f1c_conditions, ordered = TRUE), each = 10),
    cell_type = rep(factor(f1c_cell_types, levels = f1c_cell_types, ordered = TRUE), 4),
    percent_cells_in_pancreas = list(
        c(0.5, 1.0, 1.2, 1.9, 2.0, 2.3, 3.0, 9.5, 9.5),
        c(0.1, 0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 2.9, 3.0),
        c(0.1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.5, 0.9, 1.8),
        c(0.1, 0.1, 0.2, 0.3, 0.3, 0.4, 0.4, 0.7, 1.0),
        c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.5, 0.5),
        c(0.1, 0.2, 0.2, 0.2, 0.3, 0.4, 0.5, 0.5, 0.5),
        c(0.1, 0.1, 0.2, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8),
        c(0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4),
        c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1),
        c(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.1),
        c(1.0, 2.1, 2.4, 3.6, 10.),
        c(0.3, 0.2, 1.0, 1.2, 3.0),
        c(0.5, 0.5, 0.8, 1.1, 2.0),
        c(0.1, 0.2, 0.3, 0.8, 1.0),
        c(0.0, 0.1, 0.2, 0.5, 0.6),
        c(0.4, 0.8, 0.9, 1.5, 1.9),
        c(0.2, 0.4, 0.8, 1.5, 1.9),
        c(0.2, 0.5, 0.7, 1.5, 1.8),
        c(0.0, 0.1, 0.2, 0.6, 0.8),
        c(0.0, 0.0, 0.0, 0.1, 0.2),
        c(0.4, 0.9, 14., 15., 18., 20.),
        c(0.0, 0.6, 2.1, 8.0, 8.0, 9.0),
        c(0.0, 0.6, 1.3, 2.0, 2.4, 3.5),
        c(0.0, 0.1, 0.5, 1.4, 1.5, 1.9),
        c(0.0, 0.1, 0.3, 0.7, 0.8, 0.9),
        c(0.3, 0.8, 1.2, 2.7, 3.2, 9.5),
        c(0.2, 1.0, 3.0, 3.5, 8.0, 9.0),
        c(0.1, 0.2, 0.3, 0.9, 2.1, 10.),
        c(0.0, 0.1, 0.1, 1.1, 1.2, 8.0),
        c(0.0, 0.0, 0.1, 0.1, 1.6, 3.0),
        c(13., 30., 38., 40.),
        c(1.0, 3.5, 7.0, 12.),
        c(1.4, 2.0, 2.2, 2.6),
        c(0.2, 0.9, 1.1, 3.1),
        c(0.1, 0.5, 0.7, 2.1),
        c(8.0, 10., 18., 31.),
        c(4.5, 8.0, 10., 12.),
        c(1.2, 9.0, 13., 18.),
        c(1.3, 3.6, 9.0, 11.),
        c(0.6, 0.7, 4.6, 10.)
    )
) %>% tidyr::unnest(cols = c(percent_cells_in_pancreas))

f1c_fills <- c("#D4D4D4", "#808080", "#C1E0C0", "#7FAD7F")
names(f1c_fills) <- f1c_conditions

f1c_colours <- c("#959595","#454545","#017F01","#005900")
names(f1c_colours) <- f1c_conditions

Approximate Recreation of figure 1 C

Data used here are from eyeballing the points on the graph as the underlying data were not provided. Note the lack of an axis break. These are hard to specify in ggplot, in part because they are usually a bad idea.

# Bars show the group mean, points the individual values, error bars the mean +/- SEM
f1c_repro <- f1c_data %>% {
    ggplot(., aes(x = condition)) +
    # group means as bars
    geom_col(
        data = . %>%
            group_by(cell_type, condition) %>%
            summarise(mean = mean(percent_cells_in_pancreas), .groups = "drop"),
        aes(y = mean, fill = condition, colour = condition),
        width = 0.6
    ) +
    scale_fill_manual(values = f1c_fills) +
    # individual values
    ggbeeswarm::geom_beeswarm(
        aes(y = percent_cells_in_pancreas, colour = condition),
        show.legend = FALSE
    ) +
    scale_color_manual(values = f1c_colours) +
    # mean +/- standard error of the mean for each cell type and condition
    geom_errorbar(
        data = . %>% group_by(cell_type, condition) %>%
            summarise(
                mean = mean(percent_cells_in_pancreas),
                sem = sem(percent_cells_in_pancreas),
                ymax = mean + sem,
                ymin = mean - sem,
                .groups = "drop"
            ),
        width = 0.2,
        show.legend = FALSE,
        aes(ymin = ymin, ymax = ymax, colour = condition)
    ) +
    # one facet per cell type, labelled underneath as in the original figure
    facet_wrap(~cell_type, nrow = 1, strip.position = "bottom") +
    theme_minimal() +
    theme(
        legend.position = "top",
        strip.placement = "outside",
        axis.text.x = element_blank(),
        axis.line = element_line(), axis.ticks = element_line()
    ) +
    labs(y = "% cells in pancreas", x = NULL)
}
f1c_repro

We can zoom in on a range of the y axis if we need to see the small differences at the lower range of values.

f1c_repro + coord_cartesian(ylim = c(0,5))
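
It is worth spelling out why `coord_cartesian()` is used for the zoom rather than `ylim()` or `lims()`. The sketch below uses a small made-up vector (not values from the paper) to show the difference: setting scale limits turns out-of-range values into missing values before any statistics are computed, whereas `coord_cartesian()` only changes the visible window.

```r
library(ggplot2)

# Illustrative values only: one large value among several small ones
toy <- data.frame(group = "a", value = c(1, 2, 3, 100))

base_plot <- ggplot(toy, aes(x = group, y = value)) + geom_boxplot()

# coord_cartesian() zooms the view; all four values still feed the boxplot
zoomed <- base_plot + coord_cartesian(ylim = c(0, 5))

# ylim() sets scale limits; the value of 100 is dropped before the
# boxplot statistics are computed (ggplot2 warns about the removed row)
clipped <- base_plot + ylim(0, 5)

# Medians as computed by each version of the plot
median_zoomed  <- ggplot_build(zoomed)$data[[1]]$middle
median_clipped <- suppressWarnings(ggplot_build(clipped)$data[[1]]$middle)

c(zoomed = median_zoomed, clipped = median_clipped)
```

The zoomed plot keeps the median of all four values, while the clipped plot reports the median of only the three values inside the limits. For bar charts of means, as in the recreation above, the same distinction decides whether the bars still reflect all of the data.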

Colour palette choice appears quite accessible.

f1c_repro %>% colorblindr::cvd_grid()
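
A numeric complement to the simulated grid is to check that the light-to-dark ordering of the fills, which carries the "stronger effect is darker" meaning, is present in the palette itself and to see how it looks under a simulated colour vision deficiency. This is a sketch that assumes lightness is a reasonable proxy for how strong a fill reads; `luminance()` is a hypothetical helper defined here, and only `colorspace` functions are used.

```r
library(colorspace)

# Fills used in the figure 1 C recreation in this post
f1c_fills <- c(
    "WT"           = "#D4D4D4",
    "p53-null"     = "#808080",
    "i4F"          = "#C1E0C0",
    "i4F;p53-null" = "#7FAD7F"
)

# Hypothetical helper: luminance (the L coordinate of polar LUV / HCL)
# for a vector of hex colours
luminance <- function(hex) {
    unname(coords(as(hex2RGB(hex), "polarLUV"))[, "L"])
}

# Luminance of the palette as published
lum_original <- luminance(f1c_fills)

# Luminance after simulating deuteranopia
lum_deutan <- luminance(deutan(f1c_fills))

data.frame(
    condition = names(f1c_fills),
    original  = round(lum_original, 1),
    deutan    = round(lum_deutan, 1)
)
```

If the p53-null fills remain clearly darker than their counterparts in the simulated column too, the ordering does not depend on being able to distinguish the green hue.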

Figure 1 C in boxplot form

Using a boxplot instead of a bar chart for this plot highlights the problem of heterogeneity of variance between the conditions being compared that we mentioned in the previous post on this paper.

# Boxplots drawn over the individual values (crosses), one facet per cell type
f1c_box <- f1c_data %>% {
    ggplot(., aes(x = condition)) +
    # individual values, drawn first so the semi-transparent boxes sit on top
    ggbeeswarm::geom_beeswarm(
        aes(y = percent_cells_in_pancreas),
        shape = 4
    ) +
    geom_boxplot(
        aes(y = percent_cells_in_pancreas, fill = condition, colour = condition),
        width = 0.5,
        # outliers are hidden because every value is already drawn as a cross
        outlier.shape = NA, alpha = 0.7
    ) +
    scale_fill_manual(values = f1c_fills) +
    scale_color_manual(values = f1c_colours) +
    facet_wrap(~cell_type, nrow = 1, strip.position = "bottom") +
    theme_minimal() +
    theme(
        legend.position = "top",
        strip.placement = "outside",
        axis.text.x = element_blank(),
        axis.line = element_line(), axis.ticks = element_line()
    ) +
    labs(y = "% cells in pancreas", x = NULL)
}
f1c_box

f1c_box  + coord_cartesian(ylim = c(0, 5))
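
The heterogeneity of variance that the boxplots make visible can also be put into numbers. The sketch below does this in base R for the CD45 values of two conditions, copied from the eye-estimated data in this post, so the figures are only as good as those estimates.

```r
# Standard error of the mean, as defined earlier in this post
sem <- function(x) sd(x) / sqrt(length(x))

# CD45 values for two conditions, estimated by eye from figure 1 C
cd45 <- list(
    "WT"           = c(0.5, 1.0, 1.2, 1.9, 2.0, 2.3, 3.0, 9.5, 9.5),
    "i4F;p53-null" = c(13, 30, 38, 40)
)

# One row per condition: group size, mean, standard deviation and SEM
cd45_summary <- t(sapply(cd45, function(x) {
    c(n = length(x), mean = mean(x), sd = sd(x), sem = sem(x))
}))

round(cd45_summary, 2)
```

The two groups differ in both size and spread, which is worth keeping in mind when reading error bars that show the SEM, since the SEM shrinks with group size as well as with the spread of the values.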

Session Info

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] ggridges_0.5.3    ggforce_0.3.3     ggbeeswarm_0.6.0  gt_0.10.1        
 [5] colorblindr_0.1.0 colorspace_2.0-3  cowplot_1.1.1     ggplot2_3.3.6    
 [9] tidyr_1.2.0       dplyr_1.1.4      

loaded via a namespace (and not attached):
 [1] utf8_1.2.2        generics_0.1.2    renv_0.15.5       xml2_1.3.6       
 [5] stringi_1.7.6     digest_0.6.29     magrittr_2.0.3    evaluate_0.15    
 [9] grid_4.3.1        fastmap_1.1.0     plyr_1.8.7        jsonlite_1.8.0   
[13] purrr_0.3.4       fansi_1.0.3       scales_1.3.0      tweenr_1.0.2     
[17] cli_3.6.2         rlang_1.1.0       polyclip_1.10-0   ellipsis_0.3.2   
[21] munsell_0.5.0     withr_2.5.0       yaml_2.3.5        tools_4.3.1      
[25] vctrs_0.6.5       R6_2.5.1          lifecycle_1.0.3   stringr_1.4.0    
[29] htmlwidgets_1.6.4 vipor_0.4.5       MASS_7.3-60       pkgconfig_2.0.3  
[33] beeswarm_0.4.0    pillar_1.9.0      gtable_0.3.0      glue_1.6.2       
[37] Rcpp_1.0.8.3      xfun_0.38         tibble_3.2.1      tidyselect_1.2.0 
[41] knitr_1.39        farver_2.1.0      htmltools_0.5.7   labeling_0.4.2   
[45] rmarkdown_2.14    compiler_4.3.1   

References

Melendez, Elena, Dafni Chondronasiou, Lluc Mosteiro, Jaime Martínez de Villarreal, Marcos Fernández-Alfara, Cian J. Lynch, Dirk Grimm, et al. 2022. “Natural Killer Cells Act as an Extrinsic Barrier for in Vivo Reprogramming.” Development 149 (8). https://doi.org/10.1242/dev.200361.
Stokes, Chase, Vidya Setlur, Bridget Cogley, Arvind Satyanarayan, and Marti A. Hearst. 2022. “Striking a Balance: Reader Takeaways and Preferences When Integrating Text and Charts.” IEEE Transactions on Visualization and Computer Graphics, 1–11. https://doi.org/10.1109/tvcg.2022.3209383.
