Demandt et al. - Stickleback groups and parasite infection


I’ll start of these blog posts (see here for a more general introduction) with an interesting paper that recently came in Proceedings B, by Nicolle Demandt, Benedikt Saus and others. Briefly, the study looks at how the behavior of stickleback fish changes when animals get infected with tape worms. Now we already know that some parasite infections can change behavior, for example by making animals less risk-averse. But in this paper the authors also look at how the behavior of non-infected fish changes when others in the group are infected.

The experiment involves infection some fish with tape worms, while other fish are kept healthy. The authors then let them swim together in groups of six fish, with different numbers of healthy or infected fish. This results in four treatments: six uninfected sticklebacks (6u), four uninfected and two infected sticklebacks (4u/2i), two uninfected and four infected sticklebacks (2u/4i) and six infected sticklebacks (6i).

They then expose these groups to a simulated bird attack, and measure their behavior before, during and after the attack.

For this post, I will just focus on the result presented in figure 2, which involves the escape from the simulated attack. The other analysis about time usage is basically very similar. This statement from the abstract is the topic here:

“… individuals in groups with only infected members showed lower escape responses and higher risk-taking than individuals from groups with only uninfected members. Importantly, uninfected individuals adjusted their risk-taking behaviour to the number of infected group members, taking more risk with an increasing number of infected group members.”

You can read the full open-access paper here.

Short conclusion

I chose a different analysis strategy, using a single model instead of four. Regardless, the results are robust against this choice, and to me it seems that the conclusions hold.

Overall, the data was easily available (although in .xlsx format), and the figures and analyses were easy to reproduce!

I did find the figures to be not so great, and I would have made some adjustments.


Packages and data

I’ll start by loading some packages for general convenience and mixed modelling.


The data is available on dryad, and with the follow code we can easily obtain the data, straight from their servers. It’s an excel file (I’d prefer to see .csv), so we’ll use the readxl package to read it in.

url <- ''
tmp <- httr::GET(url, httr::write_disk(tf <- tempfile(fileext = ".xlsx")))
d <- readxl::read_excel(tf)

This works fine, but the column names are long and inconsistent, and not everything is coded with the same levels as the figures. So I start of by renaming and relabeling the data.

d2 <- d %>% 
    group = factor(GroupID),
    fish = factor(FishID),
    treatment = factor(TreatmentgroupMS, levels = c('6u/0i', '4u/2i', '2u/4i', '0u/6i'),
                       labels = c('6u', '4u/2i', '2u/4i', '6i')),
    infected = factor(Infected, c('no', 'Yes'), c('uninf', 'inf')),
    time_dangerous = `Time spend in dangerous zone (sec)`,
    period = factor(`BeforeOrAfter bird strike`, c('Before', 'After'), c('before', 'after')),
    escape_zone = factor(`Escape zone`, c('Safe zone', 'Dangerous zone'), c('safe', 'dangerous'))

Let’s have a quick look:

## # A tibble: 336 x 7
##    group fish  treatment infected time_dangerous period escape_zone
##    <fct> <fct> <fct>     <fct>             <dbl> <fct>  <fct>      
##  1 3     15    6u        uninf               534 before dangerous  
##  2 3     15    6u        uninf               457 after  dangerous  
##  3 7     37    6u        uninf               371 before dangerous  
##  4 7     37    6u        uninf                99 after  dangerous  
##  5 7     42    6u        uninf               335 before dangerous  
##  6 7     42    6u        uninf                66 after  dangerous  
##  7 1     1     6u        uninf               576 before safe       
##  8 1     1     6u        uninf               118 after  safe       
##  9 1     2     6u        uninf               580 before safe       
## 10 1     2     6u        uninf               172 after  safe       
## # ... with 326 more rows

Overall, this looks good! One thing to look out for, is that the recorded escape zone (where each fish went during the simulated attack) is repeated twice, since there are before and after periods for each fish. I’ll create another data.frame that only has one observation per fish:

d3 <- filter(d2, period == 'after')

Analysis: escape zone

Figure 2 in the paper looks at whether the fish escaped towards the safe lower part of the tank during the simulated attack, or whether they remained exposed by staying in the top section where they would be reachable by the bird predator.

We can quite easily recreate figure 2:

ggplot(d3, aes(infected, escape_zone, fill = infected, group = infected)) +
  geom_violin(scale = 'count')  +
  geom_jitter(height = 0, width = 0.2, alpha = 0.3) +
  facet_grid(~treatment) +
  scale_y_discrete(expand = c(0.02, 0)) +
  scale_fill_manual(values = c('white', 'grey50')) +
  labs(x = 'infection status', y = 'escape zone', fill = NULL) +
  theme(legend.position = 'bottom')

However, I think this figure is less than ideal. Violin plots don’t make a lot of sense here. They are meant to show continuous distributions, but here they are applied to binary data. It is also hard to see what the average response would be. I would probably rather use simple means and confidence intervals. One could calculate the binomial confidence interval (binom.test gives them, for example), but bootstrapped confidence intervals will also do, and are really easy with ggplot2.

I also think it is unnecessary to show the non-existing groups (i.e. there are no infected fish in the 6u treatment).

ggplot(d3, aes(infected, as.numeric(escape_zone) - 1, group = infected)) +
  stat_summary( = 'mean_cl_boot', size = 1.5, fatten = 2.5) +
  facet_grid(~treatment, scales = 'free_x', space = 'free_x') +
  ylim(0, 1) +
  labs(x = 'infection status', y = 'fraction with dangerous escape')

In this figure, for example, it is easier to see that the infected and non-infected individuals in the 2u/4i treatment respond very similarly.

Alternatively, we could do away with facets entirely. Although it does not matter much, I think.

ggplot(d3, aes(treatment, as.numeric(escape_zone) - 1, color = infected)) +
  stat_summary( = 'mean_cl_boot', size = 1.5, fatten = 2.5, 
               position =  position_dodge(width = 0.3)) +
  scale_color_manual(values = c('black', 'grey50')) +
  ylim(0, 1) +
  labs(x = 'group composition', y = 'fraction that lacked escape', color = 'infection status')

So clearly, to analyze the escape behavior we will need a binomial model (there are two possible escape zones). And we will need to include a random effect for group id since multiple fish per group were observed, but they are not independent. So a generalized mixed model it is.

fm1 <- glmer(escape_zone ~ treatment * infected + (1 | group), d3, 'binomial')
## fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients

This model generates a message about dropping two coefficients. This makes sense, the effect of infection status cannot be estimated in the groups of only infected or non-infected fish.

We should first assess whether the data supports the inclusion of the interaction term. I do that by fitting the model without the interaction, and comparing AICs.

fm2 <- update(fm1, . ~ . -treatment:infected)
AIC(fm1, fm2)
##     df      AIC
## fm1  7 175.3231
## fm2  6 178.7194

The model with the interaction is about 3.5 points better, so we’ll keep it in. Indeed the F values for all terms are reasonably high:

## Analysis of Variance Table
##                    Df  Sum Sq Mean Sq F value
## treatment           3 13.4486  4.4829  4.4829
## infected            1  3.4544  3.4544  3.4544
## treatment:infected  1  5.0999  5.0999  5.0999

There are no p-values given here, but p-values are available for the individual coefficients, see coef(summary(fm1)).

So the escape behavior depends on both the infection status and group composition Two immediate follow-up questions would be if infected fish decrease their risk-taking when grouped with healthy fish, and if healthy fish increase their risk when swimming with infected fish.

This can be answered by looking at some post-hoc comparisons. Specifically we want to look at the contrast between group composition, within each infection status. The emmeans package is super useful for this.

emmeans(fm1, pairwise ~ treatment | infected, adjust = 'holm')$contrast
## infected = uninf:
##  contrast        estimate       SE  df z.ratio p.value
##  6u - 4u/2i    -0.1797502 1.382841 Inf  -0.130  0.8966
##  6u - 2u/4i    -3.8899251 1.382848 Inf  -2.813  0.0147
##  6u - 6i           nonEst       NA  NA      NA      NA
##  4u/2i - 2u/4i -3.7101750 1.497699 Inf  -2.477  0.0265
##  4u/2i - 6i        nonEst       NA  NA      NA      NA
##  2u/4i - 6i        nonEst       NA  NA      NA      NA
## infected = inf:
##  contrast        estimate       SE  df z.ratio p.value
##  6u - 4u/2i        nonEst       NA  NA      NA      NA
##  6u - 2u/4i        nonEst       NA  NA      NA      NA
##  6u - 6i           nonEst       NA  NA      NA      NA
##  4u/2i - 2u/4i -0.6308713 1.181557 Inf  -0.534  1.0000
##  4u/2i - 6i    -0.2129252 1.132152 Inf  -0.188  1.0000
##  2u/4i - 6i     0.4179462 1.020598 Inf   0.410  1.0000
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: holm method for 6 tests

Note that we get NA values for the comparisons that are invalid (because of our rank-deficient model). That’s a bit of an issue here, since the standard Tukey method will control for a test too many. So I’m using the Holm method.

We can clearly see that the infected fish don’t change their behavior with group composition, but the infected fish have significantly changed their behavior when the majority of the group is infected.

I conclude that the statement “uninfected individuals adjusted their risk-taking behaviour to the number of infected group members, taking more risk with an increasing number of infected group members” seems to be true.

The authors also said: “individuals in groups with only infected members showed lower escape responses and higher risk-taking than individuals from groups with only uninfected members.” We can obtain this comparison from the same model as well:

emmeans(fm1, pairwise ~ treatment + infected, adjust = 'none')$contrast %>% %>% 
  filter(contrast == "6u,uninf - 6i,inf")
##            contrast  estimate       SE  df   z.ratio     p.value
## 1 6u,uninf - 6i,inf -3.685335 1.241082 Inf -2.969455 0.002983288

Clearly, this statement is true as well.

The authors’ analysis

After my analysis, I read the statistics section and results from the paper. The paper actually uses 4 different models:

  • A model with only 9u and 6i.
  • A model with only 4u/2i and 2u/4i.
  • A model with only uninfected individuals.
  • A model with only infected individuals.

I think it is preferable to fit a single model and then to generate the comparisons of interest from this one model. This way, one gets to use the full data set to estimate the model.

Overall though, the conclusions of the paper are upheld.

comments powered by Disqus