# Introduction

I’ll start of these blog posts (see here for a more general introduction) with an interesting paper that recently came in Proceedings B, by Nicolle Demandt, Benedikt Saus and others. Briefly, the study looks at how the behavior of stickleback fish changes when animals get infected with tape worms. Now we already know that some parasite infections can change behavior, for example by making animals less risk-averse. But in this paper the authors also look at how the behavior of non-infected fish changes when others in the group are infected.

The experiment involves infection some fish with tape worms, while other fish are kept healthy. The authors then let them swim together in groups of six fish, with different numbers of healthy or infected fish. This results in four treatments: six uninfected sticklebacks (6u), four uninfected and two infected sticklebacks (4u/2i), two uninfected and four infected sticklebacks (2u/4i) and six infected sticklebacks (6i).

They then expose these groups to a simulated bird attack, and measure their behavior before, during and after the attack.

For this post, I will just focus on the result presented in figure 2, which involves the escape from the simulated attack. The other analysis about time usage is basically very similar. This statement from the abstract is the topic here:

“… individuals in groups with only infected members showed lower escape responses and higher risk-taking than individuals from groups with only uninfected members. Importantly, uninfected individuals adjusted their risk-taking behaviour to the number of infected group members, taking more risk with an increasing number of infected group members.”

You can read the full open-access paper here.

# Short conclusion

I chose a different analysis strategy, using a single model instead of four. Regardless, the results are robust against this choice, and to me it seems that the conclusions hold.

Overall, the data was easily available (although in .xlsx format), and the figures and analyses were easy to reproduce!

I did find the figures to be not so great, and I would have made some adjustments.

# Analysis

### Packages and data

I’ll start by loading some packages for general convenience and mixed modelling.

```
library(tidyverse)
library(lmerTest)
library(emmeans)
```

The data is available on dryad, and with the follow code we can easily obtain the data, straight from their servers. It’s an excel file (I’d prefer to see .csv), so we’ll use the `readxl`

package to read it in.

```
url <- 'https://datadryad.org/bitstream/handle/10255/dryad.178683/Demandt_et_al_data.xlsx?sequence=1'
tmp <- httr::GET(url, httr::write_disk(tf <- tempfile(fileext = ".xlsx")))
d <- readxl::read_excel(tf)
unlink(tf)
```

This works fine, but the column names are long and inconsistent, and not everything is coded with the same levels as the figures. So I start of by renaming and relabeling the data.

```
d2 <- d %>%
transmute(
group = factor(GroupID),
fish = factor(FishID),
treatment = factor(TreatmentgroupMS, levels = c('6u/0i', '4u/2i', '2u/4i', '0u/6i'),
labels = c('6u', '4u/2i', '2u/4i', '6i')),
infected = factor(Infected, c('no', 'Yes'), c('uninf', 'inf')),
time_dangerous = `Time spend in dangerous zone (sec)`,
period = factor(`BeforeOrAfter bird strike`, c('Before', 'After'), c('before', 'after')),
escape_zone = factor(`Escape zone`, c('Safe zone', 'Dangerous zone'), c('safe', 'dangerous'))
)
```

Let’s have a quick look:

`d2`

```
## # A tibble: 336 x 7
## group fish treatment infected time_dangerous period escape_zone
## <fct> <fct> <fct> <fct> <dbl> <fct> <fct>
## 1 3 15 6u uninf 534 before dangerous
## 2 3 15 6u uninf 457 after dangerous
## 3 7 37 6u uninf 371 before dangerous
## 4 7 37 6u uninf 99 after dangerous
## 5 7 42 6u uninf 335 before dangerous
## 6 7 42 6u uninf 66 after dangerous
## 7 1 1 6u uninf 576 before safe
## 8 1 1 6u uninf 118 after safe
## 9 1 2 6u uninf 580 before safe
## 10 1 2 6u uninf 172 after safe
## # ... with 326 more rows
```

Overall, this looks good! One thing to look out for, is that the recorded escape zone (where each fish went during the simulated attack) is repeated twice, since there are before and after periods for each fish. I’ll create another data.frame that only has one observation per fish:

`d3 <- filter(d2, period == 'after')`

### Analysis: escape zone

Figure 2 in the paper looks at whether the fish escaped towards the safe lower part of the tank during the simulated attack, or whether they remained exposed by staying in the top section where they would be reachable by the bird predator.

We can quite easily recreate figure 2:

```
ggplot(d3, aes(infected, escape_zone, fill = infected, group = infected)) +
geom_violin(scale = 'count') +
geom_jitter(height = 0, width = 0.2, alpha = 0.3) +
facet_grid(~treatment) +
scale_y_discrete(expand = c(0.02, 0)) +
scale_fill_manual(values = c('white', 'grey50')) +
labs(x = 'infection status', y = 'escape zone', fill = NULL) +
theme(legend.position = 'bottom')
```

However, I think this figure is less than ideal. Violin plots don’t make a lot of sense here. They are meant to show continuous distributions, but here they are applied to binary data. It is also hard to see what the average response would be. I would probably rather use simple means and confidence intervals. One could calculate the binomial confidence interval (`binom.test`

gives them, for example), but bootstrapped confidence intervals will also do, and are really easy with `ggplot2`

.

I also think it is unnecessary to show the non-existing groups (i.e. there are no infected fish in the 6u treatment).

```
ggplot(d3, aes(infected, as.numeric(escape_zone) - 1, group = infected)) +
stat_summary(fun.data = 'mean_cl_boot', size = 1.5, fatten = 2.5) +
facet_grid(~treatment, scales = 'free_x', space = 'free_x') +
ylim(0, 1) +
labs(x = 'infection status', y = 'fraction with dangerous escape')
```

In this figure, for example, it is easier to see that the infected and non-infected individuals in the 2u/4i treatment respond very similarly.

Alternatively, we could do away with facets entirely. Although it does not matter much, I think.

```
ggplot(d3, aes(treatment, as.numeric(escape_zone) - 1, color = infected)) +
stat_summary(fun.data = 'mean_cl_boot', size = 1.5, fatten = 2.5,
position = position_dodge(width = 0.3)) +
scale_color_manual(values = c('black', 'grey50')) +
ylim(0, 1) +
labs(x = 'group composition', y = 'fraction that lacked escape', color = 'infection status')
```

So clearly, to analyze the escape behavior we will need a binomial model (there are two possible escape zones). And we will need to include a random effect for group id since multiple fish per group were observed, but they are not independent. So a generalized mixed model it is.

`fm1 <- glmer(escape_zone ~ treatment * infected + (1 | group), d3, 'binomial')`

`## fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients`

This model generates a message about dropping two coefficients. This makes sense, the effect of infection status cannot be estimated in the groups of only infected or non-infected fish.

We should first assess whether the data supports the inclusion of the interaction term. I do that by fitting the model without the interaction, and comparing AICs.

```
fm2 <- update(fm1, . ~ . -treatment:infected)
AIC(fm1, fm2)
```

```
## df AIC
## fm1 7 175.3231
## fm2 6 178.7194
```

The model with the interaction is about 3.5 points better, so we’ll keep it in. Indeed the F values for all terms are reasonably high:

`anova(fm1)`

```
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## treatment 3 13.4486 4.4829 4.4829
## infected 1 3.4544 3.4544 3.4544
## treatment:infected 1 5.0999 5.0999 5.0999
```

There are no p-values given here, but p-values are available for the individual coefficients, see `coef(summary(fm1))`

.

So the escape behavior depends on both the infection status and group composition Two immediate follow-up questions would be if infected fish decrease their risk-taking when grouped with healthy fish, and if healthy fish increase their risk when swimming with infected fish.

This can be answered by looking at some post-hoc comparisons. Specifically we want to look at the contrast between group composition, within each infection status. The `emmeans`

package is super useful for this.

`emmeans(fm1, pairwise ~ treatment | infected, adjust = 'holm')$contrast`

```
## infected = uninf:
## contrast estimate SE df z.ratio p.value
## 6u - 4u/2i -0.1797502 1.382841 Inf -0.130 0.8966
## 6u - 2u/4i -3.8899251 1.382848 Inf -2.813 0.0147
## 6u - 6i nonEst NA NA NA NA
## 4u/2i - 2u/4i -3.7101750 1.497699 Inf -2.477 0.0265
## 4u/2i - 6i nonEst NA NA NA NA
## 2u/4i - 6i nonEst NA NA NA NA
##
## infected = inf:
## contrast estimate SE df z.ratio p.value
## 6u - 4u/2i nonEst NA NA NA NA
## 6u - 2u/4i nonEst NA NA NA NA
## 6u - 6i nonEst NA NA NA NA
## 4u/2i - 2u/4i -0.6308713 1.181557 Inf -0.534 1.0000
## 4u/2i - 6i -0.2129252 1.132152 Inf -0.188 1.0000
## 2u/4i - 6i 0.4179462 1.020598 Inf 0.410 1.0000
##
## Results are given on the log odds ratio (not the response) scale.
## P value adjustment: holm method for 6 tests
```

Note that we get `NA`

values for the comparisons that are invalid (because of our rank-deficient model). That’s a bit of an issue here, since the standard Tukey method will control for a test too many. So I’m using the Holm method.

We can clearly see that the infected fish don’t change their behavior with group composition, but the infected fish have significantly changed their behavior when the majority of the group is infected.

I conclude that the statement *“uninfected individuals adjusted their risk-taking behaviour to the number of infected group members, taking more risk with an increasing number of infected group members”* seems to be true.

The authors also said: *“individuals in groups with only infected members showed lower escape responses and higher risk-taking than individuals from groups with only uninfected members.”* We can obtain this comparison from the same model as well:

```
emmeans(fm1, pairwise ~ treatment + infected, adjust = 'none')$contrast %>%
as.data.frame() %>%
filter(contrast == "6u,uninf - 6i,inf")
```

```
## contrast estimate SE df z.ratio p.value
## 1 6u,uninf - 6i,inf -3.685335 1.241082 Inf -2.969455 0.002983288
```

Clearly, this statement is true as well.