The article

This report describes all steps for reproducing the analysis of data from multiple field trial using meta-analytic approaches in a recent article published in Plant Pathology. The work was conducted in the Epidemiology Lab (Del Ponte Lab) in collaboration with the EVADE- Epidemiology of vegetable diseases Lab (Pethybridge Lab):

Lehner, M. S., Pethybridge, S. J., Meyer, M. C., & Del Ponte, E. M. (2016). Meta-analytic modelling of the incidence-yield and incidence-sclerotial production relationships in soybean white mould epidemics. Plant Pathol. doi:10.1111/ppa.12590

In the article (pre-print here), two relationships were studied: soybean white mold incidence (inc, %) and soybean yield (yld, kg/ha) and incidence and sclerotia weight (scl, g/ha). The data were obtained from a scientific report (PT language) on fungicide efficacy evaluated in 35 trials conducted across several locations and 4-year period in Brazil. The data was organized in tables reporting the mean values of the three variables of interest, one table for each trial.

The two relationships were summarized using meta-analytic models using three effect-sizes: 1) Fisher’s z (from transforming Pearson’s r); 2) intercept and 3) slopes of a random-coefficients models fitted to the data. We followed the procedures detailed in previous studies in plant pathology (Madden and Paul, 2009; Dalla Lana et al., 2015). For summarizing the correlation coefficients, we calculated Fisher’s z and fitted random-effects and mixed-effects model; the latter for testing the effect of moderator variables. For summarizing the slopes and intercepts, we fitted multi-level models (random-coefficient) as described in Madden and Paul (2009) who used SAS. Here, we demonstrate the analysis in R using the lmer package as described in this metafor tutorial. Similar to the tutorial, the estimates were similar between the two-stage and the multilevel approach, and so we decided to use the latter.

This report

My goal is to demonstrate, using R programming, each step of the analyses, from data preparation to presentation of results, which can be reproduced either by myself or other people interested in the same topic. Instead of giving away all the data and my (messy!) original code, I thought that investing time to prepare this report can contribute to popularize the use of meta-analytic approaches in plant pathology. The data, codes, pre-print version of the article and supplementary materials were also made available at this GitHub repository.

The report is the html output of an R Markdown file prepared with the R Studio IDE for R. As much as possible I used the pipe operator %>% of the magrittr package, which really makes the code easier to write and understand. The plots were prepared using both the base R graphics and ggplot2, whichever was more convenient. Most of them are simple versions for a quick visualization, not actually formatted for final publishing.

Data import

First, we need to load the packages.

library(tidyverse); library(broom); library(tidyr);
library(cowplot); library(tibble); library(knitr); library(nlme); library(lme4)

The raw data was organized in the long or tidy format where each treatment (observation) in a fungicide trial (hereafter study) is placed in its row and each variable in its column. Let’s now load the data and group by study for separated data frames, one for each relationship.

# inc-yld
dat_yld <- read_csv("dat-white-mold-br.csv") %>%   
 group_by(study)
# inc-scl
dat_scl <- read_csv("dat-white-mold-br.csv") %>%
na.omit(dat_scl)%>% # some trials did not have sclerotia data, and we omit them here
 group_by(study)

See the structure of the data set and all variables types (scroll the content to the left). The full data set in csv format can be downloaded here: dat-white-mold-br.csv

dat_yld
## Source: local data frame [382 x 17]
## Groups: study [35]
## 
##    study treat    season harvest_year  location state country elevation   region elevation_class inc_check inc_class yld_check yld_class   inc   scl   yld
##    <int> <int>     <chr>        <int>     <chr> <chr>   <chr>     <int>    <chr>           <chr>     <dbl>     <chr>     <dbl>     <chr> <dbl> <int> <dbl>
## 1      9     1 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high  37.7  5092  3729
## 2      9     2 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high  11.6  6154  3739
## 3      9     3 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high  33.5   200  3863
## 4      9     4 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   1.0   180  3904
## 5      9     5 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   5.6  1123  4471
## 6      9     6 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   1.0   641  4313
## 7      9     7 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   3.7  1203  4177
## 8      9     8 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   0.0   521  4001
## 9      9     9 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   1.1    20  4090
## 10     9    10 2009/2010         2010 Agua Fria    GO  Brazil       891 Northern             low      37.7       low      3729      high   0.0     0  4254
## # ... with 372 more rows

Data visualization

Raw data

The histograms below summarize the distribution of the three variables used to calculate the three effect-sizes for the meta-analysis. We can see that the two disease-related variables (inc and scl) are skewed to the left and yield data are more normally distributed.

par(mfrow= c(1,3))
hist(dat_yld$inc, main = "Incidence")
hist(dat_yld$yld, main =  "Yield")
hist(dat_yld$scl, main = "Sclerotia weight")