What do the numbers really mean? Interpreting variety trial results

Andrew Kness, Agriculture Extension Agent | University of Maryland Extension, Harford County
Dr. Nicole Fiorellino, Extension Agronomist | University of Maryland, College Park

Each year, the University of Maryland, and other land-grant universities across the US, conduct agricultural variety trials that provide farmers and other professionals in the industry with valuable data on crop performance. These data provide critical information regarding varietal differences, such as yield, plant characteristics, disease resistance, and geographic performance, which aid producers in making the best decisions on variety selection for their farms.

Reports from university variety trials are generated yearly. They can be quite lengthy and may contain values, metrics, and other information that require explanation. If you are going to use variety trial data to make on-farm decisions, it is important to understand how to read and interpret the data so that you can draw the correct conclusions. For example, it is easy to simply search the tables for the top-yielding variety and dismiss the rest of the information. This article explains how and why variety trials are set up the way they are, walks you through what the data mean, and shows how to interpret the statistics and draw sound conclusions from them.

The primary objective of a variety trial field study is to test the performance of crop hybrids relative to each other and relative to check varieties embedded in the study. To do this, the trials are designed to eliminate as much variability as possible, which strengthens our ability to detect differences in hybrid performance. As with anything in agriculture, there is a lot of variability associated with conducting research in the field. Variations in weather, soil types, and pest pressure are just a few of the factors that introduce variability in our research. To help control for this variability, variety trials are designed as small plots (often 10 feet wide x 30 feet long) and placed in a field with consistent soil types, again to minimize variability. All the plots are treated exactly the same with respect to planting date, planting depth, harvest date, data collection, pest management, and fertility; the only variable we allow to differ is variety. In addition, each variety is replicated multiple times, usually 3-5 times, at random locations within one field. This randomized replicated plot design helps to minimize the effects of the variability that we do not have control over (such as weather, soil type, and pest pressure). Figure 1 depicts a randomized plot design that contains four varieties replicated four times.

Figure 1. Example field plot design with four varieties replicated four times in a randomized block plot design.
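The layout idea behind Figure 1 can be sketched in a few lines of Python. This is only an illustration, not the procedure used to lay out the Maryland trials: the variety names are placeholders and the function name `randomized_block_layout` is made up for this example. Each block (replicate) contains every variety once, and the plot order is shuffled independently within each block.

```python
import random

def randomized_block_layout(varieties, n_blocks, seed=None):
    """Return one shuffled ordering of all varieties for each block (replicate)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(varieties)
        rng.shuffle(block)  # randomize plot order independently in each block
        layout.append(block)
    return layout

# Placeholder variety names (hypothetical), four varieties in four blocks.
varieties = ["A", "B", "C", "D"]
layout = randomized_block_layout(varieties, n_blocks=4, seed=1)

for i, block in enumerate(layout, start=1):
    print(f"Block {i}: " + " | ".join(block))
```

Because every block holds a complete set of varieties, a wet corner or a weedy strip of the field affects all varieties rather than just one.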

The data collected from these plots are then used to compare each variety to the others in the trial using statistical methods, typically an analysis of variance (ANOVA). An ANOVA tests whether the treatments (varieties, in this example) differ from one another, taking into account the variation in the data. Figure 2 is a table from the 2019 University of Maryland Corn Variety Trials for mid-season maturity corn hybrids at Keedysville, MD. There is one number listed for yield for each variety, but this number is actually the average yield of the three plots, or replicates, of each variety that were planted and harvested. For example, the yield for LG62C02VTRIB, reported as 223 bushels per acre, is the average of 226, 231, and 212 bushels per acre, collected from the three plots planted at this location. Since the yields were not identical for each of the plots, there is variation about the average yield. The ANOVA compares the variation among the variety averages with the variation among replicate plots of the same variety to determine whether the numerical differences in average yield are due to differences in variety performance or to random chance. The ANOVA uses a confidence level, which we define prior to the study. In agricultural research, it is generally acceptable to set the confidence level between 90% and 95%. This means that we are 90-95% confident that the differences observed between varieties are due to variety performance and not some other factor (such as weather, soil type, etc.). It also means that the difference in variety performance would likely be observed again if the comparison were repeated under similar conditions.
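To make the averaging and the ANOVA idea concrete, here is a minimal Python sketch of the calculation for a randomized block trial. The three LG62C02VTRIB plot yields are the ones quoted above; the other two varieties are invented numbers used only for illustration, and the calculation is a textbook version rather than the exact analysis run for the report. It produces the F statistic, the ratio of variation among varieties to leftover (error) variation. Turning F into Probability > F requires the F distribution, which statistics software provides.

```python
from statistics import mean

# Yields in bushels per acre, one value per replicate (block).
# LG62C02VTRIB values are from the article; the other two varieties
# are hypothetical numbers for illustration only.
plots = {
    "LG62C02VTRIB": [226, 231, 212],
    "Hypothetical A": [240, 238, 229],
    "Hypothetical B": [210, 215, 199],
}

n_var = len(plots)
n_rep = len(next(iter(plots.values())))

# Averages per variety, per block, and for the whole trial.
variety_means = {name: mean(y) for name, y in plots.items()}
block_means = [mean(y[b] for y in plots.values()) for b in range(n_rep)]
grand_mean = mean(v for y in plots.values() for v in y)

# Split the total variation into variety, block, and error parts.
ss_total = sum((v - grand_mean) ** 2 for y in plots.values() for v in y)
ss_variety = n_rep * sum((m - grand_mean) ** 2 for m in variety_means.values())
ss_block = n_var * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = ss_total - ss_variety - ss_block

# Mean squares and the F statistic for variety.
ms_variety = ss_variety / (n_var - 1)
ms_error = ss_error / ((n_var - 1) * (n_rep - 1))
f_stat = ms_variety / ms_error

for name, m in variety_means.items():
    print(f"{name}: {m:.1f} bu/ac")
print(f"F statistic for variety: {f_stat:.1f}")
```

A large F statistic means the variety averages are spread much farther apart than the plot-to-plot noise would explain, which corresponds to a small Probability > F.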

With the basic concept in mind, return to Figure 2. One might conclude that hybrid DKC61-41RIB out-yielded NK 1205-3120 by 6.4 bushels per acre. In this trial, it did. However, we plan to use these data to make predictions going forward. In other words, will DKC61-41RIB consistently out-yield NK 1205-3120, or is the 6.4-bushel difference we observed due to random effects? This is where we need statistics to answer the question.

The bottom four rows of the table in Figure 2 are where you will find the statistics to make inferences about the trial dataset. Trial mean is simply the average of all varieties in the trial, which is an indicator of how the trial performed as a whole and is used to calculate the relative yield. The next two rows, Probability > F and LSD0.1, are generated from an ANOVA test and are critical to interpreting the data correctly.
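As a small illustration of how the trial mean feeds into relative yield, the sketch below assumes relative yield is reported as a percentage of the trial mean, which is the usual convention; check the report's footnotes to confirm. The trial mean of 220 is a made-up value, not the number from Figure 2.

```python
def relative_yield(variety_yield, trial_mean):
    """Variety yield expressed as a percentage of the trial mean."""
    return 100 * variety_yield / trial_mean

trial_mean = 220.0  # hypothetical; read the real value from the report
lg_yield = 223      # LG62C02VTRIB average yield from the article, bu/ac

print(f"Relative yield: {relative_yield(lg_yield, trial_mean):.1f}%")
```

A value above 100 means the variety yielded more than the trial average, and a value below 100 means it yielded less.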

Probability > F (shown as P > F in other reports) indicates the likelihood that the variation we observe between varieties is due to random effects rather than the variable being tested (in this case, variety). This value falls between 0 and 1. If the value is large, the differences we observe are likely due to random effects and not hybrid performance; therefore, we cannot conclude that there are yield differences between varieties. If the value is small, there are differences between varieties that random variation does not explain.

In this example for yield, Probability > F is 0.0805, or 8.05%. As mentioned previously, confidence levels for field research are often set at 90-95%, which equates to a probability level between 0.1 and 0.05 (called the “alpha level” in statistics). In this trial, the alpha level was defined as 0.1, as indicated by the subscript 0.1 after LSD. If Probability > F is less than 0.1, we can conclude with at least 90% confidence that there is a difference in yield due to variety. If the value of P > F is greater than 0.1, then we cannot conclude that there were yield differences between varieties. In this example, there are significant differences in yield, moisture, and test weight due to variety. We cannot conclude there is a difference in lodging or plant population as a result of variety. This means that statistically there is no difference between DKC59-82RIB, with a lodging score of 1.4%, and P1197 AM, with a lodging score of 0%.
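The decision rule described here is a single comparison, sketched below in Python. The function name `varieties_differ` is invented for this example; the 0.0805 value is the Probability > F for yield quoted above.

```python
def varieties_differ(p_value, alpha=0.1):
    """True when Probability > F falls below the chosen alpha level."""
    return p_value < alpha

p_yield = 0.0805  # Probability > F for yield, from the article

print(varieties_differ(p_yield, alpha=0.1))   # tested at 90% confidence
print(varieties_differ(p_yield, alpha=0.05))  # tested at 95% confidence
```

The same result that counts as significant at an alpha level of 0.1 would not count as significant at 0.05, which is why the alpha level is set before the study and printed in the report.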

The next row in the table, LSD0.1, tells us the “least significant difference,” or the threshold that must be reached to conclude that the performance of two varieties is significantly different. If the ANOVA test returns a P value greater than the defined alpha level (0.1 for our example), then there are no significant differences between treatments, and LSD is denoted NS (not significant). If the test returns a P value less than the alpha level, then the LSD value tells us what is considered a significant difference between treatments. For yield in the example above, there needs to be a difference of at least 12.4 bushels between any two hybrids before we can say with 90% confidence that the difference is due to variety and not random chance. The top-yielding variety in this trial was DKC59-82RIB (highlighted). This variety yielded significantly more than all other varieties except SCS 1105AM. The difference in yield between these two hybrids (8.4 bushels) does not reach the cutoff defined by the LSD; therefore, they are not significantly different from each other. Likewise, the lowest-yielding variety (LCX10-98 VIP3110) did not yield significantly less than any other variety except the top two (DKC59-82RIB and SCS 1105AM).
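Applying the LSD to a pair of varieties amounts to comparing their yield difference with the threshold. The Python sketch below uses the 12.4-bushel LSD and the two yield differences quoted in this article; the function name and the 15.0-bushel difference are made up for illustration.

```python
def significantly_different(yield_diff, lsd):
    """Compare an absolute yield difference (bu/ac) with the LSD.

    Pass lsd=None when the report lists the LSD as NS (not significant).
    """
    if lsd is None:
        return False
    return abs(yield_diff) >= lsd

LSD_YIELD = 12.4  # LSD0.1 for yield, from the article

# 8.4 bu: DKC59-82RIB vs. SCS 1105AM (from the article)
print(significantly_different(8.4, LSD_YIELD))
# 6.4 bu: DKC61-41RIB vs. NK 1205-3120 (from the article)
print(significantly_different(6.4, LSD_YIELD))
# 15.0 bu: a hypothetical pair that clears the threshold
print(significantly_different(15.0, LSD_YIELD))
```

Only the hypothetical 15.0-bushel difference clears the threshold; the two differences taken from the trial do not.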

The final statistic is the coefficient of variation (CV%). This is a measure of the variation in the data; the smaller the number, the less variability. CV values under 10% for yield tell us there was not too much variability in yield and that we are able to distinguish variety differences. More variation in the dataset requires a larger LSD to separate differences between treatments.
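The link between CV and LSD can be seen by computing both from the same error term. The Python sketch below assumes the common textbook definitions: CV% is the square root of the ANOVA error mean square divided by the trial mean, and the LSD is Fisher's least significant difference. The report may compute them slightly differently. The trial mean, error mean squares, and critical t value are illustrative numbers, not values from Figure 2.

```python
import math

def cv_percent(ms_error, trial_mean):
    """Coefficient of variation: root error mean square as a % of the trial mean."""
    return 100 * math.sqrt(ms_error) / trial_mean

def lsd(ms_error, n_reps, t_crit):
    """Fisher's LSD for comparing two variety means, each averaged over n_reps plots."""
    return t_crit * math.sqrt(2 * ms_error / n_reps)

trial_mean = 220.0  # hypothetical trial mean, bu/ac
n_reps = 3          # three replicates, as in the Keedysville example
t_crit = 2.132      # t value for two-sided alpha = 0.1 with 4 error degrees
                    # of freedom (a hypothetical, very small trial)

# Hypothetical error mean squares, from a uniform field to a variable one.
for ms_error in (25.0, 100.0, 400.0):
    cv = cv_percent(ms_error, trial_mean)
    threshold = lsd(ms_error, n_reps, t_crit)
    print(f"error MS {ms_error:5.0f}: CV = {cv:4.1f}%, LSD = {threshold:4.1f} bu/ac")
```

As the error mean square grows, both the CV and the LSD grow with it, so a noisier trial needs a bigger yield gap before two varieties can be called different.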

Variety trials presented with statistical analyses provide a way for us to compare varieties as best we can in a real-world setting through replicated plots. When using variety trial data, it is best to choose varieties with yield stability and desirable characteristics across multiple locations and multiple years whenever possible. You will find very similar statistical methods not only in variety trial reports but in any type of replicated field research. These statistical analyses provide assurance that the conclusions drawn are due to the treatments studied and that you could expect similar results if the comparison were repeated under similar conditions. If you encounter data or reports that do not present any type of statistical analysis, it is important to realize that you should not draw any conclusions from that dataset.