What's in a Number?

Numbers and measures in the absence of context can lead to incorrect conclusions.

By Stephanie Rutten-Ramos

Numbers give us confidence. The ability to take a measurement provides a sense of security, a sense that we know how things really are, a sense that we know the 'truth.' To paraphrase W. Edwards Deming, if we measure something, we should expect improvement. It seems that the energy spent in the process of measurement can translate into improvement of the process as a whole. That being said, be careful what you measure!

In addition to using numbers to measure a process, we also use numbers to compare products, methods and materials. Whenever we make such a comparison, we reach one of two kinds of conclusion: a true conclusion (we find a difference that is real, or we find no difference when there is none), or a false conclusion (we find a difference when there really isn't one, or we find no difference when there really is one).
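These four outcomes can be illustrated with a small simulation. The Python sketch below is hypothetical throughout: the function names, the mean of 1.0 stillborn per litter, the standard deviation of 1.5, the 200 sows per group and the use of a large-sample z-test on normally distributed values are all assumptions chosen for illustration, not figures from any herd. It runs many imaginary trials and counts how often the comparison reaches each kind of false conclusion.

```python
import random
from statistics import NormalDist

def mean_and_variance(values):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def finds_difference(group_a, group_b, alpha=0.05):
    """Two-sided large-sample z-test on the difference between two means."""
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    standard_error = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    z = (mean_a - mean_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def share_of_trials_finding_difference(true_difference, sows_per_group=200,
                                       sd=1.5, trials=1000, seed=1):
    """Fraction of simulated trials in which the test declares a difference."""
    rng = random.Random(seed)
    found = 0
    for _ in range(trials):
        # Simplification: stillborns per litter drawn from a normal
        # distribution, although real counts are not normally distributed.
        control = [rng.gauss(1.0, sd) for _ in range(sows_per_group)]
        treated = [rng.gauss(1.0 - true_difference, sd)
                   for _ in range(sows_per_group)]
        found += finds_difference(control, treated)
    return found / trials

# No real difference: every "difference" found is a false conclusion.
false_positive_rate = share_of_trials_finding_difference(true_difference=0.0)
# Real difference of half a pig: every miss is a false conclusion.
false_negative_rate = 1 - share_of_trials_finding_difference(true_difference=0.5)

print(f"Found a difference that is not there: {false_positive_rate:.1%}")
print(f"Missed a difference that is there:    {false_negative_rate:.1%}")
```

Under these assumptions, the first rate should land near the chosen alpha of 5%, while the second depends on how many sows are in each group and how variable the outcome is.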

Our ability to use numbers to count, average and otherwise describe situations, however, can give us a false sense of security about our ability to reach correct conclusions. Our inclination to use selective measuring to justify 'gut-level hunches' is likely to do more damage than good.

Consider the following case.

A newly expanded sow herd with a "green" herd manager was struggling to produce quality weaned pigs. The genetic supplier provided the owners with a list of areas to improve upon. The consulting veterinarian decided the first priority would be to reduce the current rate of stillborns by half a pig per litter.

Up to this point, the herd had been inducing sows to farrow on day 116 of gestation, and loading them on the same day. After reviewing the induction protocol, the consulting veterinarian decided to run a trial to see if inducing the sows at day 115 would reduce the stillborn rate. Therefore, he instructed the farm manager to use the new induction protocol for two months, after which they would look at stillborn rates to see if the new protocol generated enough improvement. The manager was also instructed to load sows earlier (by day 113) to avoid having sows farrow in gestation.

In reality, this "trial" is nothing more than a "try-it-and-see" approach. Its outcome will be selectively used to justify a hunch to a skeptical audience. However, a bit of design discipline up front is all this herd would have needed to conduct a valid trial that would yield a reliable outcome. Not only would a valid design be more apt to obtain the true result (i.e., finding an effect when there really is one, or no effect when there really is none), it would also improve the farm's confidence in the outcome and may even decrease the time needed to reach a conclusion.

So what questions should be asked?

  • What is the normal variability in stillborn rate?

  • Are there substantial differences in stillborn rates across parities, such that a possible "parity effect" needs to be incorporated into the trial design?

  • What is the appropriate outcome variable: stillborn rate or percentage of total born that are weaned (i.e., if decreasing stillborns results in more low-viability piglets, is there any gain)?

  • From an economic perspective, what difference in stillborn rate would warrant a change in protocol?

Answers to these questions would allow the veterinarian and herd manager to make a reasonable decision about the number of sows over which the two protocols (induction on day 115 versus induction on day 116) will need to be compared.

Here the veterinarian intended to use historical records as the "control," or basis for protocol comparison. This is an area where many on-farm trials go awry; fair trials are not as simple as comparing data from two different time periods. Many variables have a seasonal component. And too often, multiple changes are instituted, while only a single change is considered in the analysis. When more than one change is made, it is unreasonable to conclude that any observed difference could be attributed to a single item.

In this case, two management changes were being made, loading day and induction day, while only one change was to be considered as having an effect on stillborn occurrence. In order to draw a reasonable conclusion about the effectiveness of the induction protocol, the two induction protocols need to be compared across sows loaded by the same day of gestation.

Consider Multiple Factors

The number of sows over which the trial should be performed depends on three factors: how certain the herd wants to be about the reliability of the outcome, how big a difference in stillborn rate they want to be able to detect, and how much variation is normally observed in the outcome measure.

Although the number of stillborn pigs per litter is not a normally distributed variable, the veterinarian or producer could use a Student's t-test to compare the means of the treatment and control groups. Table 1 lists the number of sows to be included in each trial group, given the standard deviation of stillborn pigs per litter and the difference between treatments to be detected, when the probability of detecting a difference when there is none is 5% (alpha=0.05) and the probability of missing a real difference is 20% (power=0.8).

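Table 1: Number of sows per trial group, by standard deviation of stillborn pigs per litter and difference to be detected (alpha=0.05, power=0.8)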
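The arithmetic behind a table of this kind can be sketched with the common normal-approximation formula for comparing two means: sows per group = 2 x ((z_alpha + z_beta) x standard deviation / difference)^2. The Python sketch below is an illustration only. The function name, the choice of a two-sided test and the example standard deviations and differences are assumptions, and its output is not a reproduction of Table 1. Because it uses the normal rather than the t distribution, it slightly understates the number needed for small groups.

```python
from math import ceil
from statistics import NormalDist

def sows_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate sows needed in EACH group to detect `difference`
    in mean stillborns per litter (two-sided test, equal group sizes)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power=0.80
    return ceil(2 * ((z_alpha + z_beta) * sd / difference) ** 2)

# Hypothetical standard deviations and target differences (pigs per litter).
for sd in (1.0, 1.5, 2.0):
    for difference in (0.25, 0.5, 1.0):
        n = sows_per_group(sd, difference)
        print(f"SD {sd:.1f}, difference {difference:.2f}: {n} sows per group")
```

Note how quickly the requirement grows: halving the difference to be detected, or doubling the standard deviation, roughly quadruples the number of sows needed in each group.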

Another common abuse of numbers occurs with the simplistic retrospective 'analysis' (for example, asking what change caused production to decrease). Just as the previous scenario was prone to reach an errant conclusion, so too is this one. Correlation does not equal causation, and populations change over time. For these reasons, the conclusions of simple retrospective studies (looking into the past) are not as reliable as those from prospective studies (following into the future). Well-designed retrospective analyses, however, can account for many factors, including changes to populations. Yet such detailed analysis is rarely employed in the pig barn.

Here's an example.

An experienced herd manager grew frustrated watching litter size diminish over time. In an effort to determine the cause, he looked back through his calendar records along with a performance monitor and concluded that the drop-off in litter size was attributable to the use of a newly formulated vaccine (see Figure 1). In reality, at the same time the vaccine manufacturer was reformulating the vaccine, this herd was initiating a gradual rollover into a new genetic program. The cause of the declining litter size was related to the changes in herd population composition and age structure (Figures 2a and 2b).

Figure 1: Average pigs born alive/litter

Figure 2a: Proportion of farrow events by genetic base

Figure 2b: Average farrow parity by genetic base

This practice is far too common in pig production. Both positive and negative changes in production caused by changes in population composition are attributed to people, practices or products (materials). In the absence of a detailed analysis (i.e., consideration of all potentially influential factors), errant cause-and-effect conclusions are reached and the root causes of problems are not identified and addressed.
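A short calculation shows how a shift in herd composition alone can move a herd-wide average. The Python sketch below uses invented numbers: the group labels, farrowing shares and litter sizes are hypothetical and are not taken from Figures 1, 2a or 2b. Each group's own average is held constant, and only the mix of farrowings changes.

```python
def herd_average(groups):
    """Weighted average of born alive per litter.
    `groups` is a list of (share_of_farrowings, average_born_alive) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * average for share, average in groups) / total_share

# Hypothetical herd: established females average 11.5 born alive,
# incoming young females of the new genetic base average 10.0.
before_rollover = [(0.9, 11.5), (0.1, 10.0)]  # 10% of farrowings are new females
after_rollover = [(0.4, 11.5), (0.6, 10.0)]   # 60% of farrowings are new females

average_before = herd_average(before_rollover)
average_after = herd_average(after_rollover)

print(f"Herd average before: {average_before:.2f}")
print(f"Herd average after:  {average_after:.2f}")
print(f"Change with no change within either group: "
      f"{average_after - average_before:+.2f}")
```

In this made-up example the herd average falls by three-quarters of a pig even though neither group performs any differently, which is why a comparison over time needs to be broken down by genetic base and parity before any product or practice is blamed.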

Due Diligence

The value of numbers and their use to make decisions is unquestionable, but numbers have to be used appropriately. Part-prospective, part-retrospective 'trials' lend themselves to errant conclusions, and simple retrospective analyses can be just plain dangerous. All too often, they are employed to justify gut-level approaches to skeptical audiences. And how many times are new protocols instituted under the guise of a trial and never removed? Without disciplined design, these analyses also stand to miss effects that really exist.

A little due diligence is all that is needed for herds and operations to be able to make management decisions on the basis of statistical evidence. After all, we're devoting resources to data capture, so why not make the most of it?

Editor’s Note: Stephanie Rutten-Ramos received her DVM and PhD from the University of Minnesota and is an independent consultant. To contact her, e-mail: rutt0011@umn.edu