Published by Mike

This isn’t about China, but it is a particularly clear example of results that are rendered nonsense by the failure to distinguish correlation from causation. In this case, the likely culprit is an outside factor that affects both the “independent” variables (the right-hand side) and the “dependent” variable (the left-hand side of y = a + βX, where X is a list of variables; in math jargon, this equation has vectors as variables).

Jayson Lusk, a food and agriculture economist who blogs at http://jaysonlusk.com/
October 24, 2018

A couple of days ago, JAMA Internal Medicine published a paper looking at the relationship between stated levels of organic food consumption and cancer among a sample of 68,946 French consumers.

The paper, and the media coverage of it, is frustrating on many fronts, and it is symptomatic of what is wrong with so many nutritional and epidemiological studies that rely on observational, self-reported data without a clear strategy for identifying causal effects. As I wrote a couple of years ago:

“Fortunately economics (at least applied microeconomics) has undergone a bit of a credibility revolution. If you attend a research seminar in virtually any economics department these days, you’re almost certain to hear questions like, “what is your identification strategy?” or “how did you deal with endogeneity or selection?” In short, the question is: how do we know the effects you’re reporting are causal effects and not just correlations?

It’s high time for a credibility revolution in nutrition and epidemiology.”

Yes, yes, the title of the paper says “association” not “causation.” But, of course, that didn’t prevent the authors – in the abstract – from concluding, “promoting organic food consumption in the general population could be a promising preventive strategy against cancer” or CNN from running a headline that says, “You can cut your cancer risk by eating organic.”

So, first, how might this be only correlation and not causation? People who consume organic foods are likely to differ from people who do not in all sorts of ways that might also affect health outcomes. As the authors clearly show in their own study, people who say they eat a lot of organic food have higher incomes, are better educated, are less likely to smoke and drink, eat much less meat, and have overall healthier diets than people who say they never eat organic. The authors try to “control” for these factors in a statistical analysis, but there are two problems with this. First, the devil is in the details, and the way these confounding factors are measured and interact could have significant effects. Second, and more fundamentally, some of the missing “controls” are unobservable things like overall health consciousness, risk aversion, and social conformity. These unobserved factors are likely to be highly correlated with both organic food consumption and cancer risk, so the estimated effect of organic food is likely biased. There are many examples of this sort of endogeneity bias, and failure to think carefully about how to handle it can lead to effects that are under- or over-estimated and can even reverse the sign of the effect.

To illustrate, suppose an unmeasured variable like health consciousness is driving both organic purchases and cancer risk. A highly health conscious person is going to undertake all sorts of activities that might lower cancer risks – seeing the doctor regularly, taking vitamins, being careful about their diet, reading new dietary studies, exercising in certain ways, etc. And, such a person might also eat more organic food, thus the correlation. The point is that even if such a highly health conscious person weren’t eating organic, they’d still have lower cancer risk. It isn’t the organic causing the lower cancer risk. Or stated differently, if we took a highly health UNconscious person and forced them to eat a lot of organic, would we expect their cancer risk to fall? If not, this is correlation and not causation.
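This confounding story can be sketched in a tiny simulation (all numbers hypothetical): organic consumption and cancer risk are both driven by an unmeasured health-consciousness variable, organic food has no causal effect at all, and yet the two still correlate.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical unmeasured confounder: health consciousness.
health = rng.normal(size=n)

# Organic consumption rises with health consciousness (plus noise);
# organic itself has NO causal effect on risk in this simulation.
organic = health + rng.normal(size=n)

# Risk falls with health consciousness only -- not with organic.
risk = -0.5 * health + rng.normal(size=n)

# A naive analysis still finds a negative "effect" of organic on risk.
corr = np.corrcoef(organic, risk)[0, 1]
print(round(corr, 2))  # noticeably negative despite zero causal effect
```

The naive correlation is spurious by construction: delete the health-consciousness term from `organic` and the correlation vanishes.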

Ideally, we’d like to conduct a randomized controlled trial (RCT) (randomly feed one group a lot of organic food and another group none, and compare outcomes), but these types of studies can be very expensive and time consuming. Fortunately, economists and others have come up with creative ways to address the unobserved variable and endogeneity issues that get us closer to the RCT ideal, but I see no effort on the part of these authors to take these issues seriously in their analysis.
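One of those tools is instrumental variables. Below is a minimal two-stage-least-squares sketch on synthetic data, assuming a hypothetical instrument – something that shifts organic purchases but affects health only through them (distance to an organic grocer, say); the "true effect" number is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Unobserved confounder, plus a hypothetical instrument that shifts
# organic consumption but has no direct path to health outcomes.
confounder = rng.normal(size=n)
instrument = rng.normal(size=n)

true_effect = -0.1  # assumed true causal effect (illustrative only)
organic = confounder + instrument + rng.normal(size=n)
risk = true_effect * organic + confounder + rng.normal(size=n)

# Naive OLS slope is badly biased by the confounder (wrong sign here).
ols = np.cov(organic, risk)[0, 1] / np.var(organic)

# IV estimate: ratio of reduced-form to first-stage covariances.
iv = np.cov(instrument, risk)[0, 1] / np.cov(instrument, organic)[0, 1]

print(f"OLS: {ols:.2f}, IV: {iv:.2f}")  # IV recovers roughly -0.1
```

The instrument breaks the feedback from the confounder: variation in organic consumption induced by the instrument is, by assumption, unrelated to health consciousness, so the IV ratio isolates the causal slope.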

Then, there are all sorts of worrying details in the study itself. Organic food consumption is a self-reported variable measured in a very ad-hoc way. People were asked whether they consumed organic most of the time (2 points), occasionally (1 point), or never (0 points), and this score was summed across 16 different food categories ranging from fruits to meats to vegetable oils. Curiously, when the authors limit their organic food variable to only plant-based sources (presumably because this is where pesticide risks are most acute), the effects for most cancers diminish. It is also curious that there wasn’t always a “dose response” relationship between organic consumption scores and cancer risk. Also, when the authors limit their analysis to particular sub-groups (like men), the relationship between organic consumption and cancer disappears. Tamar Haspel, a food and agriculture writer for the Washington Post, delves into some of these issues and more in a Tweet-storm.
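The scoring scheme described above is simple enough to write down directly. This is a sketch of my reading of the paper’s description – not the authors’ actual code – so the function name and answer labels are illustrative.

```python
# Points per answer, as described in the paper: 2 for "most of the
# time", 1 for "occasionally", 0 for "never", summed over 16 food
# categories, so the total score ranges from 0 to 32.
POINTS = {"most of the time": 2, "occasionally": 1, "never": 0}

def organic_score(responses):
    """responses: one answer per food category, 16 in total."""
    assert len(responses) == 16, "one answer per food category"
    return sum(POINTS[r] for r in responses)

# A respondent who eats organic occasionally in every category:
print(organic_score(["occasionally"] * 16))  # 16
```

Note how coarse this is: a respondent eating organic “most of the time” in 8 categories and “never” in the other 8 gets the same score as one eating it “occasionally” everywhere.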

Finally, even if the estimated effects are “true,” how big and consequential are they? The authors studied 68,946 people, 1,340 of whom were diagnosed with cancer at some point during the approximately 6-year study. So, the baseline chance of getting any type of cancer was (1,340/68,946)*100 = 1.9%, or roughly 2 people out of 100. Now, let’s look at the case where the effects seem to be the largest and most consistent across the various specifications, non-Hodgkin lymphomas (NHL). There were 47 cases of NHL, meaning there was a (47/68,946)*100 = 0.068% overall chance of getting NHL in this population over this time period. 15 and 14 people in the first (lowest) and second quartiles of organic food scores, respectively, had NHL, and 16 people in the third quartile had NHL. In the highest quartile of stated organic food scores, the number of people with NHL dropped to only 2. After making various statistical adjustments, the authors calculate a “hazard ratio” of 0.14 for people in the highest vs. lowest quartiles of organic food consumption, meaning there was a whopping 86% reduction in risk. But what does that mean relative to the baseline? It means going from a risk of 0.068% to a risk of 0.068% × 0.14 ≈ 0.01%, or from about 7 in 10,000 to 1 in 10,000. To put these figures in perspective, the overall likelihood of someone in the population dying in a car accident next year is about 1.25 in 10,000, and about 97 in 10,000 over the course of a lifetime. The one-year and lifetime risks of dying from a fall on stairs and steps are 0.07 in 10,000 and 5.7 in 10,000.
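The back-of-the-envelope arithmetic above can be checked in a few lines, using only the counts reported in the paragraph:

```python
# Reproducing the risk calculations from the paragraph above.
n_total = 68_946

# Any cancer over ~6 years:
baseline_any = 1_340 / n_total * 100       # ~1.9%

# Non-Hodgkin lymphoma (NHL):
baseline_nhl = 47 / n_total * 100          # ~0.068%

# Reported hazard ratio, highest vs. lowest organic quartile:
hazard_ratio = 0.14
adjusted_nhl = baseline_nhl * hazard_ratio # ~0.01%

# Expressed per 10,000 people:
print(round(baseline_nhl * 100, 1))  # ~6.8 in 10,000
print(round(adjusted_nhl * 100, 1))  # ~1.0 in 10,000
```

Even taking the hazard ratio at face value, the absolute risk reduction is on the order of 6 in 10,000 over six years – the kind of magnitude the comparisons to car accidents and stair falls are meant to put in context.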

In sum, I’m not arguing that eating more organic food cannot be causally related to reduced cancer risk, especially given the plausible causal mechanisms. Rather, I’m arguing that this particular study doesn’t go very far in helping us answer that fundamental question. And, if better estimates from studies that take causal identification seriously ultimately reverse these findings, we will have undermined consumer trust by promoting these types of studies (just ask people whether they think eggs, coffee, chocolate, or blueberries increase or reduce the odds of cancer or heart disease).

4 Responses to Endogeneity

  1. I like the insight on uncovering biases in these studies. As the blog post points out, the research did not establish causation between organic food and cancer risk, but when it comes to popular topics like organic food, the bias is often ignored and the “causation” is amplified through the media. We see these biases on many topics these days, and they have a direct impact not only on academic research, but also on people’s behavior.

    Another bias that we are probably all guilty of at some point is confirmation bias. This happens when we have an existing opinion, and we only look for evidence that supports that opinion and ignore the rest. I would argue we are more vulnerable than ever to confirmation bias these days, due to the highly customized content delivered to us via browser cookies. I believe that reducing biases has a positive impact on the way we make decisions.

  2. The article makes a great point: we should question how we arrive at a conclusion when we see a relationship. Things aren’t so simple that we can just take what we see in studies at face value. We live in a complex world where outcomes are often influenced by many small variables. If a conclusion is to be drawn, we need to make sure that the idea is sound and stands on firm ground. One famous example I know of is the study showing a correlation between ice cream sales and crime. Although the study made it appear that the two were related, both were in fact driven by an invisible third variable that was the true cause: temperature. Even when two variables have a relationship between them, the correlation may be meaningless. Only once we consider all the possible factors can we draw a conclusion.

  3. This article harkens back to the age-old economics creed that correlation does not prove causation. It also does a great job of thoroughly discussing the potential problems caused by observational studies and the difficulties arising from self-reporting. Heterogeneity is a problem for large data sets such as this. Another example I can think of is early studies on the relationship between financial market development and overall GDP growth. In my forecasting class we reviewed how these studies did not mathematically test for causation, and instead only observed correlation and used that as their result, when they should have used causality testing, such as a Granger causality test, to properly account for this.

  4. Endogeneity is arguably one of the greatest challenges to account for in research. This reminds me of studies that linked family dinners with decreased risk of drug addiction among teens. While the findings are intuitive and make sense, these studies were later criticized for their failure to control for the underlying factors that could be driving the decreased risk of teen drug addiction, including strong family connections or deep parental involvement in a child’s life.