Say you're a scientist, and you're casually looking through the Guinness Book of World Records when you run across an entry that all of your experience and work in the lab tells you can't possibly be right. What do you do?
A) Ignore it, because of the source.
B) Dismiss it, because you're an expert and you know it has to be wrong.
C) Investigate it and find out what's going on as best you can determine.
In the case at hand, the scientist in question is biomechanics researcher Thomas Roberts of Brown University, and the subject is something that goes back to Mark Twain - The Celebrated Jumping Frog of Calaveras County. (* multiple versions with comments by Twain.)
More below the Orange Omnilepticon.
* If you've already been here once, check the update at the bottom of this diary. There's a lot more to this problem.
As reported in The Scientist, Roberts's attention was caught by an entry in the Guinness Record Book dealing with how far a frog can jump.
While perusing the Guinness Book of World Records with his son a few years ago, biomechanics researcher Thomas Roberts of Brown University stumbled upon an entry regarding Rosie the Ribeter [sic], a competitor in the 1986 Calaveras County Jumping Frog Jubilee. That year, Rosie, a bullfrog (Lithobates catesbeiana), leapt 6.55 meters (21 feet, 5.75 inches) in a three-jump series, averaging 2.18 meters per jump. Not only did the impressive leaps constitute a world record, they obliterated the official upper limit of bullfrog jump length, or maximal performance, published in the scientific literature (usually around 1 m or less, with one report of 1.3 m), and the comparable data Roberts and his students were collecting in his lab.
What's a scientist to do when confronted by such a challenge to the accepted literature and his own findings?
For four days, Roberts and his team videotaped the jumping frogs, accumulating more than 20 hours of footage. Although the time-consuming process of extracting data from those recordings and analyzing the results still lay ahead, Astley says that it was clear pretty early on that these frogs were outjumping the frogs they kept in the lab. “Relatively soon in the competition [the fair officials] actually put down a tape mark at about 14 or 15 feet” to delineate a minimum distance frogs must jump to even have a chance of making the finals, he says. “A 15-ft distance would be three 5-ft [more than 1.5-m] jumps—minimum. So that right there shows that it’s just a tremendous jump distance.”
Sure enough, their analysis revealed that 58 percent of the 3,124 jumps they quantified exceeded the 1.3-m maximum jump distance reported in the literature. The frogs at the fair jumped as far as 2.2 m in a single bound. In addition to casting doubt on previously published estimates of maximal bullfrog jump performance, the findings support the idea that the leg muscle itself is not sufficient to generate the power needed for such long leaps. Like other frog species, bullfrogs may have elastic tendons that work in a stretch-recoil fashion. “[Previously measured] jumps weren’t far enough to need a catapult mechanism,” Astley explains.
Why does this matter?
First, as The Scientist notes, the frog is a 'model' organism, meaning that understanding how its muscles function is used as a basis for understanding how muscles in other organisms (including humans) function. If that basic understanding is flawed in some way, it raises questions about all the other work that follows from it.
Second, the differences between the way the frogs performed in the lab versus the way they performed at the Frog Jubilee would seem to indicate the work being done in the lab was failing to accurately measure the capabilities of the frogs. (As the article observes, it's not just that frogs could jump farther, the frog handlers were better at getting them to perform to their full capabilities than the scientists!) What else in the way the lab carries out experiments may be skewing results?
Third, it shows the value of being able to collect data in significant quantities when circumstances allow. The competition at the Jubilee allowed Roberts and his team to capture data from many more jumps than a lab could reasonably expect to carry out. And it was not just the quantity of the data, but the quality of it.
Fourth, this is a nice little example of the way science is supposed to work: a phenomenon comes to be understood by collecting data and developing hypotheses that attempt to explain it. When, as happens here, new data becomes available that challenges that understanding, the proper response is to examine the data, verify it, and reevaluate the hypotheses in light of that new data.
Even so, this episode points to a very serious problem for science. How well do experiments actually capture what's being investigated? How good is the data being collected - and how much is an adequate sample? How well do the hypotheses being developed actually describe the subject of a research project? The Scientist has another article that looks at the growing realization that a fair amount of the research being published can't be replicated.
Lately the scientific community has been talking about the omnipresent problem of irreproducibility—the failure of researchers to replicate results in the published literature. This week in Nature, National Institutes of Health (NIH) Director Francis Collins and Principal Deputy Director Lawrence Tabak weighed in on this discussion, noting several ways in which the agency hopes to fix the problem.
“Reproducibility is potentially a problem in all scientific disciplines,” the duo wrote. “[T]he checks and balances that once ensured scientific fidelity have been hobbled.”
The consensus is that it's rarely a matter of deliberate fraud. More often it's a matter of pressure to present work in a way that maximizes impact and gets it published in journals with a certain amount of status, along with errors in experimental design and methodology due to inexperience or lack of proper training. There's also the problem of data that never gets reported because it seems to lack any great import, and details of the work that get glossed over or omitted entirely because they seem trivial.
The Scientist has covered just how frustrating this can be for researchers attempting to duplicate and extend work, as this editorial shows:
Every practicing scientist knows how difficult it can be to make an experiment work in the lab. Especially frustrating is not being able to reproduce already published experiments. You read a paper, closely follow its materials and methods, buy all of the reagents, run across your department to secure all of the equipment, begin your experiment, and . . . it doesn’t work. At some point, after trying time and time again, stuck alone at midnight in the lab and pulling your hair out, you raise your eyes to the skies and ask: Why doesn’t it work? The paper was published in a prestigious scientific journal by a famous professor. I’ve done everything as written and still it doesn’t work. Why?
The opinion piece quoted above goes on to detail some of the ways scientists are attempting to deal with these problems, one of which is publishing videos of research showing all those details that fail to be captured by words on a page, graphs, numbers, photos, etc. Science that can't be communicated in enough detail for others to grasp it, reproduce it, and build on it is not as good as science needs to be, especially in these days of limited resources.
And there are always new tools and new methods waiting to be picked up and put to use. Science isn't just for scientists, after all.
UPDATE: In one of those bits of synchronicity that happen from time to time, Kevin Drum pointed today to a paper dealing with the problem of publication bias, where the results that get published are the ones that are 'successful' or just different enough to stand out - which means the actual facts of the matter may show something quite different. Drum gives a good explanation and an example that makes it clear.
Here's an example. Suppose several teams coincidentally decide to study the effect of carrots on baldness. Most of the teams find no effect and give up. But by chance, one team happens to find an effect. These statistical outliers happen occasionally, after all. So they publish. And since that's the only study anyone ever sees, suddenly there's a flurry of interest in using carrots to treat baldness.
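To make that concrete, here's a toy simulation of my own (not from Drum or the paper he cites, and with made-up numbers): a couple hundred teams each test a treatment that actually does nothing, and only the statistically significant results get written up. The handful of chance outliers are all the published record ever sees.

```python
# Toy publication-bias simulation (illustrative only; all numbers are made up).
# 200 teams test a treatment with zero true effect. Only studies that clear
# the usual significance bar get "published" - and by chance, a few will.
import random
import statistics

random.seed(1)

def run_study(n=30):
    """One study: treatment and control drawn from the same distribution,
    so the true effect is exactly zero. Returns (effect size, t-like statistic)."""
    treated = [random.gauss(0, 1) for _ in range(n)]
    control = [random.gauss(0, 1) for _ in range(n)]
    effect = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return effect, effect / se

published = []
for team in range(200):
    effect, t = run_study()
    if abs(t) > 2.0:              # roughly p < 0.05, two-sided
        published.append(effect)  # only "significant" findings see print

print(f"{len(published)} of 200 studies get published")
print("published effect sizes:", [round(e, 2) for e in published])
```

The true effect is zero by construction, yet every effect size that survives to publication is an outlier - which is roughly the distortion the paper Drum cites is trying to estimate.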
The paper Drum is referencing is here. In essence, it's using statistics to get an estimate of the 'missing results' that didn't make the publication cut. And then Drum goes further by linking to this article at Outside the Beltway, which really dives into the problem of published research that is useless or worse than useless, and just how bad the problem is.
A recent piece in The Economist points to an issue that has been a recurring theme at OTB over the years: there’s reason to be deeply skeptical of most research published in scientific and other scholarly journals. While I’ve long noted the dubious methodology of medical research in particular, and of the vaunted peer review process in general, it’s actually much worse than I’d previously understood.
(NOTE: The OTB essay references the same paper UntimelyRipped referenced in comments.)
The Economist article that got James Joyner at Outside The Beltway to write up his own experiences lays out just how serious the problem of irreproducibility is, and what it means.
...But irreproducibility is much more widespread. A few years ago scientists at Amgen, an American drug company, tried to replicate 53 studies that they considered landmarks in the basic science of cancer, often co-operating closely with the original researchers to ensure that their experimental technique matched the one used first time round. According to a piece they wrote last year in Nature, a leading scientific journal, they were able to reproduce the original results in just six. Months earlier Florian Prinz and his colleagues at Bayer HealthCare, a German pharmaceutical giant, reported in Nature Reviews Drug Discovery, a sister journal, that they had successfully reproduced the published results in just a quarter of 67 seminal studies.
The governments of the OECD, a club of mostly rich countries, spent $59 billion on biomedical research in 2012, nearly double the figure in 2000. One of the justifications for this is that basic-science results provided by governments form the basis for private drug-development work. If companies cannot rely on academic research, that reasoning breaks down. When an official at America’s National Institutes of Health (NIH) reckons, despairingly, that researchers would find it hard to reproduce at least three-quarters of all published biomedical findings, the public part of the process seems to have failed.
There is no doubt that one of the fundamental principles of science is in trouble here. If people can't do research and communicate the results in a way that others can replicate, there are questions that need to be asked.
1) Was the original work valid?
2) Was the process of documenting it flawed, so that critical elements were omitted due to bad experiment design, experimenter oversight, or problems with the publication process?
3) Are those trying to replicate the work making errors of their own for whatever reason?
For something based on fact and reason, there's more than a little irony that those doing science have as an article of faith that science is self-correcting. And in fact it is, if this growing debate is an indication. Enough evidence is accumulating to show that there IS a problem - and it needs to be addressed. Joyner at OTB gets into some depth as to what he thinks is a big part of the problem. He notes that we now have software that makes doing statistical analysis easy - but it's dangerous in the hands of scientists who don't really understand how it works or all of the assumptions built into it.
Part of the problem here is that we’ve reversed the order of operations here, totally undermining a core theory of research design, namely that analysis should follow theory rather than the reverse. Even two decades ago, it was simply gospel that good scientific research started with a hypothesis grounded in theory and that modeling and statistical analysis followed. But, increasingly, researchers are mining the data for interesting results and then crafting a theory to explain the outcomes. The reason for this has already been alluded to: academics need to publish and one’s findings need to be interesting to get published. But the value of the older way of doing things was that it led to extreme skepticism about interesting but totally counterintuitive results.
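To see the hazard Joyner is describing, here's a small sketch of my own (not from his post): take a purely random outcome, fish through a hundred equally random 'predictors,' and the usual p < 0.05 threshold will hand you a few 'interesting' correlations to wrap a theory around.

```python
# Toy data-dredging example (illustrative only): correlate 100 random
# "predictors" with a random outcome and see how many look "significant."
import random

random.seed(7)

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n_subjects = 40
outcome = [random.gauss(0, 1) for _ in range(n_subjects)]

hits = 0
for predictor_id in range(100):
    predictor = [random.gauss(0, 1) for _ in range(n_subjects)]
    r = correlation(predictor, outcome)
    if abs(r) > 0.31:   # roughly the p < 0.05 cutoff for n = 40
        hits += 1
        print(f"predictor {predictor_id}: r = {r:.2f}  <- an 'interesting result'")

print(f"{hits} spurious 'findings' out of 100 predictors tested")
```

A theory-first design would have specified one hypothesis in advance; mining a hundred of them against pure noise reliably turns up a few publishable-looking correlations.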
Joyner also notes two other serious problems: the peer review process is terribly flawed, and there's just not enough interest or reward in replicating work to validate it beyond question. The recognition of this is the first step in correcting it.
Which brings us back full circle to Roberts's work with frogs. When he saw data that conflicted with what he knew, he went to the trouble of examining it and correcting his work on the basis of what he found. And that's what Doing Science is all about - or should be.