Finally, a fun yet serious invitation. Physicists dispute a "theory of measurement" concerning mysterious quantum behavior. In certain experiments, if a measurement is made, the experiment behaves one way; if no measurement is made, the results are the opposite. The contrary results are repeatable. The dispute includes such questions as: Is a human required as observer? Can a cat be the observer? If so, would Schrödinger's cat endanger itself? An internet search of this topic returns millions of hits, so we recommend Richard Muller's Now: The Physics of Time (2016).
Since the idea of this book is to equip you, dear Reader, to plan and execute the measurements in your experiment, this dispute is moot.
Panel 2.3 Prepublishing Your Experiment Plan
We advise publishing and presenting your experiment plan to your client before collecting data. In the same vein, Button et al. in Nature Reviews Neuroscience (2013) made the following recommendations for researchers:
Perform an a priori power calculation. [Our note: This helps estimate sample size; see the sketch at the end of this panel.]
Disclose methods and findings transparently.
Preregister your study protocol and analysis plan.
Make study materials and data available.
Work collaboratively to increase power and replicate findings.
These recommendations, combined with our strategies and guidelines, help to achieve a credible experiment.
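To make the first recommendation concrete, here is a minimal sketch of an a priori power calculation, assuming a two-group comparison; the effect size, significance level, and desired power shown are illustrative guesses, and the statsmodels package is only one of several tools that can perform the calculation.

    # A minimal sketch of an a priori power calculation (illustrative values only).
    # Assumes a two-sample t-test design; replace effect_size, alpha, and power
    # with values appropriate to your own experiment.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(
        effect_size=0.5,  # guessed standardized effect (Cohen's d)
        alpha=0.05,       # significance level
        power=0.80,       # desired probability of detecting the effect
    )
    print(f"Approximate sample size per group: {n_per_group:.0f}")

For these illustrative inputs the answer is roughly 64 subjects per group; a smaller expected effect drives the required sample size up rapidly.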
2.4.7 Surveys and Polls
If your type of experiment is a survey or poll, this text is also for you. The same strategies for sampling, statistics, and modeling apply. For a poll to promise any sort of accuracy, the sampling must be random and guided by good statistical practice. The principal challenges are designing nonleading questions and obtaining a representative sample. Otherwise a poll will mislead. Who has not witnessed, and been pained by, a misleading poll?
Surveys of select scientific audiences have a mixed history. One survey is particularly notable to us as experimentalists. In 2006, a survey of the International Astronomical Union (IAU) in Prague became memorable when the IAU demoted the planet Pluto. Pluto had been discovered and recognized as a planet in 1930. For decades teachers and textbooks across the world taught the nine planets in the solar system. Then the press in 2006 widely reported that the IAU declared that Pluto was no longer a planet. From our viewpoint, as IAU outsiders, it was a decree with huge impact on classrooms and the textbook industry.
As experimentalists, we are interested in the sampling, statistics, accuracy, and credibility of the Pluto survey. These all have relevance to us as we consider “The Nature of Experimental Work” (this chapter). We read that within the IAU there is dispute and controversy. How well did the vote represent the entire population?
Let's review details: The total population of IAU members is about 11,000. In 2012, the IAU reported 10,894 individual members from 93 countries worldwide. At the 2006 Prague conference, 2,400 members had registered. On the last day of the conference, just 424 astronomers remained and were polled on several issues. On some issues, exact numbers of the vote were reported (IAU 2006). However, the definitions that resulted in Pluto's demotion were reportedly decided by voice vote, approved by a majority.
How representative of the IAU population was the poll? For a random sample of 424 members out of 11,000, the margin of error at 95% confidence would be roughly the claimed 5%. A confidence interval, however, surrounds a numerical estimate, and a voice vote yields no estimate at all. With a savvy, engaged population of 11,000, why not take a vote of the full IAU population? Why rely on a poll to redefine international standards?
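As a rough check on that figure, here is a minimal sketch of the margin-of-error arithmetic, assuming a simple random sample and a worst-case 50/50 split; the last-day attendees were of course not a random sample, so the calculation describes sampling noise only, not selection bias.

    # Margin of error for a yes/no poll, assuming simple random sampling.
    # Illustrative only: the 424 voters in Prague were self-selected, not randomly drawn.
    import math

    N = 11_000   # approximate IAU membership
    n = 424      # astronomers present for the vote
    p = 0.5      # worst-case proportion
    z = 1.96     # z-score for 95% confidence

    fpc = math.sqrt((N - n) / (N - 1))          # finite population correction
    moe = z * math.sqrt(p * (1 - p) / n) * fpc  # margin of error
    print(f"Margin of error: {moe:.1%}")        # about 4.7%

Under these assumptions the margin of error is indeed close to 5%, but the arithmetic says nothing about the bias introduced when the "sample" consists of whoever stayed for the final session.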
The IAU clearly has the right to set its own standards and definitions. Via the 2006 Pluto poll, however, the IAU leadership made a decision aimed at sharpening communication within its own community. How well did the IAU poll represent all science teachers when it demoted Pluto? One consequence of the decision was to make existing textbooks obsolete, affecting the budgets of school systems worldwide.
Did the IAU leadership poorly serve its members, other scientists, and students across the world by relying on a poll instead of a full vote? From history, can we find examples where polls with small margins of error failed to predict the vote?
Another survey of note, recently conducted by the journal Nature, hearkened back to the Reproducibility Crisis announced by Ioannidis. In 2016, Nature published a survey of researchers across the life and physical sciences asking "Is There a Reproducibility Crisis?" The author, Monya Baker, highlights a contradiction within the survey (Baker 2016): although researchers expressed high confidence in results from their own fields, they admitted an inability to replicate results a majority of the time.
On a related note we may ask: what place have fashion and popularity in science? Fashion does have a notable impact on the availability of funding. We defer that discussion to Panel 2.2, "Selected Invitations to Experimental Research, Insights from Theoreticians," because funding encompasses all research, not just experiments.
2.5 Uncertainty
Physical measurements are subject to errors from many sources. Try as we may, we can never entirely correct for all the possible error sources. Sometimes we may simply decide it is too expensive to correct for a small possible error. Sometimes, we don’t know an error is present. In the best of situations, we must admit that even our corrected value is probably not exact. To deal with this situation we need a quantitative way to estimate the possible residual error and to describe it to potential users of the data. This problem was recognized by Airy (1879), who used the term “uncertainty,” which he defined as “The possible value that the residual error may have.”
Most engineering experiments involve several measurements, and the result of the experiment is derived from the measured values using a set of equations called the “data interpretation program.” The challenge is to estimate the uncertainty in the derived result as a consequence of the recognized uncertainties in the measurements.
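As a preview of that challenge, here is a minimal numerical sketch, assuming a hypothetical data interpretation program R = V × I (electrical power computed from measured voltage and current) and the root-sum-square combination of the individual contributions; the measured values and their uncertainties are invented for illustration.

    # A minimal sketch of propagating measurement uncertainties into a derived result.
    # Hypothetical data interpretation program: R = V * I (power from voltage and current).
    # The measurements and uncertainties below are invented for illustration.
    import math

    def result(V, I):
        """Data interpretation program: derives the result from the measurements."""
        return V * I

    V, dV = 10.0, 0.1    # measured voltage and its uncertainty (assumed)
    I, dI = 2.0, 0.05    # measured current and its uncertainty (assumed)

    # Sensitivity of the result to each measurement, estimated by central differences.
    h = 1e-6
    dR_dV = (result(V + h, I) - result(V - h, I)) / (2 * h)
    dR_dI = (result(V, I + h) - result(V, I - h)) / (2 * h)

    # Root-sum-square combination of the individual uncertainty contributions.
    dR = math.sqrt((dR_dV * dV) ** 2 + (dR_dI * dI) ** 2)
    print(f"R = {result(V, I):.2f} +/- {dR:.2f}")   # roughly 20.00 +/- 0.54

The same pattern scales to any number of measurements: differentiate the data interpretation program with respect to each measured quantity, multiply by that measurement's uncertainty, and combine the contributions.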
2.6 Uncertainty Analysis
Low uncertainty, by itself, is no assurance of accuracy in our results. The mathematical process by which we estimate the uncertainty in the result is referred to as “uncertainty analysis.”
Uncertainty analysis is a powerful tool that, properly used, can help an experimenter develop a credible experiment. For example, we can use uncertainty analysis during the planning phase to select the approach offering the least uncertainty. We can use it to choose the most appropriate instruments. In the "shakedown and debugging" phase of an experiment, it helps us identify which residual errors cause the differences between our result and the expected result.
Uncertainty analysis was introduced to the American technical literature in a paper by Kline and McClintock (1953). Uncertainty analysis focuses on estimating how much uncertainty there is in the derived result as a consequence of the acknowledged uncertainties in the measurements.