Question

Critically assess the observation that ‘Traditional use of significance testing is an inherently misleading process that should be abandoned in favour of other approaches’ (Cohen, 1994)

Answer #1

"After four decades of severe criticism, the ritual of null hypothesis significance testing---mechanical dichotomous decisions around a sacred .05 criterion---still persist. This article reviews the problems with this practice..." ... "What's wrong with [null hypothesis significance testing]? Well, among many other things, it does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!" (Cohen 1994)

WHY ARE HYPOTHESIS TESTS USED?

With all the deficiencies of statistical hypothesis tests, it is reasonable to wonder why they remain so widely used. Nester (1996) suggested several reasons: (1) they appear to be objective and exact; (2) they are readily available and easily invoked in many commercial statistics packages; (3) everyone else seems to use them; (4) students, statisticians, and scientists are taught to use them; and (5) some journal editors and thesis supervisors demand them. Carver (1978) recognized that statistical significance is generally interpreted as having some relation to replication, which is the cornerstone of science. More cynically, Carver (1978) suggested that complicated mathematical procedures lend an air of scientific objectivity to conclusions. Shaver (1993) noted that social scientists equate being quantitative with being scientific. D. V. Lindley (quoted in Matthews 1997) observed that "People like conventional hypothesis tests because it's so easy to get significant results from them."

I attribute the heavy use of statistical hypothesis testing, not just in the wildlife field but in other "soft" sciences such as psychology, sociology, and education, to "physics envy." Physicists and other researchers in the "hard" sciences are widely respected for their ability to learn things about the real world (and universe) that are solid and incontrovertible, and also yield results that translate into products that we see daily. Psychologists, for 1 group, have difficulty developing tests that are able to distinguish 2 competing theories.

In the hard sciences, hypotheses are tested; that process is an integral component of the hypothetico-deductive scientific method. Under that method, a theory is postulated, which generates several predictions. These predictions are treated as scientific hypotheses, and an experiment is conducted to try to falsify each hypothesis. If the results of the experiment refute the hypothesis, that outcome implies that the theory is incorrect and should be modified or scrapped. If the results do not refute the hypothesis, the theory stands and may gain support, depending on how critical the experiment was.

In contrast, the hypotheses usually tested by wildlife ecologists do not devolve from general theories about how the real world operates. More typically they are statistical hypotheses (i.e., statements about properties of populations; Simberloff 1990). Unlike scientific hypotheses, the truth of which is truly in question, most statistical hypotheses are known a priori to be false. The confusion of the 2 types of hypotheses has been attributed to the pervasive influence of R. A. Fisher, who did not distinguish them (Schmidt and Hunter 1997).
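One practical consequence of a point null that is false a priori can be sketched numerically: for any fixed nonzero difference, however small, the P-value falls toward zero as the sample grows. The sketch below is only an illustration and assumes a two-sample z-test with a known common standard deviation and equal group sizes; the function name `two_sample_z_pvalue` and the effect size of 0.02 are hypothetical, not taken from the source.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_pvalue(diff, sd, n_per_group):
    """Two-sided P-value for an observed mean difference.

    Assumes a known common standard deviation `sd` and equal group sizes.
    """
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference
    z = diff / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


tiny_effect = 0.02  # hypothetical: 2% of one standard deviation
for n in (100, 10_000, 100_000):
    p = two_sample_z_pvalue(tiny_effect, 1.0, n)
    print(f"n per group = {n:>7,d}  P = {p:.4g}")
```

The same trivially small difference is "not significant" at the smallest sample size and highly "significant" at the largest, which is why the P-value alone says little about whether the effect matters.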

Scientific hypothesis testing dates back at least to the 17th century: in 1620, Francis Bacon discussed the role of proposing alternative explanations and conducting explicit tests to distinguish between them as the most direct route to scientific understanding (Quinn and Dunham 1983). This concept is related to Popperian inference, which seeks to develop and test hypotheses that can clearly be falsified (Popper 1959), because a falsified hypothesis provides greater advance in understanding than does a hypothesis that is supported. Also similar is Platt's (1964) notion of strong inference, which emphasizes developing alternative hypotheses that lead to different predictions. In such a case, results inconsistent with predictions from a hypothesis cast doubt on its validity.

Examples of scientific hypotheses, which were considered credible, include Copernicus' notion HA: the Earth revolves around the sun, versus the conventional wisdom of the time, H0: the sun revolves around the Earth. Another example is Fermat's last theorem, which states that for integers n, X, Y, and Z, X^n + Y^n = Z^n implies n ≤ 2. Alternatively, a physicist may make specific predictions about a parameter based on a theory, and the theory is provisionally accepted only if the outcomes are within measurement error of the predicted value, and no other theories make predictions that also fall within that range (Mulaik et al. 1997). Contrast these hypotheses, which involve phenomena in nature, with the statistical hypotheses presented in The Journal of Wildlife Management, which were mentioned above, and which involve properties of populations.

WHAT ARE THE ALTERNATIVES?

What should we do instead of testing hypotheses? As Quinn and Dunham (1983) pointed out, it is more fruitful to determine the relative importance of the contributions of, and interactions between, a number of processes. For this purpose, estimation is far more appropriate than hypothesis testing (Campbell 1992). For certain other situations, decision theory is an appropriate tool. For either of these applications, as well as for hypothesis testing itself, the Bayesian approach offers some distinct advantages over the traditional methods. These alternatives are briefly outlined below. Although the alternatives will not meet all potential needs, they do offer attractive choices in many frequently encountered situations.

Estimates and Confidence Intervals

Four decades ago, Anscombe (1956) observed that statistical hypothesis tests were totally irrelevant, and that what was needed were estimates of magnitudes of effects, with standard errors. Yates (1964) indicated that "The most commonly occurring weakness in the application of Fisherian methods is undue emphasis on tests of significance, and failure to recognize that in many types of experimental work estimates of the treatment effects, together with estimates of the errors to which they are subject, are the quantities of primary interest." Further, because wildlife ecologists want to influence management practices, Johnson (1995) noted that, "If ecologists are to be taken seriously by decision makers, they must provide information useful for deciding on a course of action, as opposed to addressing purely academic questions." To enforce that point, several education and psychological journals have adopted editorial policies requiring that parameter estimates accompany any P-values presented (McLean and Ernest 1998).
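Reporting an effect estimate with its standard error and a confidence interval, rather than a bare P-value, might look like the following minimal sketch. The data are invented for illustration, the helper `mean_diff_ci` is a hypothetical name, and the interval uses a normal approximation; with samples this small a t-based interval would usually be preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_diff_ci(a, b, level=0.95):
    """Difference in means (a - b), its standard error, and a
    normal-approximation confidence interval at the given level."""
    diff = mean(a) - mean(b)
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, se, (diff - z * se, diff + z * se)


# Hypothetical measurements for two groups (not from the source).
treated = [12.1, 14.3, 13.8, 15.2, 12.9, 14.7, 13.5, 14.0]
control = [11.8, 12.4, 13.1, 12.0, 12.9, 11.5, 12.7, 12.2]

diff, se, (lo, hi) = mean_diff_ci(treated, control)
print(f"estimated effect = {diff:.2f} (SE {se:.2f}), 95% CI [{lo:.2f}, {hi:.2f}]")
```

The output states how large the effect is and how precisely it is known, which is the information Anscombe and Yates argued should be primary.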

Decision Theory

Often experiments or surveys are conducted to help make some decision, such as what limits to set on hunting seasons, if a forest stand should be logged, or if a pesticide should be approved. In those cases, hypothesis testing is inadequate, for it does not take into consideration the costs of alternative actions. Here a useful tool is statistical decision theory: the theory of acting rationally with respect to anticipated gains and losses, in the face of uncertainty. Hypothesis testing generally limits the probability of a Type I error (rejecting a true null hypothesis), often arbitrarily set at α = 0.05, while letting the probability of a Type II error (accepting a false null hypothesis) fall where it may. In ecological situations, however, a Type II error may be far more costly than a Type I error (Toft and Shea 1983). As an example, approving a pesticide that reduces the survival rate of an endangered species by 5% may be disastrous to that species, even if that change is not statistically detectable. As another, continued overharvest in marine fisheries may result in the collapse of the ecosystem even while statistical tests are unable to reject the null hypothesis that fishing has no effect (Dayton 1998). Details on decision theory can be found in DeGroot (1970), Berger (1985), and Pratt et al. (1995).
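The core idea of decision theory, choosing the action with the smallest expected loss, can be sketched in a few lines. Everything numeric below is a made-up assumption for the pesticide example: the two states, their probabilities, and the loss values are hypothetical, as are the function names.

```python
def expected_loss(action_losses, state_probs):
    """Expected loss of one action: sum over states of P(state) * loss."""
    return sum(state_probs[state] * loss for state, loss in action_losses.items())


def best_action(loss_table, state_probs):
    """Return the action with the smallest expected loss, plus all expected losses."""
    losses = {action: expected_loss(row, state_probs)
              for action, row in loss_table.items()}
    return min(losses, key=losses.get), losses


# Hypothetical beliefs about whether the pesticide harms the species.
state_probs = {"harmful": 0.2, "harmless": 0.8}

# Hypothetical losses: approving a harmful pesticide is far costlier
# than rejecting a harmless one.
loss_table = {
    "approve": {"harmful": 100.0, "harmless": 0.0},
    "reject": {"harmful": 0.0, "harmless": 10.0},
}

action, losses = best_action(loss_table, state_probs)
print(action, losses)
```

Under these assumed numbers the asymmetric cost of a Type II error drives the choice, even though a 20% probability of harm would not lead a conventional test to reject the null hypothesis of no effect.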

Model Selection

Statistical tests can play a useful role in diagnostic checks and evaluations of tentative statistical models (Box 1980). But even for this application, competing tools are superior. Information criteria, such as Akaike's, provide objective measures for selecting among different models fitted to a dataset. Burnham and Anderson (1998) provided a detailed overview of model selection procedures based on information criteria. In addition, for many applications it is not advisable to select a "best" model and then proceed as if that model were correct. There may be a group of models entertained, and the data will provide different strength of evidence for each model. Rather than basing decisions or conclusions on the single model most strongly supported by the data, one should acknowledge the uncertainty about the model by considering the entire set of models, each perhaps weighted by its own strength of evidence (Buckland et al. 1997).
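One common way to weight a set of models by strength of evidence is with Akaike weights, where each model's weight is proportional to exp(-Δ/2) and Δ is its AIC minus the smallest AIC in the set. The sketch below assumes the AIC values have already been computed elsewhere; the model names, AIC values, and per-model estimates are hypothetical.

```python
from math import exp


def akaike_weights(aic_by_model):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_by_model.values())
    rel = {m: exp(-0.5 * (aic - best)) for m, aic in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def model_averaged(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(weights[m] * estimates[m] for m in estimates)


# Hypothetical AIC values for three candidate models.
aic = {"habitat": 100.0, "habitat+year": 102.0, "null": 110.0}
# Hypothetical estimates of one parameter under each model.
survival = {"habitat": 0.62, "habitat+year": 0.58, "null": 0.50}

weights = akaike_weights(aic)
print({m: round(w, 3) for m, w in weights.items()})
print(round(model_averaged(survival, weights), 3))
```

The model-averaged estimate draws on every model in the set instead of treating the top-ranked one as if it were known to be correct.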

Bayesian Approaches

Bayesian approaches offer some alternatives preferable to the ordinary (often called frequentist, because they invoke the idea of the long-term frequency of outcomes in imagined repeats of experiments or samples) methods for hypothesis testing, as well as for estimation and decision-making. Space limitations preclude a detailed review of the approach here; see Box and Tiao (1973), Berger (1985), and Carlin and Louis (1996) for longer expositions, and Schmitt (1969) for an elementary introduction.
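As a small illustration of Bayesian estimation, consider a proportion such as a survival rate. With a Beta prior and binomial data the posterior is again a Beta distribution, so the update is just addition. The sketch below assumes a uniform Beta(1, 1) prior and invented counts (14 survivors out of 20); the function names are hypothetical, and the credible interval is approximated by Monte Carlo sampling because the standard library has no Beta quantile function.

```python
import random


def beta_binomial_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior Beta(a, b) parameters after binomial data under a Beta prior."""
    return prior_a + successes, prior_b + failures


def credible_interval(a, b, level=0.95, draws=20_000, seed=0):
    """Approximate equal-tailed credible interval from Beta(a, b) samples."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo_idx = int((1.0 - level) / 2.0 * draws)
    hi_idx = int((1.0 + level) / 2.0 * draws) - 1
    return samples[lo_idx], samples[hi_idx]


# Hypothetical data: 14 of 20 marked animals survived.
a, b = beta_binomial_posterior(successes=14, failures=6)
post_mean = a / (a + b)
lo, hi = credible_interval(a, b)
print(f"posterior mean = {post_mean:.3f}, 95% credible interval ~ [{lo:.3f}, {hi:.3f}]")
```

Unlike a confidence interval, the credible interval is a direct probability statement about the parameter given the data and the assumed prior.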

CONCLUSIONS

Editors of scientific journals, along with the referees they rely on, are really the arbiters of scientific practice. They need to understand how statistical methods can be used to reach sound conclusions from data that have been gathered. It is not sufficient to insist that authors use statistical methods; the methods must be appropriate to the application. The most common and flagrant misuse of statistics, in my view, is the testing of hypotheses, especially the vast majority of them known beforehand to be false.

With the hundreds of articles already published that decry the use of statistical hypothesis testing, I was somewhat hesitant about writing another. It contains nothing new. But still, reading The Journal of Wildlife Management makes me realize that the message has not really reached the audience of wildlife biologists. Our work is important, so we should use the best tools we have available. Rarely, however, is that tool statistical hypothesis testing.
