Tag: falsifiability

Scientific methodology (German edition)

3. The deductive testing of theories. In our view, the method of critically testing theories, and of selecting among them, is always the following: from the provisionally unjustified anticipation (the idea, the hypothesis, the theoretical system), conclusions are derived by logical deduction; these are compared with one another and with other statements by establishing which logical relations (e.g. equivalence, derivability, compatibility, contradiction) hold between them.

Four directions in particular can be distinguished along which the testing is carried out: the logical comparison of the conclusions with one another, by which the system is examined for internal consistency; an investigation of the logical form of the theory, with the aim of determining whether it has the character of an empirical-scientific theory, i.e. is not, for example, tautological; the comparison with other theories, in order to determine, among other things, whether the theory under test would count as a scientific advance should it prove itself in the various tests; and finally the testing by “empirical application” of the derived conclusions.

This last test is meant to establish whether what is new in the theory’s claims also proves itself in practice, for instance in scientific experiments or in technical and practical application. Here too the testing procedure is deductive: from the system (with the help of statements already accepted), singular conclusions (“predictions”) that are as easy as possible to test or apply empirically are deduced, and from these are selected in particular those that are not derivable from known systems, or that contradict them. A decision on these and other conclusions is then reached in connection with practical application, experiments, and so on. If the decision is positive, that is, if the singular conclusions are accepted, verified, then the system has passed the test for the time being; we have no reason to reject it. If a decision is negative, that is, if conclusions are falsified, then their falsification also falsifies the system from which they were deduced.

A positive decision can only ever support the system provisionally; the system can always be overturned again by later negative decisions. So long as a system withstands detailed and severe deductive tests and is not superseded by the advancing development of science, we say that it is corroborated.

Elements of inductive logic do not occur in the procedure sketched here; we never infer from the validity of singular statements to that of theories. Nor can theories ever be shown to be “true”, or even merely “probable”, by their verified conclusions.

Inducing absolute truth

Universal generalisations are certainly testable, even if not provable, in the sense that it is always possible that the experiments we perform or the observations we make should turn out to falsify them. So the substitution of testability for provability allows universal generalisations to be included in science all right. Indeed Karl Popper has built a whole philosophy of science on the principle that what distinguishes science from non-science is its ‘falsifiability’.

This weakening of the empiricist requirements on science does not really solve the problem of induction. Even if the requirement of testability succeeds in picking out what people standardly and intuitively count as proper science, it leaves us with a problem of explaining why such proper science is a good thing. We have still been given no account of why success in past tests should be a good basis for accepting generalisations which predict the future. [21]

The real Popper

It is worth noting that even in Lakatos’s own “methodology of scientific research programmes” (“MSRP”), a type of sophisticated methodological falsificationism that Lakatos presents as the crowning synthesis of the “thesis” dogmatic falsificationism and the “antithesis” naive methodological falsificationism, the test statements and interpretative theories still are accepted on the basis of a research program. So Lakatos gives a conventionalist solution to the problem of how basic statements are selected, in his interpretation of Popper’s methodology and in his own methodology as well.

This interpretation of Popper is not correct, and the suggested conventionalist solution to the problem of how test statements are accepted is not satisfying. Popper’s criticist solution, which Lakatos has not correctly understood, is much better and is also a solution that allows us to understand the history of science better than Lakatos’s oversophisticated combination of conventionalism and falsificationism. Lakatos maintains that sophisticated methodological falsificationism combines the best elements of voluntarism, pragmatism, and the realist theories of empirical growth. Critical falsificationism is better still, among other reasons because it avoids that kind of eclecticism. And for those interested in the history of ideas, it might be worthwhile to know that the real Popper is neither a dogmatic falsificationist nor a naive or sophisticated methodological falsificationist. Not only Popper₀ but also Popper₁ and Popper₂ are myths created by a misunderstanding of Popper’s critical falsificationism. [53]

Falsification as conditional disproof

Kuhn asked what falsification is, if not conclusive disproof. The answer is that falsification is a conditional disproof, conditional on the truth of the used test statements (and in some cases also on the truth of some used auxiliary hypotheses). Feyerabend’s example of the alleged falsification of the Copernican system with naked-eye observations shows this conditional character of falsifications quite well.

Does this cause any logical or methodological problems? The logical situation is quite clear and unproblematic. The methodological situation is only problematic for those who assume that there are infallible test statements. But as Kuhn said, Popper stresses that test statements are fallible. [56]
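The conditional character of a falsification can be checked mechanically. The following is a minimal sketch, assuming a toy propositional setting in which T stands for the theory, A for the auxiliary hypotheses, and P for the deduced prediction; the names `implies` and `is_valid` are illustrative helpers, not anything taken from the quoted text. It checks by brute-force truth table which conclusions follow once the test statement "not P" is accepted.

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    """Material implication: a ⊃ b."""
    return (not a) or b

def is_valid(premises, conclusion, n_vars: int) -> bool:
    """True if the conclusion holds in every truth assignment that satisfies all premises."""
    return all(
        conclusion(*row)
        for row in product([False, True], repeat=n_vars)
        if all(premise(*row) for premise in premises)
    )

# Premises: (T and A) entails P, and the accepted test statement says P failed.
premises = [
    lambda T, A, P: implies(T and A, P),
    lambda T, A, P: not P,
]

# The conjunction of theory and auxiliaries is refuted ...
refutes_conjunction = is_valid(premises, lambda T, A, P: not (T and A), 3)
# ... but the theory taken alone is not, because the fault may lie in A.
refutes_theory_alone = is_valid(premises, lambda T, A, P: not T, 3)

print(refutes_conjunction)   # True
print(refutes_theory_alone)  # False
```

The second premise is itself only an accepted test statement. If it is later withdrawn, the refutation falls with it, which is the sense in which the disproof is conditional.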

The myth of naive falsificationism

Naive falsificationism is a myth created by positivist and conventionalist misunderstandings of Popper’s methodology. In the contemporary methodological discussion it is time to end the discussion of the straw man of naive falsificationism in its different positivist and conventionalist variants. It is time to come back to reality and to begin a discussion of real and critical falsificationism. [62]

Fisher’s severe tests

In choosing the grounds upon which a general hypothesis should be rejected, the experimenter will rightly consider all points on which, in the light of current knowledge, the hypothesis may be imperfectly accurate, and will select tests, so far as possible, sensitive to these possible faults, rather than to others. [47]

Fisher on the logic of null hypotheses

In relation to any experiment we may speak of this hypothesis as the “null hypothesis,” and it should be noted that the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.

It might be argued that if an experiment can disprove the hypothesis that the subject possesses no sensory discrimination between two different sorts of object, it must therefore be able to prove the opposite hypothesis, that she can make some such discrimination. But this last hypothesis, however reasonable or true it may be, is ineligible as a null hypothesis to be tested by experiment, because it is inexact. If it were asserted that the subject would never be wrong in her judgements we should again have an exact hypothesis, and it is easy to see that this hypothesis could be disproved by a single failure, but could never be proved by any finite amount of experimentation. [16]

Fisher on significance tests

In considering the appropriateness of any proposed experimental design, it is always needful to forecast all possible results of the experiment, and to have decided without ambiguity what interpretation shall be placed upon each one of them. Further, we must know by what argument this interpretation is to be sustained. …

It is open to the experimenter to be more or less exacting in respect of the smallness of the probability he would require before he would be willing to admit that his observations have demonstrated a positive result. It is obvious that an experiment would be useless of which no possible result would satisfy him. Thus, if he wishes to ignore results having probabilities as high as 1 in 20, the probabilities being of course reckoned from the hypothesis that the phenomenon to be demonstrated is in fact absent … . It is usual and convenient for the experimenters to take 5 per cent. as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard, and, by this means to eliminate from further discussion the greater part of the fluctuations which chance causes have introduced into their experimental results. No such selection can eliminate the whole of the possible effects of chance coincidence, and if we accept this convenient convention, and agree that an event which would occur by chance only once in 70 trials is decidedly “significant”, in the statistical sense, we thereby admit that no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon; for the “one chance in a million” will undoubtedly occur, with no less and no more than its appropriate frequency, however surprised we may be that it should occur to us. In order to assert that a natural phenomenon is experimentally demonstrable we need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result. [12-4]
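The “once in 70 trials” figure can be reproduced with a short calculation. The sketch below assumes the tea-tasting design that Fisher uses for this example in the same chapter: eight cups, four of each kind, with the subject asked to pick out the four of one kind. The excerpt above does not state the cup counts, so they are an assumption here, and the function name `p_at_least` is illustrative.

```python
from math import comb

def p_at_least(correct: int, cups: int = 8, chosen: int = 4) -> float:
    """Chance of getting at least `correct` picks right by pure guessing.

    Assumes `chosen` of the `cups` are of the target kind and the subject
    selects exactly `chosen` cups, so every selection is equally likely
    under the null hypothesis of no discrimination.
    """
    total = comb(cups, chosen)
    ways = sum(
        comb(chosen, k) * comb(cups - chosen, chosen - k)
        for k in range(correct, chosen + 1)
    )
    return ways / total

print(comb(8, 4))               # 70 equally likely selections
print(p_at_least(4) < 0.05)     # True: a perfect score (1 in 70) passes the 5 per cent. level
print(p_at_least(3) < 0.05)     # False: three or more right (17 in 70) does not
```

Under this design only a perfect score counts as significant at the 5 per cent. level. By the passage's own argument, even that one result would not demonstrate the phenomenon on its own.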

So you did one study? Do some more.

If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent. point), or one in a hundred (the 1 per cent. point). Personally, the writer prefers to set a low standard of significance at the 5 per cent. point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance. The very high odds sometimes claimed for experimental results should usually be discounted, for inaccurate methods of estimating error have far more influence than has the particular standard of significance chosen. [504-5]
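One way to read “rarely fails to give this level of significance” is as a statement about how often repeated runs of the same design would reach the 5 per cent. point. The following is a minimal sketch under assumptions that are not Fisher's: a hypothetical design of n independent right-or-wrong trials, tested one-sidedly against a chance rate of 0.5, with an illustrative true success rate of 0.7. The function names are likewise made up for the example.

```python
from math import comb

def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def critical_count(n: int, alpha: float = 0.05, p_null: float = 0.5) -> int:
    """Smallest number of successes whose tail probability under the null is <= alpha."""
    for k in range(n + 1):
        if binom_tail(n, k, p_null) <= alpha:
            return k
    return n + 1  # no outcome of this experiment can reach significance

def replication_rate(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Long-run fraction of repetitions that reach significance if p_true is the real rate."""
    return binom_tail(n, critical_count(n, alpha), p_true)

# With 20 trials, 15 or more successes are needed to reach the 5 per cent. point.
print(critical_count(20))
# The same assumed effect is detected far more reliably by the larger design.
print(replication_rate(20, 0.7))
print(replication_rate(100, 0.7))
```

On this reading, the 20-trial design would often fail to reach significance even though the assumed effect is real. It would therefore not yet count as a "reliable method of procedure" in the sense of the passage.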

Weak statistical tests

The distinction between the strong and the weak use of significance tests is logical or epistemological; it is not a statistical issue. The weak use of significance tests asks merely whether the observations are attributable to “chance” (i.e., no relation exists) when a weak theory can only predict some sort of relation, but not what or how much. The strong use of significance tests asks whether observations differ significantly from the numerical values that a strong theory predicts, and it leads to the fourth figure of the syllogism—p ⊃ q, ~q , infer ~p—which is formally valid, the logician’s modus tollens (“destroying mode”). Psychologists should work hard to formulate theories that, even if somewhat weak, permit derivation of numerical point values or narrow ranges, yielding the possibility of modus tollens refutations. [422]
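The contrast between the two uses can be made concrete with a small sketch. It assumes a two-sided z-test with a known standard deviation, made-up sample values, and two hypothetical theories that predict point values of 0.8 and 0.5. None of these numbers come from the quoted passage, and `two_sided_p` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_sided_p(sample, mu0: float, sigma: float) -> float:
    """Two-sided z-test p-value for H0: population mean == mu0 (sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up illustrative observations and an assumed known standard deviation.
data = [0.42, 0.55, 0.48, 0.61, 0.39, 0.52, 0.47, 0.58]
sigma = 0.1

# Weak use: the theory only says "some relation exists", so the test is against zero.
weak_p = two_sided_p(data, mu0=0.0, sigma=sigma)

# Strong use: the theory predicts a numerical value, and the test is against that value.
strong_p_theory_a = two_sided_p(data, mu0=0.8, sigma=sigma)  # hypothetical theory A
strong_p_theory_b = two_sided_p(data, mu0=0.5, sigma=sigma)  # hypothetical theory B

print(weak_p < 0.05)             # True: "chance" is rejected, which says little
print(strong_p_theory_a < 0.05)  # True: theory A is refuted by modus tollens
print(strong_p_theory_b < 0.05)  # False: theory B survives this test
```

In the weak use a significant result is counted in the theory's favour. In the strong use a significant result counts against the theory, so the test puts the theory at risk.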