The skeptics need to chill out and remember what science is for.

Ever since Daryl Bem’s paper was accepted for publication in the Journal of Personality and Social Psychology, there has been this idea going around among skeptics, to the effect that Bem’s study can safely be ignored because it doesn’t measure up methodologically.  This idea seems to come mainly from a paper submitted to the JPSP by four University of Amsterdam researchers, who concluded that “Bem’s p-values do not indicate evidence in favor of precognition; instead, they indicate that experimental psychologists need to change the way they conduct their experiments and analyze their data.”

A bold claim!  And I think we can take it to mean that Bem’s paper shouldn’t have been accepted for publication in a high-impact journal, and indeed, shouldn’t have been taken seriously at all.  In other words:  move along; nothing to see here.

The paper by the U-Amsterdam researchers is long, and the one by Bem is even longer, so I think that a lot of people won’t even have bothered to check either one.  But having now managed to wade through them both, I have a few thoughts.

What follows, by the way, is not about the issue of psi’s reality/non-reality.  It’s about scientific methods and “attitude.”

The Dutchmen’s Straw Man

The Dutchmen’s paper was clearly intended to be viewed as a “takedown” of Bem.  But in my view, the only thing they took down was a straw man claim that they had set up themselves, in the question near the end of the passage below.

Bem takes these findings to support the hypothesis that people “use psi information implicitly and nonconsciously to enhance their performance in a wide variety of everyday tasks”. In further support of psi, Utts (1991, p. 363) concluded in a Statistical Science review article that “(…) the overall evidence indicates that there is an anomalous effect in need of an explanation” (but see Diaconis, 1978; Hyman, 2007). Do these results mean that psi can now be considered real, replicable, and reliable?  We think that the answer to this question is negative…

But who, really, would think that the answer is positive?  Of course Bem’s results as presented are “in support of” the psi hypothesis, but the notion that “these results mean that psi can now be considered real, replicable, and reliable” is ridiculous.  Even if the object of investigation here were only a mundane phenomenon (rather than a “paranormal” phenomenon with all the opposition that label evokes), no one would regard a single paper from a single laboratory as being so conclusive, on its own or against a background of prior controversial research.  Science is considered a communal endeavor in large part because individual experiments, and the scientists who conduct them, are presumed to be fallible.

For that reason, the Dutchmen’s entire attack on Bem and his methods is just not as interesting as they seem to think.  But even on their own terms, their arguments are dubious.

Did Bem “cherry pick” his results?

This term “cherry pick” came to me from someone who sent in a comment* on a previous post, and cited as evidence the following passage from the Dutchmen’s paper:

The Bem experiments were at least partly exploratory. For instance, Bem’s Experiment 1 tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition. But now suppose that the data would have turned out differently and instead of the erotic pictures, the positive pictures would have been the only ones to result in performance higher than chance. Or suppose the negative pictures would have resulted in performance lower than chance. It is possible that a new and different story would then have been constructed around these other results (Bem, 2003; Kerr, 1998). This means that Bem’s Experiment 1 was to some extent a fishing expedition, an expedition that should have been explicitly reported and should have resulted in a correction of the reported p-value.

I had to reread this paragraph a couple of times because at first I couldn’t believe what I was reading.  The sentence beginning “It is possible that a new and different story…” is, in this context, roughly equivalent to stating: It is possible that Daryl Bem is a cheater. (The Bem 2003 and Kerr 1998 references in parentheses are merely to earlier writings on the general practice: Bem’s advice to authors on exploring their data, and Kerr’s critique of hypothesizing after the results are known.)

What the Dutchmen suggest is that Bem generated a bunch of data, applied various hypotheses to it retrospectively, selected those hypotheses that fit the data, and then – improperly – constructed an account of the experiments as if these successful hypotheses had been set up prospectively.

Incidentally, this sort of thing (an exploratory analysis disguised as a prospective/confirmatory analysis) has been identified as a problem in other areas of science, where it has been referred to variously as cherry picking or post hoc analysis; it is closely related to publication bias and the “file drawer effect,” in which unsuccessful studies simply go unreported.  It also has been compared to “firing an arrow at a blank wall and painting a bullseye around the spot where the arrow lands.”
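For readers who haven’t met the “correction of the reported p-value” the Dutchmen call for, here is a minimal sketch in Python of why running several tests matters. It assumes, purely for illustration, five independent tests (one per picture category) at the conventional 0.05 level, and it uses the Bonferroni method, which is one common correction among several; the function names are made up for this example, and nothing here is taken from either paper’s actual analysis.

```python
# Illustrative sketch only: hypothetical setup of k independent tests,
# each run at significance level alpha. Not either paper's actual analysis.

def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni(p_value: float, k: int) -> float:
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * k)


if __name__ == "__main__":
    k = 5  # hypothetical: one test per picture category
    print(f"Family-wise error rate with {k} tests: {familywise_error_rate(0.05, k):.3f}")
    print(f"Bonferroni-adjusted p for a raw p of 0.01: {bonferroni(0.01, k):.3f}")
```

With five such tests, the chance of at least one spurious “hit” is about 0.23 rather than 0.05, which is why an undisclosed fishing expedition would matter. The correction is only called for, though, if the extra analyses really were exploratory, which is exactly what Bem and his colleagues deny.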

So did Bem really do such a thing?  In his paper I could find no clear indication that he had.  He did mention that he had done some initial, unpublished pilot studies, and that these had guided the designs of the published set of experiments — which is normal.  In the case of the experiment involving the erotic images mixed with other images, he also cited earlier research by other investigators which had suggested that erotic visual stimuli might get into undergrads’ brains more easily (go figure) than other emotional stimuli.  In general, he portrayed his erotic-image hypothesis as a prospective one:

the main psi hypothesis was that participants would be able to identify the position of the hidden erotic picture significantly more often than chance (50%).  The hit rate on erotic trials can also be compared with the hit rates on the nonerotic trials to test whether there is something unique about erotic content in addition to its positive valence and high arousal value. For this purpose, 40 of the sessions comprised 12 trials using erotic pictures, 12 trials using negative pictures, and 12 trials using neutral pictures.

Moreover, in a recent rebuttal to the Dutchmen, Bem (writing with two statistician colleagues) addressed this issue again:

The important point here is that the central psi hypothesis about erotic images was unambiguous, directional, based on previous research, not conditional on any findings about trials with nonerotic images, and was not formulated from a post hoc exploration of the data.  In fact, there was no data exploration that required adjustment for multiple analyses in this or any other experiment.

So if we are to take Bem at his word, then the Dutchmen’s assertion is simply wrong.  I also note that in their own “fishing expedition” against him, they seem (to me anyway) to have gone past the bounds of propriety in scientific discourse.  I suppose they would say that Bem’s heretical findings made him fair game for such unusually adversarial treatment.

Bayes and Bias

In the last part of the Dutchmen’s paper they propose that the standard and relatively simple statistical techniques that Bem used should not be used in such studies, but should be replaced with more exclusionary techniques that filter out all but the strongest evidence.

In order to overcome our skeptical prior opinion, the evidence needs to be much stronger… in order to convince scientific critics of an extravagant or controversial claim, one is required to pull out all the stops.

They adopt something called the Bayesian t-test, make certain assumptions about key quantities (notably the prior distribution of the effect size under the psi hypothesis), apply the “test” to Bem’s results, and conclude from this that Bem’s results are too ambiguous to take seriously (“worth no more than a bare mention”).

What’s wrong with this picture?  Well, first, they engage in the same legerdemain of which they accuse Bem, namely a post hoc, goalpost-shifting analysis which – surprise! – supports their view.  And although they suggest that their proposed new method should be applied generally in psychology experiments, they appear to have been motivated to write their paper by only one set of experiments, namely Bem’s.  In their t-test discussion they also seem to treat Bem’s paper (note the phrase “In order to overcome our skeptical prior opinion…”) as if it were potentially conclusive, which (again) is ludicrous — but may explain why they over-exert themselves in an effort to knock it down.

(Bem et al. in their rebuttal have a more technically detailed critique of the Dutchmen’s re-analysis and its assumptions.)

To people who study psi, or to sociologists who study scientists, this is probably an all-too-familiar situation: in which deep, tribal, burn-them-at-the-stake bias strives mightily to rationalize itself as a “scientific” response.  But the worst part of it is that the Dutchmen effectively discourage other scientists from taking psi seriously, even as an object of experimentation.  Along with everything else in their attack, they spend more than a page casually and scornfully dismissing the likelihood of psi, e.g., “there is no real-life evidence that people can feel the future” — and they conclude that a plausible pre-experiment probability that precognition is real is only one in 100 quintillion.
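To see how much work a prior like that leaves for the data, consider the standard Bayesian bookkeeping, in which the data update the prior odds by multiplying them by a Bayes factor. The worked arithmetic below is a sketch that simply takes the one-in-100-quintillion figure (10^{-20}) as the prior probability of precognition; it is not a reconstruction of the Dutchmen’s full analysis.

```latex
\underbrace{\frac{P(H_1 \mid D)}{P(H_0 \mid D)}}_{\text{posterior odds}}
  = \underbrace{\frac{P(D \mid H_1)}{P(D \mid H_0)}}_{\text{Bayes factor}}
  \times \underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{prior odds}},
\qquad
\frac{P(H_1)}{P(H_0)} = \frac{10^{-20}}{1 - 10^{-20}} \approx 10^{-20}.

% Reaching even odds (posterior odds = 1) therefore requires
\text{Bayes factor} \approx 10^{20}.
```

Here H_1 is the precognition hypothesis, H_0 its absence, and D the data. A Bayes factor of 10 or even 100, which would ordinarily be read as strong evidence, would still leave posterior odds of only about 10^{-19} or 10^{-18}.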

Let’s face it:  Psi these days, even if it is real, is quite subtle and fluky.  But that’s not uncommonly the case for phenomena in the discovery phase before experimental paradigms are optimized.  The answer is not to scorn the research and snuff it out, but to encourage more research — at a certain level of prominence and in a sustained way — so that there may emerge (a) a robust paradigm by which the phenomenon can be demonstrated, or (b) a robust explanation for why the “phenomenon” is only illusory.  In the case of psi, such closure is never going to come from arguments over a single paper!

I can hear the skeptics now, complaining that more psi research would divert researchers from more socially pressing experimentation.  Please.  Psi’s potential significance is – from a truly scientific perspective – enormous.  Psi experiments are also among the cheapest to run in all of science.

Speaking of which:  Bem’s paper was published with Bem listed as the sole author.  I am not that familiar with the practices in the psych world, but I know that in the biological sciences, sole authorship is rare for a presentation of original research.  Why weren’t Bem’s grad students listed as co-authors?  Was it merely because they feared that this could hurt their future careers?  If that’s the case – and we should be allowed to know – then once again, shame on the skeptics for putting such fear into experimenters.


