Sniff tests

This is a happy development: it looks as though people really are trying to do something about the replicability crisis in psychological studies. One tool that's proving useful is a prediction market, in which many independent analysts place online bets about which studies will replicate, drawing on subjective cues they might have hesitated to rely on in public or formal settings.
What clues were the traders looking for? Some said that they considered a study’s sample size: a positive result from a small, underpowered study is more likely to be a fluke than one from a bigger study. Some looked at a common statistical metric called the P value. If a result has a P value of less than 0.05, it’s said to be statistically significant, or positive. And if a study contains lots of P values that just skate under this threshold, it’s a possible sign that the authors committed “p-hacking,” meaning they futzed with their experiment or their data until they got “positive” but potentially misleading results. Signs like this can be ambiguous, and “scientists are usually reluctant to lob around claims of p-hacking when they see them,” says Sanjay Srivastava from the University of Oregon. “But if you are just quietly placing bets, those are things you’d look at.”
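A quick way to see why a cluster of P values just under 0.05 raises eyebrows is to simulate it. The sketch below is my own illustration, not anything from the traders or the studies; the group sizes and the ten-outcomes scenario are assumptions chosen for clarity. With no real effect anywhere, a single honest test comes in "positive" about 5% of the time, but testing ten outcomes and reporting whichever one clears the threshold inflates that considerably.

    # Illustration only: the data are pure noise, so every "positive" result is a false positive.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    def false_positive_rate(n_per_group=20, n_outcomes=1, n_sims=5000, alpha=0.05):
        """Fraction of simulated null studies that report at least one p < alpha."""
        hits = 0
        for _ in range(n_sims):
            # Both groups are drawn from the same distribution: any "effect" is chance.
            group_a = rng.normal(size=(n_outcomes, n_per_group))
            group_b = rng.normal(size=(n_outcomes, n_per_group))
            pvals = [ttest_ind(a, b).pvalue for a, b in zip(group_a, group_b)]
            if min(pvals) < alpha:
                hits += 1
        return hits / n_sims

    print("one outcome, honest analysis: ", false_positive_rate(n_outcomes=1))
    print("ten outcomes, report the best:", false_positive_rate(n_outcomes=10))

The first number hovers near the nominal 0.05; the second lands near 1 - 0.95**10 ≈ 0.40, the kind of gap a bettor can act on even when no single study can be accused of anything.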
Beyond statistical issues, it strikes me that several of the studies that didn’t replicate have another quality in common: newsworthiness. They reported cute, attention-grabbing, whoa-if-true results that conform to the biases of at least some parts of society.

1 comment:

  1. Just asking "is this something we want to believe?" and "is this something we'd find very threatening if it were true?" would do a lot to identify the claims most likely not to replicate.
