Uncertainty & Risk: A Statistical Perspective

Digital Dialog
Speakers: Srinivas Atreya, Chief Data Scientist, Cigniti

Srini: Hello friends, this is Srini Atreya here again.

As promised last time, this time around we’re going to talk about uncertainty from a statistical viewpoint. This may be a bit longish, but please bear with me on this.

The example that I’m going to talk about today has been taken from the Understanding Uncertainty website.

In March 2011, the highly respected Journal of Personality and Social Psychology published a paper by the distinguished psychologist Daryl Bem of Cornell University. The paper reports a series of experiments which Bem claims provide evidence for some types of extrasensory perception. These can occur only if the generally accepted laws of physics are not all true. That’s a pretty challenging claim. And the claim is based largely on the results of a very common, and unfortunately very commonly misunderstood, statistical procedure called significance testing. Bem’s article reports the results of nine different experiments. But for our purpose, it’s sufficient to look only at experiment number two.

This is based on well-established psychological knowledge about perception. Images that are flashed up on a screen for an extremely short time, so short that the conscious mind does not register that they have been seen at all, can still affect how an experimental subject behaves. Such images are said to be presented subliminally, or to be subliminal images. For instance, people can be trained to choose one item rather than another by presenting them with a pleasant, or at any rate not unpleasant, subliminal image after they have made the correct choice, and a very unpleasant one after they have made the wrong choice. Bem, however, did something rather different in experiment two.

As in a standard experiment of this sort, his participants had to choose between two closely matched pictures projected clearly on a screen side by side. Then they were presented with a neutral subliminal image if they had made the correct choice, and an unpleasant subliminal image if they had made the wrong choice. The process was then repeated with a different pair of pictures to choose between. Each participant made their choice 36 times and there were 150 participants in all. But the new feature of Bem’s experiment was that when the participants made their choice between the two pictures in each pair, nobody, not the participants, not the experimenters, could know which was the correct choice. The correct choice was determined by a random mechanism after a picture had been chosen by the respondent.

If the experiment was working as designed, and if the laws of physics relating to causes and effects are as we understand them, then the subliminal images could have no effect at all on the participants’ choices of pictures. This is because at the time they made their choice, there was no correct image. Which image was correct was determined afterwards. Therefore, given the way the experiment was designed, one would expect each participant to be correct in their choice half the time, on average. Because of random variability, some would get more than 50% right and some would get less, but on average people would make the right choice 50% of the time. What Bem found was that the average percentage of correct choices across the 150 participants was not 50%. It was slightly higher, at 51.7%. Now, there are several possible explanations for this finding, including the following.

1. The rate was higher than 50% just because there is random variability, both in the way people respond and in the way the correct image was selected. That is, nothing very interesting happened.
2. The rate was higher than 50% because the laws of cause and effect are not as we conventionally understand them, and somehow the participants could know something about which picture was correct before the random system had decided which was correct.
3. The rate was higher than 50% because there was something wrong with the experimental setup, and the participants could get an idea about which picture was correct when they made their choice, without the laws of cause and effect being broken.

I won’t consider all of these in detail. Instead, I will concentrate on how and why Bem decided that explanation one was not a likely explanation.

Bem carried out a significance test. Actually, he made several different significance tests, making slightly different assumptions in each case, but they all led to more or less the same conclusion. So I will discuss only the simplest.

The resulting P-value was 0.009. Because this value is small, he concluded that explanation one was not appropriate and that the result was statistically significant. This is a standard statistical procedure which is very commonly used. But what does it actually mean, and what is this P-value? All significance tests involve a null hypothesis, which typically is a statement that nothing very interesting has happened. In a test comparing the effects of two drugs, the usual null hypothesis would be that, on average, the drugs do not differ in their effects. In Bem’s experiment number two, the null hypothesis is that explanation one is true: the true average proportion of correct answers is 50%, and any difference from 50% that is observed is simply due to random variability.

The P-value for the test is found as follows. One assumes that the null hypothesis is really true. One then calculates the probability of observing the data that were actually observed, or something more extreme, under this assumption. That probability is the P-value. So in this case, Bem used standard methods to calculate the probability of getting an average proportion correct of 51.7% or greater, on the assumption that all that was going on was chance variability. He found this probability to be 0.009. Now, 0.009 is quite a small probability.
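The recipe just described (assume the null hypothesis, then ask how probable a result at least this extreme would be) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not Bem's own test: it assumes that all 150 × 36 choices can be pooled and treated as independent fair coin flips, and the function name `upper_tail_fair_coin` is made up for this illustration.

```python
import math

def upper_tail_fair_coin(n_trials, n_hits):
    """Exact P(X >= n_hits) when X ~ Binomial(n_trials, 0.5)."""
    # Count the equally likely hit/miss sequences with at least n_hits hits,
    # then divide by the total number of sequences, 2 ** n_trials.
    favourable = sum(math.comb(n_trials, k) for k in range(n_hits, n_trials + 1))
    return favourable / 2 ** n_trials

# Figures quoted in the talk: 150 participants, 36 choices each, 51.7% correct.
n_trials = 150 * 36
n_hits = round(0.517 * n_trials)  # pooled hit count implied by 51.7% (an assumption)

p_value = upper_tail_fair_coin(n_trials, n_hits)
print(f"P(at least {n_hits} hits out of {n_trials} | chance alone) = {p_value:.4f}")
```

Under these assumptions the tail probability is small, but it need not match the reported 0.009, because that figure came from Bem's own test rather than from this pooled coin-flip model.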

So we have two possibilities here: either the null hypothesis really is true but an unlikely event has nevertheless occurred, or the null hypothesis just isn’t true. Since unlikely events do not occur often, we should at least entertain the possibility that the null hypothesis isn’t true. Other things being equal, the smaller the P-value, the more doubt it casts on the null hypothesis. How small the P-value needs to be in order for us to conclude that there’s something really dubious about the null hypothesis depends on the circumstances. Sometimes the values 0.05 or 0.01 are used as boundaries, and a P-value less than that would be considered a significant result.

This line of reasoning, though standard in statistics, is not at all easy to get one’s head around. In an experiment like this, one often sees the P-value interpreted as follows. Interpretation A: the P-value is 0.009, so given that we have got these results, the probability that chance alone is operating is 0.009. Unfortunately, this interpretation is wrong. The correct way of putting it is interpretation B: the P-value is 0.009, so given that chance alone is operating, the probability of getting results like this is 0.009. So the difference between A and B is that the “given” part and the part that has the probability of 0.009 are swapped around. It may well not be obvious why that matters. The point is that the answers to the two questions “Given A, what is the probability of B?” and “Given B, what is the probability of A?” might be quite different.

An example is to imagine that you’re picking a random person off the street in London. Given that the person is a Member of Parliament, what is the probability that they are a British citizen? Well, that probability would be high. What about the other way around? Given that this random person is a British citizen, what is the probability that they are an MP? I hope it’s clear to you that the probability would be very, very low. The great majority of the British citizens in London are not MPs. So it’s fairly obvious, I hope, that swapping the “given” part and the probability part changes things quite dramatically. Nevertheless, Bem is interpreting his significance test in the commonly used way when he deduces from a P-value of 0.009 that the result is significant and that there may well be more going on than simply the effects of chance. But just because this is common does not mean it is always correct.
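To put rough numbers on the MP example, here is a small sketch. Every head count below is invented purely for illustration; none of them is a real statistic.

```python
# Hypothetical head counts for people on the street in London (illustrative only).
british_citizens = 800_000      # assumed number of British citizens
mps = 100                       # assumed number of MPs
mps_who_are_citizens = 98       # assumed overlap between the two groups

def conditional(joint_count, given_count):
    """P(A | B) estimated as count(A and B) / count(B)."""
    return joint_count / given_count

p_citizen_given_mp = conditional(mps_who_are_citizens, mps)
p_mp_given_citizen = conditional(mps_who_are_citizens, british_citizens)

print(f"P(British citizen | MP) = {p_citizen_given_mp:.2f}")
print(f"P(MP | British citizen) = {p_mp_given_citizen:.6f}")
```

The same overlap of people sits in both numerators. Only the "given" group in the denominator changes, and that alone moves the answer from nearly one to nearly zero.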

Despite the very widespread use of statistical significance testing, particularly in psychology, this method has been heavily criticized by psychologists themselves, as well as by some statisticians and other scientists. One criticism that is relevant here concerns alternatives to the null hypothesis. Remember the conclusion that was reached when the P-value in Bem’s experiment was 0.009: either the null hypothesis really is true but an unlikely event has nevertheless occurred, or the null hypothesis just isn’t true. This says nothing about how likely the data were if the null hypothesis isn’t true. Maybe they’re still unlikely even if the null hypothesis is false. Surely we can’t just throw out the null hypothesis without further investigation of what the probabilities are if the null hypothesis is false. This is an issue that must be dealt with if one is trying to use the test to decide whether the null hypothesis is true or false.

It should be said that the great statistician and geneticist R.A. Fisher, who invented the notion of significance testing, would simply not have used the result of a significance test on its own to decide whether a null hypothesis is true or false. He would have taken other relevant circumstances into account. But unfortunately, not every user of significance tests follows Fisher’s approach.

The usual way to deal with the situation where the null hypothesis might be false is to define a so-called alternative hypothesis. In the case of the Bem experiment, this would be the hypothesis that the average rate of correct answers is greater than 50%. You might think we could just calculate the probability of getting Bem’s data on the assumption that the alternative hypothesis is true. But there’s a snag. The alternative hypothesis simply says that the average rate is more than 50%. It doesn’t say how much more than 50%. If the real average rate were, let’s say, 99%, then getting an observed rate of 51.7% isn’t very likely. But if the real average rate were 51.5%, then getting an observed rate of 51.7% is quite likely. Real averages of 99% and of 51.5% are both covered by the alternative hypothesis, so this isn’t going to get us off the hook.

One possibility is to meet the issue of misinterpreting the P-value head on. I said that many people think that the P-value of 0.009 actually means that the probability that the null hypothesis is true is 0.009, given the data that were observed.

Well, I explained why that’s not correct, but why do people interpret it that way? In my view, it’s because what people actually want to know is how likely it is that the null hypothesis is true, given the data that were observed. The P-value does not tell them this, so they just act as if it did. The P-value does not tell people what they want to know because, in order to find the probability that the null hypothesis is true given the data, one needs to take a Bayesian approach to statistics.

There is more than one way to do that, but the one I will describe is as follows. It uses the odds form of Bayes’ theorem, the theorem behind the Bayesian approach to statistics. We will get into the guts of Bayesian theory another time, as part of another podcast, but for now let’s understand just two quantities: the prior odds and the Bayes factor. Multiplying the prior odds by the Bayes factor gives the posterior odds, that is, the odds after the data have been taken into account.

The prior odds are a ratio of the probabilities of the hypotheses before the data have been taken into account. That is, they are supposed to reflect the beliefs of the person making the calculation before they saw any of the data, and people’s beliefs differ in a subjective way. One person may simply not believe it possible at all that ESP exists. In that case they would say, before any data were collected, that the probability of the alternative hypothesis is zero and the probability of the null hypothesis is one. This means that for this person, the prior odds for the alternative hypothesis are zero divided by one, which is just zero. It follows that the posterior odds must also be zero, whatever the value of the Bayes factor. Hence, for this person, the probability of the alternative hypothesis, given whatever data, is always going to be zero. This person started out believing that ESP could not exist, and his or her mind cannot be changed by the data at all.

Another person might think it’s very unlikely that ESP exists, but might not want to rule it out as being completely impossible. This may lead them to set the prior odds for the alternative hypothesis not at zero, but at some very small number, let’s say one divided by 10,000. If the Bayes factor turned out to be big enough, the posterior odds for the alternative hypothesis might nevertheless be a reasonably sized number, so that Bayes’ theorem is telling this person that after the experiment they should consider the alternative hypothesis to be reasonably likely. Thus, different people can look at the same data and come to different conclusions about how likely it is that the null hypothesis or the alternative hypothesis is true. Also, the probability that the null hypothesis is true might or might not be similar to the P-value. It all depends on the prior odds as well as on the Bayes factor.
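The two listeners just described can be put into a short sketch of the odds form of Bayes' theorem, in which posterior odds equal prior odds times the Bayes factor. The Bayes factor of 50,000 used here is a made-up value chosen only to show the mechanics, and the function names are illustrative.

```python
def posterior_odds(prior_odds, bayes_factor):
    """Odds form of Bayes' theorem: posterior odds = prior odds * Bayes factor."""
    return prior_odds * bayes_factor

def odds_to_probability(odds):
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)

HYPOTHETICAL_BAYES_FACTOR = 50_000  # invented for illustration, not from any experiment

listeners = [
    ("ESP is impossible", 0.0),              # prior odds of exactly zero
    ("ESP is very unlikely", 1 / 10_000),    # small but non-zero prior odds
]

for label, prior in listeners:
    post = posterior_odds(prior, HYPOTHETICAL_BAYES_FACTOR)
    print(f"{label}: prior odds {prior:g}, posterior odds {post:g}, "
          f"posterior probability {odds_to_probability(post):.3f}")
```

With prior odds of zero the posterior odds stay at zero however large the Bayes factor is, whereas the second listener can be moved a long way by the same evidence.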

You might think that the issue of people having different prior odds could be avoided by concentrating on the Bayes factor. If I could tell you the Bayes factor for one of Bem’s experiments, you could decide what your prior odds were and multiply them by the Bayes factor to give your own posterior odds. In fact, one of Bem’s critics, Eric-Jan Wagenmakers, takes exactly that line, concentrating on the Bayes factors. For Bem’s experiment two, for instance, Wagenmakers and colleagues calculate the Bayes factor as about 1.05. Therefore, the posterior odds for the alternative hypothesis are not much larger than the prior odds. Or, putting it another way, they would say that the data provide very little information to change one’s prior views. They therefore conclude that this experiment provides rather little evidence that ESP exists, and certainly not enough to overturn the established laws of physics. They come to similar conclusions about several more of Bem’s experiments, and for others they calculate the Bayes factor as being less than one. In these cases, the posterior odds for the alternative hypothesis will be smaller than the prior odds. That is, one should believe less in ESP after seeing the data than one did beforehand.

Well, that’s an end to it, isn’t it? Despite all Bem’s significance tests, the evidence provided by the experiments for the existence of ESP is either weak or nonexistent. But no, it’s not quite as simple as that. The trouble is, there’s more than one way of calculating a Bayes factor. It is reasonably straightforward to calculate the probability of the data given the null hypothesis, but the probability of the data given the alternative hypothesis is harder. The alternative hypothesis includes a range of values of the quantity of interest. In Bem’s experiment two, it includes the possibility that the average percentage correct is 50.001%, or that it is 100%, or anything in between. Different individuals will have different views on which values in this range are most likely. Putting it another way, the Bayes factor also depends on subjective prior opinions. Avoiding the issue of prior odds by concentrating on the Bayes factor has not made the subjectivity quite go away.
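A rough sketch can show how much the Bayes factor depends on which true rates the alternative hypothesis is taken to favour. It reuses the figures from the talk (150 participants, 36 choices each, 51.7% correct) under a simplifying assumption that all choices are pooled into one binomial count. It then averages the likelihood over two different uniform priors for the true hit rate. Neither prior, nor this pooled model, is the one used by Bem or by Wagenmakers; both are assumptions made for illustration.

```python
import math

N_TRIALS = 150 * 36                 # pooled number of choices (simplifying assumption)
N_HITS = round(0.517 * N_TRIALS)    # hit count implied by 51.7% correct

def likelihood_ratio(p, n=N_TRIALS, k=N_HITS):
    """P(data | true hit rate p) divided by P(data | true hit rate 0.5)."""
    # The binomial coefficient cancels in the ratio, so only the powers remain.
    log_ratio = k * math.log(p / 0.5) + (n - k) * math.log((1 - p) / 0.5)
    return math.exp(log_ratio)

def bayes_factor_uniform_prior(lo, hi, steps=20_000):
    """Bayes factor for 'hit rate is uniform on (lo, hi)' against 'hit rate is 0.5'.

    Averages the likelihood ratio over the prior range with a midpoint rule.
    """
    width = (hi - lo) / steps
    total = sum(likelihood_ratio(lo + (i + 0.5) * width) for i in range(steps))
    return total / steps

bf_wide = bayes_factor_uniform_prior(0.5, 1.0)     # any rate above 50% equally plausible
bf_narrow = bayes_factor_uniform_prior(0.5, 0.55)  # only small effects plausible

print(f"Bayes factor, wide prior (50% to 100%):  {bf_wide:.2f}")
print(f"Bayes factor, narrow prior (50% to 55%): {bf_narrow:.2f}")
```

In this toy setup the wide prior yields a Bayes factor below one and the narrow prior yields a Bayes factor above one, so the same data favour the null hypothesis under one prior and the alternative under the other.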

Wagenmakers and Bem have gone back and forth on their calculation of the Bayes factor, and there’s an ongoing debate on whose approach is right. Well, in my view, that’s the wrong question. It would be very nice if experiments like these could clearly and objectively establish, one way or the other, whether ESP can exist. But the hope that the data can speak for themselves in an unambiguous way is in vain. If there is really some kind of ESP effect, it is not large, otherwise it would have been discovered years ago. So any such effect must be relatively small, and thus not straightforward to observe against the inevitable variability between individual experimental participants. Since the data are therefore not going to provide really overwhelming evidence one way or the other, it seems to me that people will inevitably end up believing different things in the light of the evidence, depending on what else they know and, to some extent, on their views of the world. Just looking at the numbers from Bem’s experiment is not going to make this subjective element disappear. And herein lies the secret of uncertainty. Uncertainty will always be there. It is just not possible to eliminate it completely. We can only hope to understand its nature through controlled experiments.

Hope this has helped you in your journey towards understanding and benefiting from uncertainty. Once again, thanks for listening and please feel free to leave any comments that will help us improve this podcast.
