SEQUENTIAL BAYESIAN TESTS AND THE INDEPENDENCE HYPOTHESIS OR NAIVE BAYES
Revista de la Facultad de Ciencias, vol. 7, no. 1, 2018
Universidad Nacional de Colombia
Received: 17 August 2017
Accepted: 17 October 2017
Abstract: Quite frequently a diagnosis is not final after one medical test, but only after a sequence of tests has been applied. How should the information given by one test be combined with the information conveyed by a second test? Can we “add up” the information of the medical tests by assuming conditional independence, which is called “naive” or “independent” Bayes? In this article we develop a very simple and basic exact Bayes Factor to check the independent Bayes model versus the full Bayes model, which makes no assumption of conditional independence. Assuming independent Bayes when in fact it does not hold overstates the accumulation of two positives in favor of the disease and of two negatives against it. We also illustrate that, even in situations of mild evidence against the independence model, the conclusions of the two models may be strikingly different in the presence of conflicting evidence between the medical tests. As practical advice, when a sequence of tests is routinely applied in combination, a study should be conducted in which the joint results for a set of patients are kept and analyzed with and without the assumption of independence, and Bayes Factors should be calculated. This work extends and generalizes the work of Pereira & Pericchi (1990) and Mossman & Berger (2001).
Keywords: Bayes factor, conflicting evidence, independence Bayes, sequence of clinical tests.
1. INTRODUCTION
Are you sick if your result was positive on a test? If you are not convinced, will a second test settle the question? Clinical diagnostic tests are the most important source of information for determining whether a patient has a disease or not. Sometimes, however, we are not willing to accept the result of a test, and we consider a second test, the same as or different from the first, to confirm or invalidate the first result. Beyond obtaining the test results, what matters most is how we use them to reach the final diagnosis.
We must know whether or not the data provide evidence in favor of the premise of conditional independence, since this is an important consideration in determining the probability of having the disease. Assuming conditional independence when it does not hold may understate or overstate the result, which would affect the final decision of the doctor and the patient. We will analyze the data from the paper of Pereira & Pericchi (1990), collected at the Hospital das Clínicas, São Paulo. We will use the likelihood function presented in that paper, which does not assume conditional independence, build a likelihood function that does assume conditional independence, and calculate a Bayes Factor to determine which of the two models describes the data better.
On the other hand, Mossman & Berger (2001) present five methods to calculate interval estimates of the probability of having the disease given a test result; among those considered there, the Objective Bayesian method proved to be the best. This will enable us to observe the differences between assuming conditionally independent evidence or not. The authors implemented an algorithm for a single test; in this work we extend it to include a second test. Hyndman (1996) provides code to calculate Highest Posterior Density intervals, which will be used to compare with the results obtained from the Objective Bayesian method. Bayesian methods differ from the frequentist, or “Plug-In”, method, in which the parameters are fixed at their estimated values and assumed (wrongly) to be known. The correct method is the Bayesian one, which takes into account the variability of the parameters, and it is shown here that the difference can be important.
2. DATA
The data were obtained from the paper of Pereira & Pericchi (1990). The data set consists of observations on 100 children with biliary obstruction, classified as intrahepatic (D) or extrahepatic (Dc). Two clinical tests, each with a positive or negative result, were applied to differentiate between the two states. Our study uses conditional probabilities, for which we need each patient's results on both tests. For this reason, we cannot consider the seven children (out of 100) who did not take the second test. Table 1 shows the results for the remaining 93 children.
E++, E+−, E−+ and E−− denote the numbers of patients who, as indicated by the superscripts, responded positively or negatively to each of the two tests, the first symbol referring to the first test and the second to the second test.
3. NOTATION
The probability of having the disease D, known as the prevalence P(D), is denoted by d. It is assumed that the diagnostic tests have only two possible outcomes: positive and negative. The probability of obtaining a positive result on the first test given that the patient has the disease D, known as the sensitivity, P(T1+|D), is denoted by p1. The probability of obtaining a negative result on the first test given that the patient does not have the disease D, known as the specificity, P(T1−|Dc), is denoted by q1.
The probability of obtaining a positive result on the second test given that the patient has the disease D and the result of the first test is positive, P(T2+|D,T1+), is denoted by p+. The probability of obtaining a positive result on the second test given that the patient has the disease D and the result of the first test is negative, P(T2+|D,T1−), is denoted by p−. The probability of obtaining a negative result on the second test given that the patient does not have the disease D and the result of the first test is positive, P(T2−|Dc,T1+), is denoted by q+. The probability of obtaining a negative result on the second test given that the patient does not have the disease D and the result of the first test is negative, P(T2−|Dc,T1−), is denoted by q−.
Two events A and B are conditionally independent given C if and only if P(B|C,A) = P(B|C). Therefore, assuming conditional independence between the two tests given the disease status, we define p2 as the probability of a positive result on the second test given that the patient has the disease, P(T2+|D), and q2 as the probability of a negative result on the second test given that the patient does not have the disease, P(T2−|Dc).
We define φ+j, for j = 1, 2, as the probability of having the disease D given a positive outcome of the j-th test, P(D|Tj+), and φ−j as the probability of having the disease D given a negative result of the j-th test, P(D|Tj−). Finally, φ++ = P(D|T1+,T2+) is the probability of having the disease D given that the results of both tests are positive; φ+−, φ−+ and φ−− are defined similarly. We will use the subscripts i and ni to identify whether the tests are treated as conditionally independent or not, respectively. Table 2 summarizes the notation.
Bayes’ theorem is used to obtain the following equations:
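For a single test j = 1, 2 these are (equations 1-4):

$$\varphi_j^{+} = P(D\mid T_j^{+}) = \frac{d\,p_j}{d\,p_j + (1-d)(1-q_j)}, \qquad
\varphi_j^{-} = P(D\mid T_j^{-}) = \frac{d\,(1-p_j)}{d\,(1-p_j) + (1-d)\,q_j}.$$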
Under the assumption that the tests are conditionally independent:
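With the prevalence d as the prior probability and the two results combined (equations 5-8):

$$\varphi_i^{++} = \frac{d\,p_1 p_2}{d\,p_1 p_2 + (1-d)(1-q_1)(1-q_2)}, \qquad
\varphi_i^{+-} = \frac{d\,p_1 (1-p_2)}{d\,p_1 (1-p_2) + (1-d)(1-q_1)\,q_2},$$
$$\varphi_i^{-+} = \frac{d\,(1-p_1)\,p_2}{d\,(1-p_1)\,p_2 + (1-d)\,q_1 (1-q_2)}, \qquad
\varphi_i^{--} = \frac{d\,(1-p_1)(1-p_2)}{d\,(1-p_1)(1-p_2) + (1-d)\,q_1 q_2}.$$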
Without the assumption that the tests are conditionally independent:
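Here the second-test probabilities depend on the result of the first test (equations 9-12):

$$\varphi_{ni}^{++} = \frac{d\,p_1 p_{+}}{d\,p_1 p_{+} + (1-d)(1-q_1)(1-q_{+})}, \qquad
\varphi_{ni}^{+-} = \frac{d\,p_1 (1-p_{+})}{d\,p_1 (1-p_{+}) + (1-d)(1-q_1)\,q_{+}},$$
$$\varphi_{ni}^{-+} = \frac{d\,(1-p_1)\,p_{-}}{d\,(1-p_1)\,p_{-} + (1-d)\,q_1 (1-q_{-})}, \qquad
\varphi_{ni}^{--} = \frac{d\,(1-p_1)(1-p_{-})}{d\,(1-p_1)(1-p_{-}) + (1-d)\,q_1 q_{-}}.$$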
4. LIKELIHOOD FUNCTION
Pereira & Pericchi (1990) present the likelihood function for the data:
where m is the total number of patients with D who responded to the first test and x is the number of patients with D whose result on the first test was positive; n is the total number of patients with Dc who responded to the first test and y is the number of patients with Dc whose result on the first test was negative. When the two tests are conditionally independent given the disease status, p+ = P(T2+|D,T1+) = P(T2+|D) = p2 and p− = P(T2+|D,T1−) = P(T2+|D) = p2, and q+ = P(T2−|Dc,T1+) = P(T2−|Dc) = q2 and q− = P(T2−|Dc,T1−) = P(T2−|Dc) = q2. The likelihood function in this case is:
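Under conditional independence the four second-test factors collapse into two. With x2 = s + t the number of patients with D who were positive on the second test and y2 = u + v the number with Dc who were negative on it, the likelihood is proportional to

$$p_1^{x}(1-p_1)^{m-x}\; q_1^{y}(1-q_1)^{n-y}\; p_2^{x_2}(1-p_2)^{m-x_2}\; q_2^{y_2}(1-q_2)^{n-y_2}. \qquad (14)$$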
The constant of proportionality comes from the binomial distributions and, being common to both models, plays no role in what follows.
5. BAYES FACTOR
The Bayes Factor is used to compare models, deciding which model best predicts the data. We consider as our first model, M1, the null hypothesis that both tests are conditionally independent; the second model, M2, which does not assume conditional independence and is later called the general Bayesian method, corresponds to the alternative hypothesis.
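In general, if M1 and M2 have parameters θ1 and θ2, priors π1 and π2 and likelihoods L1 and L2, the Bayes Factor of M1 against M2 is the ratio of marginal likelihoods,

$$B_{12} \;=\; \frac{\int L_1(\theta_1;\,\mathrm{data})\,\pi_1(\theta_1)\,d\theta_1}{\int L_2(\theta_2;\,\mathrm{data})\,\pi_2(\theta_2)\,d\theta_2},$$

with values below 1 favoring M2.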
Pereira & Pericchi (1990) suggest using an objective prior for p1, q1, p2 and q2 a prior distribution Beta(1,1) and p+, p−, q+ and q− a beta distribution with parameters and . A very simple Bayes Factor arises following their suggestion. Using equations 14 and 13 as the likelihood function for M1 and M2, respectively, we get the Bayes Factor:
where B is the beta function
Applying this factor to the data set under study we obtained a value of 0.6224, implying that M2 fits the data better than M1. According to Jeffreys' scale (Jeffreys, 1961) this is Grade 1, or mild evidence against the null hypothesis.
We proceed to prove some important results that justify the methods put forward here.
Theorem 1. The order of the tests will not affect the outcome of the Bayes factor.
Proof. We use the following notation
in equation 15:
Using equation 16,
By reorganizing the numerators and denominators we obtain:
By regrouping and using equation 16 we obtain:
We have shown that equations 15 and 17 coincide; therefore, the order of the tests does not affect the Bayes Factor result.
Theorem 2. If conditional independence is not assumed then the joint full tables are necessary to fit the full Bayes model. But independent Bayes only requires the marginal information.
Proof. Let T1 and T2 conditionally independent tests, then
P(T2|D,T1) = P(T2|D). P(T2|D) does not depend on T1, therefore, we do not need joint data sets of T1 and T2
6. CALCULATION AND COMPARISON OF OBJECTIVE BAYESIAN AND PLUG-IN METHODS
The Bayesian approach assigns a prior distribution to each parameter. In our case the parameters are d, p1, q1, p+, q+, p−, q−, p2 and q2. From these priors we can determine the posterior distribution and a credible interval for φ, φi and φni. When the selection of the prior distributions depends on the knowledge we have of the overall situation and the origin of the data, the analysis is called subjective Bayesian analysis. Here, however, the intervals for φ, φi and φni are derived using the data only, so the approach is Objective Bayesian analysis. The latter chooses non-informative prior distributions, leading to an analysis of the parameters that depends only on the data and the model rather than on subjective beliefs.
The intervals for φ, φi and φni that result from the Objective Bayesian method are generally called “credible intervals”. These intervals are analogous to frequentist confidence intervals and can be judged by their frequentist coverage properties: for a significance level α, under replication of the experiment, the intervals contain φ, φi and φni (1−α) of the time.
We use the prior distributions suggested by Pereira & Pericchi (1990). This selection of priors results in posterior distributions of the form Beta(xi + 1, ni − xi + 1) for d, p1, q1, p2 and q2, and Beta(xi + 1/2, ni − xi + 1/2) for p+, q+, p− and q−, where xi and ni denote the corresponding numbers of successes and trials.
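These posteriors are the standard conjugate Beta-binomial updates: if a success probability θ has a Beta(a, b) prior and xi successes are observed in ni trials, then

$$\theta \mid \text{data} \;\sim\; \mathrm{Beta}(a + x_i,\; b + n_i - x_i),$$

so the Beta(1,1) priors give the "+1" updates and the Beta(1/2, 1/2) priors the "+1/2" updates quoted above.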
The following simple Monte Carlo process produces a two-sided credible interval with significance level α for φ+ and φ++ (a minimal R sketch is given after the list). Using R:
1. Simulate values for each of the parameters with their respective posterior distributions.
2. Use the equations 1 - 4 to obtain the value of φ.
3. Repeat this process a large number N of times to generate values of φ.
4. Use the equations 5 - 12 to find φi and φni.
5. Sort the values of φ, φi and φni and find the entries at the positions nearest to Nα/2 and N(1−α/2); these values are the lower and upper limits, respectively, of the intervals.
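A minimal R sketch of these five steps is given below, for the case of two positive results. The counts are illustrative placeholders (to be replaced by the values in Table 1), and only the parameters needed for φ++ are simulated; the remaining combinations are handled in the same way.

```r
# Minimal sketch of the Monte Carlo procedure above (illustrative counts only;
# replace them with the values in Table 1).
set.seed(1)
N <- 100000; alpha <- 0.05

# Hypothetical counts: m patients with D (x of them positive on test 1),
# n patients with Dc (y of them negative on test 1); s, tt, u, v are the
# second-test counts introduced in Section 4 (tt plays the role of t).
m <- 50; x <- 40; n <- 43; y <- 30
s <- 35; tt <- 5; u <- 8; v <- 26

# Step 1: posterior draws (Beta(1,1) priors for d, p1, q1, p2, q2;
# Beta(1/2, 1/2) priors for p+ and q+).
d  <- rbeta(N, m + 1, n + 1)                # prevalence
p1 <- rbeta(N, x + 1, m - x + 1)            # sensitivity of test 1
q1 <- rbeta(N, y + 1, n - y + 1)            # specificity of test 1
p2 <- rbeta(N, s + tt + 1, m - s - tt + 1)  # P(T2+|D), independence model
q2 <- rbeta(N, u + v + 1, n - u - v + 1)    # P(T2-|Dc), independence model
pp <- rbeta(N, s + 0.5, x - s + 0.5)        # p+ = P(T2+|D, T1+)
qp <- rbeta(N, u + 0.5, n - y - u + 0.5)    # q+ = P(T2-|Dc, T1+)

# Steps 2-4: posterior probability of D given two positive results,
# with (phi_i) and without (phi_ni) the conditional independence assumption.
phi_i  <- d * p1 * p2 / (d * p1 * p2 + (1 - d) * (1 - q1) * (1 - q2))
phi_ni <- d * p1 * pp / (d * p1 * pp + (1 - d) * (1 - q1) * (1 - qp))

# Step 5: equal-tailed credible intervals (quantile() replaces the explicit sort).
quantile(phi_i,  c(alpha / 2, 1 - alpha / 2))
quantile(phi_ni, c(alpha / 2, 1 - alpha / 2))
```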
The order in which we choose to incorporate the evidence does not affect the outcome; in other words, P(D|T1+,T2+) = P(D|T2+,T1+). Furthermore, Bayes' theorem is sequential, meaning that the posterior probability after observing one result can be used as the prior probability for calculating the next posterior probability, regardless of the conditional independence assumption (DeGroot & Schervish, 2002). This follows from the definition of conditional probability, P(B)P(A|B) = P(A∩B), applied to the numerator and denominator of the posterior for each test in turn.
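In the present notation, for two positive results the sequential update reads

$$P(D \mid T_1^{+}, T_2^{+}) \;=\; \frac{P(T_2^{+}\mid D, T_1^{+})\,P(D\mid T_1^{+})}{P(T_2^{+}\mid D, T_1^{+})\,P(D\mid T_1^{+}) + P(T_2^{+}\mid D^{c}, T_1^{+})\,P(D^{c}\mid T_1^{+})},$$

so the posterior φ+1 from the first test plays the role of the prevalence d when the second result is processed; under conditional independence P(T2+|D,T1+) simply reduces to p2. The same argument applies to any combination of results and to any ordering of the tests.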
The Highest Posterior Density (HPD) interval is the region, at a given credibility level, in which the posterior density of every point inside is at least as large as that of every point outside; therefore it contains the most probable values. Hyndman (1996) provides code to compute these credible intervals.
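As a simple empirical stand-in for Hyndman's implementation, an HPD interval can be computed directly from the Monte Carlo draws of the sketch above as the shortest interval containing a fraction 1 − α of the sorted samples:

```r
# Empirical HPD interval from posterior draws: the shortest interval
# containing a fraction (1 - alpha) of the sorted samples.
hpd <- function(draws, alpha = 0.05) {
  s <- sort(draws)
  N <- length(s)
  k <- ceiling((1 - alpha) * N)            # number of draws inside the interval
  widths <- s[k:N] - s[1:(N - k + 1)]      # width of every candidate interval
  j <- which.min(widths)                   # index of the shortest one
  c(lower = s[j], upper = s[j + k - 1])
}

hpd(phi_ni)   # using the phi_ni draws from the sketch above
```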
We also consider the Plug-In method, which treats each of the above parameters as a fixed value. For example, the value of d, the probability of having the disease, is obtained by dividing m by m+n, where m is the number of children with the condition D and m+n is the total number of children. We proceed similarly for p1, q1, p+, q+, p−, q−, p2 and q2. Applying these fixed values to equations 1 - 12 we obtain values for φ, φi and φni. This does not take into account the variability that these estimated probabilities may have, as pointed out by Mossman & Berger (2001).
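For comparison, a plug-in computation with the same hypothetical counts used in the Monte Carlo sketch above simply replaces each probability by its observed proportion:

```r
# Plug-in estimates: each probability replaced by its observed proportion
# (counts m, x, n, y, s, u taken from the sketch above).
d_hat  <- m / (m + n)
p1_hat <- x / m
q1_hat <- y / n
pp_hat <- s / x            # estimate of p+
qp_hat <- u / (n - y)      # estimate of q+
phi_ni_plug <- d_hat * p1_hat * pp_hat /
  (d_hat * p1_hat * pp_hat + (1 - d_hat) * (1 - q1_hat) * (1 - qp_hat))
phi_ni_plug
```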
Table 3 shows the probability of having the disease when the first or the second test is positive or negative, and when both tests are positive, negative or give conflicting results, with and without the assumption of conditional independence between the tests. These were calculated with the “Plug-In” method (PIM), the Objective Bayesian method (OBM) and the Highest Posterior Density (HPD) method. Furthermore, we calculated the mean and standard deviation (SD) of the intervals.
We can see that the PIM value is contained in the OBM credible interval. In addition, the PIM value of φi is greater than the corresponding φni value, except for φ−−. This implies that assuming conditionally independent tests overstates the probability that the patient has the disease. For φ−−, the probability obtained without assuming conditional independence is about twice the one obtained under the assumption. The only case showing just a slight difference is φ+−. The case producing the greatest conflict is φ−+: a plug-in value of 0.5110 would say that the patient is sick, while 0.3333 would say the opposite. Therefore, the general Bayesian method reflects the conflicting information better. Furthermore, we note that the mean is approximately equal to the plug-in value, which tells us that the plug-in value is a good point estimator.
When comparing the PIM with the OBM we see how wide the range of posterior probabilities is: for example, for φ+1 we obtained a PIM value of 0.7843, compared with an OBM interval ranging from 0.6540 to 0.8742. Furthermore, the differences observed with the PIM also appear in the OBM. The lower limits of the credible intervals that do not assume conditional independence tend to be lower than those obtained under the assumption, except for φ−−, where the upper limit is greater without assuming conditional independence. Again, in the case of φ+− there is a noticeable difference, but for φ−+ the discrepancy between the intervals is large. This can be seen in the standard deviation for φ−+: without assuming conditional independence we obtained 0.2162, but under the assumption of conditional independence it is 0.1470. This is consistent with what Pereira & Pericchi (1990) found regarding the difference between assuming conditional independence or not in this case. Fig. 1 shows the posterior distributions underlying the credible intervals; they clearly show that there is a wide difference between adopting the independence assumption or not, so that we would be underestimating or overestimating the probability of having the disease D when both tests are positive, negative or conflicting.
The HPD credible intervals are approximately equal to the OBM intervals, so compared with the Plug-In method we reach the same conclusions. Moreover, Table 4 shows that the HPD credible intervals are shorter than the OBM ones, as can be shown mathematically in wide generality.
7. CONCLUSION
The Plug-In values are found to lie well inside the intervals and close to the posterior means. But the Plug-In method gives no information about variability.
Unless there is strong evidence, given by the Bayes Factor developed here, that the model assuming conditional independence is the appropriate one, the tests should be analyzed with the general Bayesian method, which does not assume conditional independence. Assuming conditional independence by mistake overstates the evidence when there are two positives or two negatives. But perhaps the most interesting result is that when the tests give conflicting outcomes, i.e. one positive and one negative, the full model and the independence model may differ by an even larger margin.
References
De Braganca Pereira, C. A. & Pericchi, L. R. (1990). Analysis of diagnosability. Applied Statistics, 189-204.
DeGroot, M. H. & Schervish, M. J. (2002). Probability and Statistics. Addison-Wesley, 72-73.
Hyndman, R. J. (1996). Computing and graphing highest density regions. The American Statistician, 50(2), 120-126.
Jeffreys, H. (1961). Theory of Probability (3rd ed.). Oxford University Press.
Mossman, D. & Berger, J. O. (2001). Intervals for posttest probabilities: a comparison of 5 methods. Medical Decision Making, 21(6), 498-507.
R Development Core Team. (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, Internet: http://www.R-project.org/.