Here D represents disease and S represents sign/symptom.
As previously explained in several posts by now, the probability of S is conditional on the occurrence of its causal factor D. We denote this conditional probability as P(S|D), which means the probability of S given D. To facilitate the connection to our established concepts of sensitivity, specificity and likelihood ratios we will consider D and S as binary variables (yes / no).
The first thing to point out is that the notation P(S|D) generally stands for 2^n probabilities, with n being the number of variables (here, the instantiations of the states of S and D). For example, with two binary variables, P(S|D) includes: P(+S|+D), P(+S|-D), P(-S|+D) and P(-S|-D).
The first connection to make is that P(+S|+D) is the Sensitivity - the proportion of True Positives.
The second connection is that P(-S|-D) is the Specificity - the proportion of True Negatives.
It follows that P(-S|+D) is 1 - Sensitivity, which is the proportion of False Negatives; and P(+S|-D) is 1 - Specificity, which is the proportion of False Positives (I can provide a worksheet with all the algebra if anyone is interested ;)
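These four conditional probabilities can be computed directly from a 2x2 diagnostic table. Here is a minimal sketch, using made-up counts purely for illustration:

```python
# Hypothetical 2x2 table counts (invented numbers, for illustration only).
# TP, FN: people WITH the disease; FP, TN: people WITHOUT it.
TP, FN = 90, 10    # of 100 diseased: 90 show the sign, 10 do not
FP, TN = 20, 80    # of 100 healthy: 20 show the sign, 80 do not

sensitivity = TP / (TP + FN)       # P(+S|+D)
specificity = TN / (TN + FP)       # P(-S|-D)
false_neg_rate = FN / (TP + FN)    # P(-S|+D) = 1 - Sensitivity
false_pos_rate = FP / (TN + FP)    # P(+S|-D) = 1 - Specificity

print(sensitivity, specificity)        # 0.9 0.8
print(false_neg_rate, false_pos_rate)  # 0.1 0.2
```

Notice the four numbers come in two complementary pairs, which is exactly the algebra mentioned above.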
You may recall that the equations for likelihood ratios are:

+LR = Sensitivity / (1 - Specificity) = P(+S|+D) / P(+S|-D)

-LR = (1 - Sensitivity) / Specificity = P(-S|+D) / P(-S|-D)

Which is just saying that the +LR is the ratio of True Positives to False Positives, and the -LR is the ratio of False Negatives to True Negatives. This explains why the greater the +LR, the better the Sign/Symptom is for ruling in a Disease (or condition), and the lower (closer to 0) the -LR, the better the Sign/Symptom is for ruling out a Disease (or condition).
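As a quick sketch, assuming the same illustrative values of 0.9 sensitivity and 0.8 specificity:

```python
# Likelihood ratios from assumed (illustrative) sensitivity and specificity.
sensitivity, specificity = 0.9, 0.8

pos_LR = sensitivity / (1 - specificity)   # P(+S|+D) / P(+S|-D) -> ~4.5
neg_LR = (1 - sensitivity) / specificity   # P(-S|+D) / P(-S|-D) -> ~0.125

print(pos_LR, neg_LR)
```

A +LR of 4.5 means the sign is 4.5 times more likely in the diseased than the non-diseased, which is why larger values help rule in.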
If you recall Bayes equation, then you notice that Sensitivity is the conditional probability necessary to calculate P(+D|+S), the probability of Disease conditional on the presence of a Sign/Symptom - the equation is reviewed in this post. And you notice that Specificity is the conditional probability necessary to calculate P(-D|-S), the probability of not Disease conditional on the absence of a Sign/Symptom.
Using Bayes and spelling it all out:
Probability of the Disease given the presence of the Sign/Symptom is equal to:
Sensitivity * Probability of Disease / Probability of Sign/Symptom
Notation is much easier once you can follow it:
P(+D|+S) = P(+S|+D) * P(D) / P(S)
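Spelled out in code, with an assumed prevalence (all numbers here are invented for illustration):

```python
# Bayes: P(+D|+S) = P(+S|+D) * P(D) / P(S), with assumed inputs.
sensitivity = 0.9     # P(+S|+D)
specificity = 0.8     # P(-S|-D)
p_disease = 0.05      # P(+D), the prior probability (prevalence) - assumed

# P(+S) by total probability: the sign among the diseased plus the sign
# among the healthy (false positives).
p_sign = sensitivity * p_disease + (1 - specificity) * (1 - p_disease)

p_disease_given_sign = sensitivity * p_disease / p_sign
print(round(p_disease_given_sign, 3))
```

Note that P(S) is not a free number we look up; it follows from sensitivity, specificity and the prior by the law of total probability.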
There is a difference between Bayes and the metrics of diagnostic accuracy that must be kept in mind, and it is what keeps us from using the diagnostic metrics to make claims such as: “Because the sensitivity is .9, the probability of having the Disease when having the Sign/Symptom is 90%.” (not true!) It is that Sensitivity in its raw form, P(+S|+D), is not giving us the probability of the Disease (cause) given the observation of the Sign/Symptom (effect). It is giving us the probability of the Sign/Symptom (effect) when we KNOW someone has the Disease (cause). To convert from a Cause-to-Effect probability to the inverse probability - observing the Effect and inferring the Cause (abduction) - we must multiply by the probability of the disease and divide by the probability of the symptom (based on our prior knowledge). The likelihood ratios are better for sure, but they still do not tell you the probability of the disease given the symptom.
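A quick numeric sketch makes the point: holding sensitivity fixed at .9 (and assuming a specificity of .8), the probability of disease given the sign swings wildly with the prior.

```python
# Same 0.9 sensitivity, very different posteriors at different prevalences -
# so sensitivity alone is NOT P(+D|+S). All numbers assumed for illustration.
sensitivity, specificity = 0.9, 0.8

for prevalence in (0.01, 0.10, 0.50):
    p_sign = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    posterior = sensitivity * prevalence / p_sign
    print(f"prevalence={prevalence:.2f}  P(+D|+S)={posterior:.2f}")
```

At a 1% prevalence the posterior is under 5% despite the .9 sensitivity - the claim quoted above fails badly.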
U1 is another possible cause of S, which would increase P(+S|-D); U2 is another possible effect of D, which could increase P(-S|+D); U3 and U4 factors could confound (exacerbate, eliminate) the relationship between D and S in a variety of ways.
What we learn from the graphical model - still an abstraction of reality, but perhaps a more realistic model than the original - is that the true P(S|D) is really P(S|D, U1, U3, U4); that U1 is independent of U3 and U4; but that U3 and U4 are not independent of one another based on this graphic, as U3 can cause D, which can cause U4. In other words, the graphic reveals that the situation is MUCH MORE COMPLEX than the traditional diagnostic metric approach (using sensitivity, specificity and likelihood ratios) can capture with its original assumptions. Not only is the complexity not captured, but I believe it is not hard to imagine how a Bayesian Network (DAG, graphical model) approach is much more helpful to the overall goal of differential diagnosis. Don’t worry - we can still calculate probability distributions and have a metric for our diagnostic accuracy - it would just be based on a more accurate model of reality - that is, its causal structure. As I quoted in the post on Models: “All models are wrong, some models are useful” - George Box
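To make the network idea concrete, here is a toy sketch (every probability invented) with just D, one alternate cause U1 of the sign, and S. Treating the diagram as a small Bayesian network, we infer P(+D|+S) by enumerating the joint distribution:

```python
# Toy Bayesian-network-style inference by enumeration (all numbers assumed).
from itertools import product

p_D = 0.05    # prior on the disease - assumed
p_U1 = 0.10   # prior on the alternate cause U1 - assumed

def p_S_given(d, u1):
    """P(+S | D=d, U1=u1) - a hypothetical conditional probability table."""
    if d and u1: return 0.95
    if d:        return 0.90
    if u1:       return 0.60
    return 0.02

# Enumerate all joint states of (D, U1), accumulate P(+S) and P(+D, +S)
p_S, p_D_and_S = 0.0, 0.0
for d, u1 in product((True, False), repeat=2):
    p_joint = (p_D if d else 1 - p_D) * (p_U1 if u1 else 1 - p_U1)
    term = p_joint * p_S_given(d, u1)
    p_S += term
    if d:
        p_D_and_S += term

print(f"P(+D|+S) = {p_D_and_S / p_S:.3f}")
```

Because U1 "explains away" some of the sign, the posterior on D is lower than a sensitivity/specificity calculation alone would suggest - which is exactly the complexity the traditional metrics miss.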