The process of differential diagnosis is essentially the process of identifying an adjustment set and testing (in some way) the members of that set.

Think of an adjustment set as the set of information you need to know to make a valid inference.

The formal definition of an adjustment set is a “set of covariates such that adjustment, stratification, or selection (e.g. by restriction or matching) will minimize bias when estimating the causal effect of the exposure on the outcome (assuming that the causal assumptions encoded in the diagram hold)” (Drawing and Analyzing Causal DAGs with DAGitty, User Manual for Version 2.2, Johannes Textor).

This definition is specific to inductive inferences. Inductive inference (induction) has been discussed previously - in summary, inductive inferences are generalizations made after a sequence of particular observations, and we hope to be able to accept these generalizations as universally true. Recall from your research methods (or evidence based practice) courses the concept of “external validity.” A study is externally valid when its conclusions are generalizable to some population (that is, the conclusions will hold true for a group beyond the study sample). The study sample is essentially a sequence of particular observations. We develop causal models (DAGs) based on the results of inductive inferences, and we do this either implicitly or explicitly. My posts on KBP are a call to be more explicit in our use of causal models as the sum of our research activities, since the graphical form is logically structured and can identify flaws in reasoning and inferential traps (such as the bi-directional relationship between napping and fibromyalgia symptoms).

For differential diagnosis we use DAGs (causal models), first by selecting the appropriate DAG for the given situation, then by reasoning through it (we do this either implicitly or explicitly). The best way to reason through it is explicitly, so that we reduce the risk of flawed reasoning and traps.

The abduction adjustment set must include all of the variables in the inductive adjustment set for all possible causes of the effects under consideration (that is, all of the conditionals: if cause, then effect) and also any variable that can directly cause the effect. It is important to point out that even with an adjustment set for abduction, and with complete adjustment on the variables of that set, abduction still cannot be justified (valid and sound). That is, with abduction we cannot say that if the premises are true the conclusion necessarily follows (is logically entailed), even when the adjustment set is fully adjusted, unless the causal association is of the form “if and only if” (iff) as explained in the previous post. We will address this more once we cover Bayes’ theorem and tests of diagnostic accuracy.
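To make the point about validity concrete, here is a minimal sketch in Python that checks an argument form by brute force over a truth table. The helper names (`is_valid`, `implies`) are my own for illustration and do not come from any library. It treats "cause" and "effect" as simple true/false propositions, which is a simplifying assumption.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a then b'."""
    return (not a) or b

def is_valid(premises, conclusion, n_vars=2):
    """An argument form is valid when no truth assignment makes
    every premise true while the conclusion is false."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # found a counterexample
    return True

# Abduction: from "if cause then effect" and "effect", conclude "cause".
abduction_valid = is_valid(
    premises=[lambda c, e: implies(c, e), lambda c, e: e],
    conclusion=lambda c, e: c,
)

# The same inference when the conditional is "cause if and only if effect".
iff_valid = is_valid(
    premises=[lambda c, e: c == e, lambda c, e: e],
    conclusion=lambda c, e: c,
)

print(abduction_valid)  # False
print(iff_valid)        # True
```

The counterexample for plain abduction is the row where the effect is present but the cause is absent, which is exactly the situation in which some other cause produced the effect.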

Let’s start with a graph from a prior post on differential diagnosis:


With this DAG we created what might be considered the first step and level of inquiry - the first set of possible causes of thoracic pain. The adjustment set for “Thoracic Pain” based on this DAG is: {Muscle, Rib, Cardiac, Gall Bladder, Liver}
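For readers who like to see the structure written down, the first-level DAG can be sketched as a list of cause -> effect edges. This is only an illustration: the edge list is my reading of the adjustment set stated above, and `direct_causes` is a hypothetical helper, not a function from DAGitty or any other package.

```python
# Each tuple is one arrow in the DAG: (cause, effect).
# Assumption: every listed cause has a direct arrow into "Thoracic Pain".
edges = [
    ("Muscle", "Thoracic Pain"),
    ("Rib", "Thoracic Pain"),
    ("Cardiac", "Thoracic Pain"),
    ("Gall Bladder", "Thoracic Pain"),
    ("Liver", "Thoracic Pain"),
]

def direct_causes(edges, effect):
    """Return the set of nodes with an arrow pointing directly at `effect`."""
    return {cause for cause, target in edges if target == effect}

first_level = direct_causes(edges, "Thoracic Pain")
print(sorted(first_level))
# ['Cardiac', 'Gall Bladder', 'Liver', 'Muscle', 'Rib']
```

Expanding the DAG to a deeper level amounts to adding more edges (for example, arrows into "Cardiac") and asking the same question of the new node.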

The next step and deeper level of inquiry is to consider the DAGs for each of the causes in the above DAG. Each one can be expanded - the above is a first step, first level representation. The reasoning at this point - abduction (which can also be referred to as inference to the best explanation) - helps us figure out the possible causes based on the causal structure (which can be represented as a DAG). In other words, the only way we can successfully reason with abduction and inference to the best explanation in the differential diagnostic process is with knowledge of the underlying causal structure.

There are two approaches in a clinical situation to get from the DAG above to an understanding of the cause of thoracic pain and they both include knowing something deeper about each possible cause. It is important to know that these approaches are not mutually exclusive - both are used interchangeably. One is to rule out possible causes, the other is to rule in possible causes. We can rule out one pathway by identifying something on the pathway from that cause to thoracic pain that we can objectively determine does not exist.

For example - let’s agree that the cardiac route to thoracic pain is through ischemia, heart failure, valve dysfunction and/or pericarditis. The following graphic depicts the second step, second level specific to either ruling in or ruling out the cardiac cause of thoracic pain:


The adjustment set for the cardiac cause is now: {Ischemia, Pericarditis, Valve Dysfunction, Heart Failure}. If we accept that {Heart Failure, Valve Dysfunction and Pericarditis} all cause an abnormal heart sound, then we may be comfortable that not having an abnormal heart sound rules out this set of possible causes. If we accept that {Ischemia} causes ST segment changes on the ECG, then not having ST segment changes on the ECG can rule out ischemia, the one remaining cardiac pathway. In such a case we feel justified in ruling out cardiac cause overall, as we believe we have observed information about all the variables in the adjustment set between cardiac cause and thoracic pain.
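The rule-out reasoning in this paragraph is modus tollens applied to each member of the adjustment set: if a mechanism always produces a finding, and the finding is absent, the mechanism is absent. A rough sketch follows. The mapping simply encodes what we agreed to accept above and is not a statement about real diagnostic accuracy; the function names are hypothetical.

```python
# Assumption taken from the text: each mechanism always produces its finding.
expected_finding = {
    "Heart Failure": "abnormal heart sound",
    "Valve Dysfunction": "abnormal heart sound",
    "Pericarditis": "abnormal heart sound",
    "Ischemia": "ST segment changes",
}

def ruled_out(expected_finding, absent_findings):
    """Modus tollens: mechanism -> finding, finding absent, so mechanism absent."""
    return {m for m, f in expected_finding.items() if f in absent_findings}

def cardiac_ruled_out(expected_finding, absent_findings):
    """Cardiac cause is ruled out only when every mechanism in the set is."""
    return ruled_out(expected_finding, absent_findings) == set(expected_finding)

# Normal heart sounds alone leave ischemia open.
print(cardiac_ruled_out(expected_finding, {"abnormal heart sound"}))
# False

# Normal heart sounds and no ST segment changes close every pathway.
print(cardiac_ruled_out(expected_finding,
                        {"abnormal heart sound", "ST segment changes"}))
# True
```

Notice that the conclusion depends on having information about every member of the adjustment set. Leave one member unexamined and the cardiac cause stays open.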

You might ask - why not “rule in” first? Why is “ruling out” the first step I took here, rather than “ruling in”? One thing underlying the original causal model that we have not yet discussed explicitly is that in the implication: {Muscle, Rib, Cardiac, Gall Bladder, Liver} -> Thoracic Pain

the elements of the set {Muscle, Rib, Cardiac, Gall Bladder, Liver} are actually joined by the “or” operator.

And:  {Muscle or Rib or Cardiac or Gall Bladder or Liver} -> Thoracic Pain

Is logically equivalent to:

(Muscle -> Thoracic Pain) and (Rib -> Thoracic Pain) and (Cardiac -> Thoracic Pain) and (Gall Bladder -> Thoracic Pain) and (Liver -> Thoracic Pain)

See page 58, Theorem 3.78 in Gries and Schneider
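If you would rather check the equivalence than look it up, it can be verified by brute force. The sketch below treats the five causes and thoracic pain as true/false propositions (a simplification) and compares both forms across all 64 truth assignments. The `implies` helper is my own.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a then b'."""
    return (not a) or b

# Five causes plus one effect gives 2**6 = 64 truth assignments.
equivalent = all(
    # (Muscle or Rib or ... or Liver) -> Thoracic Pain
    implies(any(causes), pain)
    ==
    # (Muscle -> Thoracic Pain) and ... and (Liver -> Thoracic Pain)
    all(implies(cause, pain) for cause in causes)
    for *causes, pain in product([False, True], repeat=6)
)

print(equivalent)  # True
```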

I did not have to state this explicitly earlier because the DAG, as presented, provides that logical representation in its form - but if you are new to learning DAGs you may not be able to make that connection right away. This is a long explanation of the fact that these causes are NOT MUTUALLY EXCLUSIVE. Therefore, ruling one in does not rule the others out. In other words, someone could have more than one cause of their thoracic pain. If you jump to rule in a rib problem, even if it is ruled in, they may still have a muscle, gall bladder, cardiac or liver cause as well.
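A small enumeration shows how little ruling in one cause tells us about the others. This is an illustrative sketch only: it treats each cause as simply present or absent and counts the combinations that remain possible once a rib problem is ruled in. The function name is hypothetical.

```python
from itertools import product

causes = ["Muscle", "Rib", "Cardiac", "Gall Bladder", "Liver"]

def consistent_states(ruled_in):
    """List every present/absent combination in which all ruled-in causes are present."""
    states = []
    for values in product([False, True], repeat=len(causes)):
        state = dict(zip(causes, values))
        if all(state[c] for c in ruled_in):
            states.append(state)
    return states

states = consistent_states({"Rib"})
n_states = len(states)

# Count the combinations where at least one other cause is also present.
n_with_other_cause = sum(
    1 for s in states if any(present for c, present in s.items() if c != "Rib")
)

print(n_states)            # 16
print(n_with_other_cause)  # 15
```

Of the 16 combinations still on the table, 15 include at least one additional cause. Logic alone does not tell us which is most likely. That is where probability comes in.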

As you may have already thought - the terms ruling in and ruling out sound like the terminology we use when learning about sensitivity, specificity and likelihood ratios. There is a great reason for that, and we will connect back to this once we have solidified the use of DAGs, adjustment sets, and Bayes’ theorem in the process of getting from observations to inferences. These readily discussed and clinically useful concepts (sensitivity, specificity and likelihood ratios) are all based on probability, and while worthwhile, they are only as useful as our understanding of the causal structure from which the probability distribution arises.