CAUSAL AND OTHER CRITERIA FOR COEFFICIENTS OF ASSOCIATION
ABSTRACT
There are a variety of criteria for measures of association. This work reviews several criteria, introduces a new criterion - the "causal criterion" - and applies them to measures of association for nominal, ordinal, and interval-level variables, establishing that these criteria are satisfied by Goodman & Kruskal's tau (Goodman & Kruskal, 1954), Goodman & Kruskal's lambda modified for tied values (Goodman & Kruskal, 1954), and Pearson's r², respectively.
In the case of interval-level variables, the causal criterion enables us to perform principal components analysis through use of the equation rxy.f=0, where f is the first principal component. This equation is extended to nominal and ordinal variables, resulting in an interpretation of [log-linear analysis?] as a principal components analysis.
I. CRITERIA (AND NONCRITERIA) FOR MEASURES OF ASSOCIATION
Analysts have invented many ingenious methods of measuring the degree of association among variables. The problem is to find criteria that help us pick a "useful" measure, or else criteria that tell us what to look for in constructing one. When I was in graduate school, I fell in love with information-theoretical measures, as set forth by Attneave (1959). Since then, I have become aware of a criterion that seems to contradict the information-theoretical measures, so I want to back up now and look again at how we choose measures of association.
First, let me say that I am concerned with measures that are useful for causal analysis. I'm interested in prediction, in one variable causing another. I don't think that deliberately symmetrized measures are particularly useful; our task would seem to be the construction of causal theories, which means that our associations are directional.
Second, I want to look at measures of association between all different types of variables, i.e., where X and Y are of different levels of measurement. I will restrict myself here to nominal, ordinal, and interval measures, but ultimately I hope my musings will be able to extend to ratio-level variables as well as variables of more exotic types (e.g., partial orderings). Anyway, we are looking to select measures of association for nine different pairs of variables: associations where both X and Y are of the same level of measurement (three cases: N1-->N2, O1-->O2, and I1-->I2, where N, O and I indicate nominal, ordinal, and interval variables, respectively), plus associations where X and Y differ in their level of measurement (six cases: N-->O, N-->I, O-->N, O-->I, I-->N, and I-->O). Note that we do not assume that the measures of association need be symmetric. First, it is not clear that this should be a criterion; such a demand has to be separately justified, not assumed without thought. Second, we will ultimately turn our attention to causal analysis of relationships, and since causality is unidirectional, there seems to be good reason not to have symmetry as a criterion. Finally, symmetric measures don't seem very natural for relationships between variables of different levels of measurement; our current preoccupation with measures that are symmetric between variables that do have the same level of measurement(1) may have led us to give symmetry too high a value.
So what are the criteria that spring to mind?
[The "Range" Criterion] The Measure Varies Between 0.0 and 1.0
xx
Note that for our purposes we don't require that a measure take negative values; we're merely looking for a measure of the degree of association.
[The "Statistical Independence" Criterion] The Measure Takes the Value 0.0 in the Case of Statistical Independence Between the Variables
We demand that if X and Y are statistically independent, then the measure of association should take the value 0.
[The "Meaning of 0" Criterion] The Measure Takes the Value 0.0 Only When the Variables Are Statistically Independent
Note that we have not yet defined "statistical independence". This concept is clear for N-->N, but it appears more complex for other relationships, both same-level (e.g., O-->O) and other-level relationships (e.g., I-->O). It may be that we need to conceptualize statistical independence at the same time as we invent our measures of association. However, the concept of statistical independence seems conceptually prior to (or at least no later than) the measurement of degrees of association, so we will deal with it first. (Actually, now that I think of it, my sense is that the two have to be defined at the same time and on similar grounds. It seems plausible to me to say, "Two variables are associated if and only if they aren't statistically independent," and so we get this criterion.)
Note that this criterion excludes such measures as lambda.
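A small numerical check may make this concrete. The sketch below assumes that "lambda" here means Goodman & Kruskal's lambda for predicting the row variable from the column variable; the table and the function names are invented for illustration.

```python
from fractions import Fraction

def gk_lambda(table):
    """Goodman & Kruskal's lambda for predicting the row variable from the column variable."""
    n = sum(sum(row) for row in table)
    largest_row_total = max(sum(row) for row in table)
    # Sum over columns of the largest cell in each column.
    column_modes = sum(max(col) for col in zip(*table))
    return Fraction(column_modes - largest_row_total, n - largest_row_total)

def is_independent(table):
    """True when every cell equals (row total * column total) / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return all(cell * n == r * c
               for row, r in zip(table, row_totals)
               for cell, c in zip(row, col_totals))

# Hypothetical table: rows are values of Y, columns are values of X.
table = [[60, 30],
         [40, 10]]
print(gk_lambda(table))       # 0
print(is_independent(table))  # False
```

Here the modal value of Y is the same in every column, so knowing X never changes the prediction and lambda is 0, even though the column proportions (0.60 versus 0.75) differ.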
[The "Perfect Association" Criterion] The Measure Takes the Value 1.0 in the Case of Perfect Association Between the Variables
We demand that if X and Y are perfectly associated, then the measure of association should take the value 1.
[The "Meaning of 1" Criterion] The Measure Takes the Value 1.0 Only When the Variables Are Perfectly Associated
Note that we have not yet defined "perfect association". As with "statistical independence," we may have to conceptualize it at the same time as we invent our measures of association. However, the concept of a perfect association seems conceptually prior to (or at least no later than) the measurement of degrees of association, so we will deal with it before dealing with measuring association.
There is a dispute over whether "perfect association" should include cases where X is a necessary but not sufficient cause of Y (or a sufficient but not necessary cause). In other words, should we count either (1) or (2) below as cases of perfect association?
Relationship (1)
| 40 | 40 | 40 |
| 10 | 0 | 0 |
Relationship (2)
| 0 | 40 |
| 0 | 40 |
| 10 | 40 |
[The "Interpretability" Criterion] The Measure Is Interpretable[?]
Prediction? Causation? PRE-type meaning? Perhaps I should just outright state that since my interest is in causation, I'm only interested in PRE-type measures.
[The "Computability" Criterion] The Measure Must Be Capable of Computation, But the Necessary Computations Need Not Be Easy
[Obvious. Probably even unnecessary to state this as a criterion. My main purpose is the second part: that we should not be afraid of messy computations.]
II. THE "CAUSAL" CRITERION AND OTHER CRITERIA
The above criteria are generally (albeit not universally) accepted. I now propose a new one, which I call the causal criterion: if the variables X and Z are related only through the intermediate variable Y, then M(x-->z) = M(x-->y)*M(y-->z).(2) (Note that X, Y and Z need not all be of the same level of measurement.) In other words, if X-->Y and Y-->Z and the relationship X-->Z is the "compounding" of the two separate relationships X-->Y and Y-->Z, then the measure of the overall relationship is the product of the measures of its two links.
A. Why Accept the Causal Criterion?
Why should we accept the causal criterion? First, the multiplicative mathematics seems to make sense. If two variables are associated only through some intermediate variable, then it stands to reason that the cumulative association should be no stronger than either of the two links individually, which is what a product of two values between 0.0 and 1.0 guarantees. This is seen most directly at the extremes. If either of the two links has a zero measure (that is, the two variables are statistically independent), then the overall relationship should also be one of statistical independence and thus should have a zero measure. And if either of the two links represents a perfect association between the two variables, then the overall relationship between the extremes should be just that of the remaining link. Second, it allows us to do a principal components analysis.
B. How Do We Define a "Compound Relationship"?
What does it mean to say that we "compound" two relationships? If X-->Y and Y-->Z, what is the "compounded" relationship X-->Z? How is it defined? How is it calculated?
It is clear how we can calculate it for nominal variables: straight matrix multiplication. It is not so clear for ordinal variables.
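For the nominal case, the compounding can be sketched as follows. The sketch assumes each relationship is represented as a matrix of conditional probabilities whose columns index the predictor and whose rows index the predicted variable; that orientation, the function name, and the numbers are all assumptions made for illustration.

```python
from fractions import Fraction as F

def compound(p_z_given_y, p_y_given_x):
    """Matrix product: P(z | x) = sum over y of P(z | y) * P(y | x)."""
    return [[sum(row[k] * p_y_given_x[k][j] for k in range(len(row)))
             for j in range(len(p_y_given_x[0]))]
            for row in p_z_given_y]

# Hypothetical links; each column is a conditional distribution and sums to 1.
p_y_given_x = [[F(1, 4), F(3, 4)],
               [F(3, 4), F(1, 4)]]
p_z_given_y = [[F(1, 4), F(1, 3)],
               [F(3, 4), F(2, 3)]]

p_z_given_x = compound(p_z_given_y, p_y_given_x)

# Extreme case: if X-->Y is a perfect (identity) link, X-->Z is just Y-->Z.
identity = [[1, 0],
            [0, 1]]
assert compound(p_z_given_y, identity) == p_z_given_y
```

The columns of the compounded matrix are again conditional distributions, so the result can be fed to whatever N-->N measure is under examination.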
C. The "Independent Causation" Criterion
The "independent causation criterion": Assume X and Y are statistically independent. The measure of association satisfies the independent causation criterion if and only if the multiple association between Z and X,Y jointly is the sum of the individual measures of association of XZ and YZ.
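For interval variables, and reading "statistically independent" as "uncorrelated" (an assumption, since the definition is still open here), this criterion can be checked against the standard two-predictor formula for the squared multiple correlation. The function name and the correlation values are illustrative only.

```python
from fractions import Fraction as F

def multiple_r2(r_xz, r_yz, r_xy):
    """Squared multiple correlation of Z on X and Y jointly, from the three pairwise r's."""
    return (r_xz ** 2 + r_yz ** 2 - 2 * r_xy * r_xz * r_yz) / (1 - r_xy ** 2)

r_xz, r_yz = F(1, 2), F(1, 3)

# With uncorrelated predictors the joint measure is the sum of the two r-squareds.
uncorrelated = multiple_r2(r_xz, r_yz, 0)

# With correlated predictors the additivity no longer holds.
correlated = multiple_r2(r_xz, r_yz, F(1, 2))
```

So Pearson's r-squared passes the independent causation criterion in this reading; whether the nominal and ordinal candidates do is a separate question.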
III. INTERVAL VARIABLES (I-->I)
A. Pearson's R²
Pearson's R² satisfies all the criteria.
However, I'm not sure about the definition of statistical independence. If it is just that the covariance of X and Y is zero, then the criterion that M(independence)=0.0 is tautologous, since the definition of independence is built into the definition of r² itself. Is this a problem?
B. Other I-->I Measures?
Do other measures satisfy the criteria? What about R² with a nonlinear (say, quadratic) term? Does that satisfy the causal criterion? I don't think so, because if Y is perfectly quadratically related to X, and Z is perfectly quadratically related to Y, then Z is perfectly related to the fourth power of X, and the nonlinear R² wouldn't be able to pick that up (not with the value 1.0, anyway).
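The quadratic counterexample can be checked numerically. The sketch below fits y = b0 + b1*x + b2*x² by exact least squares (rational arithmetic, so a perfect fit gives exactly 1); the helper name and the sample points are my own choices, not part of any established procedure.

```python
from fractions import Fraction

def quadratic_r2(xs, ys):
    """Exact R-squared of the least-squares fit y = b0 + b1*x + b2*x**2."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    cols = [[Fraction(1)] * len(xs), xs, [x * x for x in xs]]
    # Augmented normal equations, solved by Gauss-Jordan elimination.
    m = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols]
         + [sum(p * y for p, y in zip(ci, ys))] for ci in cols]
    for i in range(3):
        pivot = next(r for r in range(i, 3) if m[r][i] != 0)
        m[i], m[pivot] = m[pivot], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(3):
            if r != i:
                m[r] = [vr - m[r][i] * vi for vr, vi in zip(m[r], m[i])]
    b0, b1, b2 = (row[3] for row in m)
    mean = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x + b2 * x * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean) ** 2 for y in ys)
    return 1 - sse / sst

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]   # Y is perfectly quadratic in X
zs = [y ** 2 for y in ys]   # Z is perfectly quadratic in Y

print(quadratic_r2(xs, ys) == 1)  # True
print(quadratic_r2(ys, zs) == 1)  # True
print(quadratic_r2(xs, zs) < 1)   # True
```

Both links score 1.0, yet the compound X-->Z relationship scores below 1.0, so the product rule fails for this measure.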
IV. NOMINAL VARIABLES (N-->N)
A. Goodman & Kruskal's Tau-B[?]?
xx
Note it's not symmetric.
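The asymmetry is easy to see on a small table. The sketch below computes the asymmetric, PRE-based Goodman & Kruskal tau for predicting the row variable from the column variable, then repeats the calculation on the transposed table; the table itself is hypothetical.

```python
from fractions import Fraction

def gk_tau(table):
    """Goodman & Kruskal's tau for predicting the row variable from the column variable."""
    n = sum(sum(row) for row in table)
    # Error before: expected misclassification using only the row marginals.
    before = 1 - sum(Fraction(sum(row), n) ** 2 for row in table)
    # Error after: the same quantity within each column, weighted by column size.
    after = 0
    for col in zip(*table):
        total = sum(col)
        if total:
            after += Fraction(total, n) * (1 - sum(Fraction(c, total) ** 2 for c in col))
    return (before - after) / before

# Hypothetical table: rows are values of Y, columns are values of X.
table = [[20, 0, 10],
         [0, 20, 10]]
transposed = [list(col) for col in zip(*table)]

print(gk_tau(table))       # 2/3  (predicting Y from X)
print(gk_tau(transposed))  # 1/3  (predicting X from Y)
```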
B. Goodman & Kruskal's (Symmetric) Tau-C
Symmetric, but doesn't satisfy the causal criterion or (when the matrix is not square) the perfect association criterion. It does satisfy the perfect association criterion for square matrices. Basically, I think that symmetry is far less important than causality.
C. Lambda [Gamma?]?
Doesn't satisfy the criterion that 0 means statistical independence. Also doesn't satisfy the causal criterion.
D. Attneave's[?] Information-Theoretical Measure xx?
No.
The pattern above shows why this measure is an important one: although it appears theoretically richer than tau-b (because of the former's grounding in information theory), it fails to satisfy the new causal criterion.
E. Chi-squared-based Measures: xx, xx, xx, etc.
None of these satisfy all the criteria. Based on chi-squared, they all satisfy the "statistical independence" and "meaning of 0" criteria, but they all fail one or more of the other tests.
V. ORDINAL VARIABLES (O-->O)
A. Gamma [lambda?]?
Satisfies all the criteria.
B. Other Ordinal Measures?
What about a Guttman-type measure which uses a monotonically increasing (or decreasing) line as a prediction line? (Actually, I think that winds up being the same as gamma [lambda?].)
VI. RELATIONSHIP OF THE ABOVE TO PROPORTIONAL-REDUCTION-OF-ERROR (PRE) MEASURES
Note that all the above are PRE measures (Costner 1965). PRE measures automatically satisfy a number of the criteria.
VII. OTHER POSSIBLE COEFFICIENTS
Nominal-Ordinal (Make sure that if N1 --f--> N2 and N2 --g--> O1 and N1 --h--> O1 and h=g*f, then rh=rf x rg. Also make sure that if N1 --f--> O1 and O1 --g--> O2 and N1 --h--> O2 and h=g*f, then rh=rf x rg.)
Nominal-Interval
Ordinal-Interval
Interval-Ordinal
Interval-Nominal
Ordinal-Nominal
VIII. THE MEANING AND MEASUREMENT OF PARTIAL ASSOCIATION (Mxy.z) FOR ALL POSSIBLE COMBINATIONS OF LEVELS OF MEASUREMENT, AND THE PRINCIPAL COMPONENTS ANALYSIS DERIVING FROM EACH OF THESE
A. Mnn.n
xx
What does it mean to hold a nominal variable constant?
What does it mean to have two nominal variables affecting a third nominal variable?
B. Mnn.o
xx
C. Mnn.i
xx
D. Mno.n
xx
What does it mean to hold a nominal variable constant?
E. Mno.o
xx
F. Mno.i
xx
G. Mni.n
xx
What does it mean to hold a nominal variable constant?
H. Mni.o
xx
I. Mni.i
xx
J. Mon.n
xx
What does it mean to hold a nominal variable constant?
K. Mon.o
xx
L. Mon.i
xx
M. Moo.n
xx
What does it mean to hold a nominal variable constant?
N. Moo.o
xx
O. Moo.i
xx
P. Moi.n
xx
What does it mean to hold a nominal variable constant?
Q. Moi.o
xx
R. Moi.i
xx
S. Min.n
xx
What does it mean to hold a nominal variable constant?
T. Min.o
xx
U. Min.i
xx
V. Mio.n
xx
What does it mean to hold a nominal variable constant?
W. Mio.o
xx
X. Mio.i
xx
Y. Mii.n
We compute R²ii for each separate value of the nominal variable Z and average the results, weighting each by the fraction of cases having that value of Z. (A shorter way would be to compute total error before knowing X, for each value of Z separately and summing over all values of Z, and then total error after knowing X, again for each value of Z separately and summing over all values of Z.) [xx Make sure that the two methods are indeed the same.]
What does it mean to hold a nominal variable constant?
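On the question of whether the two ways of computing Mii.n agree, a quick check suggests they need not. The sketch below assumes "error" means the sum of squared deviations of Y (about the group mean before, about the within-group regression line after); the function names and the data are hypothetical.

```python
from fractions import Fraction

def sums_of_squares(xs, ys):
    """Return (Sxx, Syy, Sxy) about the group means."""
    n = len(xs)
    mx, my = Fraction(sum(xs), n), Fraction(sum(ys), n)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def weighted_r2(groups):
    """Average of the within-group r-squareds, weighted by share of cases."""
    total = sum(len(xs) for xs, _ in groups)
    result = 0
    for xs, ys in groups:
        sxx, syy, sxy = sums_of_squares(xs, ys)
        result += Fraction(len(xs), total) * sxy ** 2 / (sxx * syy)
    return result

def pooled_r2(groups):
    """1 - (total error after) / (total error before), summed over groups."""
    before = after = 0
    for xs, ys in groups:
        sxx, syy, sxy = sums_of_squares(xs, ys)
        before += syy
        after += syy - sxy ** 2 / sxx
    return 1 - after / before

# One (xs, ys) pair per value of Z; the second group has far more spread in Y.
groups = [([0, 1, 2], [0, 1, 2]),
          ([0, 1, 2], [0, 10, 0])]
print(weighted_r2(groups))  # 1/2
print(pooled_r2(groups))    # 3/103
```

The pooled version in effect weights each group's r-squared by its share of the total variation in Y rather than by its share of cases, so the two agree when the groups have proportionate Y variation (or equal r-squareds) and can differ otherwise.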
Z. Mii.o
xx
AA. Mii.i
The classic partial correlation coefficient.
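The link between the partial correlation and the causal criterion can be illustrated with the standard first-order formula: the partial correlation of X and Z holding Y constant vanishes exactly when r_xz equals the product r_xy * r_yz. The function name and the numerical values below are illustrative only.

```python
from math import sqrt

def partial_r(r_xz, r_xy, r_yz):
    """First-order partial correlation of X and Z, holding Y constant."""
    return (r_xz - r_xy * r_yz) / sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))

r_xy, r_yz = 0.6, 0.5

# X and Z related only through Y: the overall r is the product of the links.
through_y_only = partial_r(r_xy * r_yz, r_xy, r_yz)

# X and Z more strongly related than the chain alone would produce.
with_direct_link = partial_r(0.5, r_xy, r_yz)
```

In the first case the partial correlation is zero; in the second it is positive, signalling an association between X and Z that does not run through Y.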
1. The Consequent Factor Analysis
[Here a description of the principal components analysis that follows from Pearson's R², being sure to point out the dependence of the process on the causal criterion. And of course cite Alker, from whom this is lifted.]
IX. CONCLUSION
xx
APPENDIX: CANDIDATE MEASURES OF ASSOCIATION
I. N-->N
Chi-squared
Goodman & Kruskal's tau-a, tau-b, tau-c
Gamma [lambda?]
Information-theoretical measure
II. O-->O
Lambda [gamma?]
I did a sample calculation for this and found the causal criterion to be satisfied in that case. Sample:
XY: (M = 5/12)
| 1 | 3 |
| 2 | 1 |
YZ: (M = 1/10)
| 1 | 1 |
| 3 | 2 |
-->XZ: (M = 5/120)
| 11/12 | 13/12 |
| 25/12 | 35/12 |
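The sample calculation can be re-run mechanically. The sketch below rests on two assumptions inferred from the figures rather than stated in the text: that in each table the rows index the dependent variable (so XY has Y in rows and X in columns, and YZ has Z in rows and Y in columns), and that for a 2x2 table M is (ad - bc) divided by the product of the row totals, i.e. concordant minus discordant pairs over the pairs untied on the dependent variable. Under those assumptions the quoted values of M are reproduced up to sign.

```python
from fractions import Fraction

def measure(table):
    """(ad - bc) over the product of the row totals, for a 2x2 table
    whose rows index the dependent variable."""
    (a, b), (c, d) = table
    return Fraction(a * d - b * c) / ((a + b) * (c + d))

def compound(upper, lower):
    """Pass each case in `lower` through the conditional distribution in `upper`.
    Rows of `lower` and columns of `upper` both index the middle variable Y."""
    y_totals = [sum(row) for row in lower]
    return [[sum(Fraction(u_row[k] * lower[k][j], y_totals[k])
                 for k in range(len(lower)))
             for j in range(len(lower[0]))]
            for u_row in upper]

xy = [[1, 3], [2, 1]]   # rows: Y, columns: X
yz = [[1, 1], [3, 2]]   # rows: Z, columns: Y
xz = compound(yz, xy)   # [[11/12, 13/12], [25/12, 35/12]]

print(measure(xy))      # -5/12
print(measure(yz))      # -1/10
print(measure(xz))      # 1/24, i.e. 5/120
```

For 2x2 tables this is no accident: the determinant of the compounded table is the product of the two determinants divided by the product of the Y totals, so the measure multiplies for any pair of 2x2 links. Larger ordinal tables would need a separate check.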
Mention Guttman's article in Sankhya that defines M as follows: Assign each value of X and Y a value such that the Pearson's correlation between them is maximized; that correlation is the value of the O-->O association. However, I am quite sure that this measure doesn't satisfy the causal criterion, since if X-->Y-->Z (where all are ordinal), then the values assigned to X for the purpose of the X-->Y link may not be the same as for the X-->Z link. Ditto for Z. Ditto for Y in its two different appearances. [xx But I should check this to make sure.]
III. I-->I
Pearson's r-squared, possibly with nonlinear components. (Pearson's r-squared is also the information-theoretical measure.)
IV. N-->O
Use a PRE measure where one is trying to predict the rank order of Y1 and Y2. Error before = # of untied pairs / 2.(3) I can now see two possibilities for error after, depending on our prediction method once we know X:
(1) Predict pair ordering as follows: for each combination of two X values, determine which ordering of Y values occurs more frequently, and predict that.
(2) Determine the ordering of values of X best matching the ordering of values on Y, and predict the ordering accordingly.
Clearly option (2) can never yield a higher measure than option (1), since it restricts our predictions to those consistent with a single ordering of the X values, while option (1) is free to choose the best prediction for each pair of X values separately. Option (1) respects the nominal nature of X, since it does not assume that X has some underlying ordinal structure that is then reflected in its association with Y. Option (1) is also more faithful to our sense of what statistical independence means in such cases, namely, that for each pair of values of X, the number of cases for which y1>y2 is equal to the number of cases in which y1<y2, where y1 runs over all cases having the first value on X and y2 runs over all cases having the second value on X.
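Option (1) can be sketched as follows. The treatment of pairs of cases that share the same X value is my assumption, since the text does not settle it: for such pairs X cannot help, so half of the pairs untied on Y are counted as errors both before and after. The function names and data are hypothetical, and the function is undefined when no pair is untied on Y.

```python
from fractions import Fraction
from itertools import combinations

def pair_counts(ys_a, ys_b):
    """Count pairs (one case from each list) with y_a > y_b and with y_a < y_b."""
    greater = sum(1 for a in ys_a for b in ys_b if a > b)
    less = sum(1 for a in ys_a for b in ys_b if a < b)
    return greater, less

def nominal_ordinal_pre(groups):
    """groups maps each nominal X value to the ordinal Y scores of its cases."""
    error_before = error_after = Fraction(0)
    # Pairs sharing an X value (assumption): X is no help, so the error is unchanged.
    for ys in groups.values():
        untied = sum(1 for a, b in combinations(ys, 2) if a != b)
        error_before += Fraction(untied, 2)
        error_after += Fraction(untied, 2)
    # Pairs with different X values: predict the more frequent ordering of Y.
    for x1, x2 in combinations(groups, 2):
        greater, less = pair_counts(groups[x1], groups[x2])
        error_before += Fraction(greater + less, 2)
        error_after += min(greater, less)
    return 1 - error_after / error_before

print(nominal_ordinal_pre({"a": [1, 1], "b": [2, 2], "c": [3]}))  # 1
print(nominal_ordinal_pre({"a": [1, 2], "b": [1, 2]}))            # 0
print(nominal_ordinal_pre({"a": [1, 1, 2], "b": [2, 3]}))         # 5/8
```

The first data set has one Y value per X value and scores 1; the second has y1>y2 exactly as often as y1<y2 for the one pair of X values and scores 0.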
Let's look at what criteria these measures satisfy. Since they are both PRE measures, they satisfy the range, statistical independence, and meaning of 1 criteria. What about the other criteria?
* Meaning of 0: This is satisfied for (1), given our sense of what "statistical independence" means in the N-->O case. However, I don't believe this criterion is satisfied for (2), since there may be a situation in which there is no overall ordering of the X values that improves our prediction, but in which the individual pairs of X values yield better predictions of Y. [xx I will have to see if I can produce a case showing this.]
* Perfect Association: My sense of perfect association in the N-->O case is that each value of X is associated with one value of Y.(4) If this is the case, then both (1) and (2) would take values of 1 in the case of perfect association.
* Causal: We have to examine two different cases: N1-->N2-->O, and N-->O1-->O2.(5) xx
V. N-->I
Eta. (This is also the information-theoretical measure.)
VI. O-->N
Guttman-type measures dividing the ordinal scale into blocks associated with particular nominal values
VII. O-->I
xx
This is a tough one. One of the ways to do it is to try to compute error before as the total over all pairs of cases of the square of the difference between the cases on Y minus the predicted difference between the cases on Y (which latter would always be zero). For error after, predict deviation between Y and Y' equal to the average such deviation when X > X'? This just doesn't seem like a very good measure. It can't ever attain 1.0, even for perfect association. Perhaps we should predict a separate deviation for each value of (X, X')? But this seems to give us too many degrees of freedom.
Perhaps the answer is to use Goodman & Kruskal's [Guttman's?] method of fitting a monotonically increasing (or decreasing) line to the distribution. This would certainly respect the ordinal nature of X. xx Things to check out: I'm pretty sure this satisfies the perfect association criterion; does it? Would this satisfy the causal criterion? Also, if O-->I1-->I2, then how would the monotonic line of the O-->I1 link be related to that of the O-->I2 compound link? And similarly for the case O1-->O2-->I.
Perhaps we should use the method (like that discussed by Guttman in Sankhya for the O-->O case) of assigning the interval-level values for X that maximize the correlation between X and Y? No: that wouldn't work when we made this only one link in a causal series of links, because the ordinal variable could be given one set of interval values for the purpose of one link and another set of interval values for another link.
VIII. I-->N
Discriminant analysis? (only when N is a dichotomous variable).
Guttman-type measures (same as for O-->N, see above, except that if they really are the same, we would seem to be wasting information).
IX. I-->O
xx
BIBLIOGRAPHY
1. E.g., Pearson's r-squared for interval variables and many chi-squared-based measures for nominal variables.
2. I first became aware of this fact through Hayward Alker's (1969:xx-xx) explanation that principal components analysis depended on this property of Pearson's r².
3. "Untied pairs" means all pairs not having tied Y values. They may have tied X values.
4. If we counted pairs tied on Y, then perfect association would mean that each value of X is associated with a unique range of values of Y. [xx I'm not sure now exactly what this means.]
5. Our analysis of these cases will also depend on what measures we use for the N1-->N2 and O1-->O2 links. I'll assume that the N-->N and O-->O measures discussed above (Section xx and xx above) are satisfactory. If we later discover that a multiplicity of these measures is satisfactory, then we will have to reexamine this case.
Copyright © 1999 Regents of the University of Minnesota. All rights reserved.