POL 2700:  METHODOLOGY AND ANALYSIS

Notes on Causation



1. AN ANALYSIS OF THE ROLE DETERMINISM PLAYS IN THE STUDY OF CAUSALITY

[much of this suggested by or even taken from Babbie, Chapter 3]

We won’t cover everything about determinism. Just a few clarifying statements.

* Determinism: the doctrine that human behavior is completely determined by causes that exist independent of the person’s will. To put this another way, free will is an illusion.

* We are trying to explain differences among human beings in their characteristics and behavior, i.e., why they vary from one another. We say in social science that we are trying to explain “variance”.

* In social science, we accept probabilistic explanations:

• “Given X cause, people tend to do Y.” Thus:

• Given the fact that these people are women (as opposed to being men), they tend to vote Democratic (more than men).

We don’t even have to get into the issue of free will vs. determinism in order to do such studies or reach such conclusions.

* Thus, contra Babbie, we don’t need to assume determinism in order to look for (and find!) causes (notice I don’t say “the” causes) of behavior.

* After taking such causes into account, we find that a certain amount of variance has been “explained”, but that there is a “residual” or “unexplained” variance.

* This residual variance (or any part of it) could be free will (and thus unexplainable) or it could be explainable. We can’t know in advance, so we keep looking for causes.

* Our concern about determinism arises in part from the so-called “reductionist fallacy”: the belief that finding causes for some behavior, characteristic, or belief not only explains it but also explains it away, so that the question of whether it is “right” or “true” is no longer relevant.

• We might be able to give some Freudian explanation of Einstein’s belief in the theory of relativity in terms of his toilet training, but this doesn’t mean that there is no validity to the theory or that the truth of the theory wasn’t a cause.

• We might be able to explain our refusal to kill people in terms of police surveillance, but this doesn’t mean that the rightness of this refusal wasn’t a cause.
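The idea of "explained" versus "residual" variance mentioned above can be made concrete with a small calculation. The sketch below uses invented vote data (not from any survey) and one simple way of splitting variation: squared distance from the overall mean versus squared distance from each group's own mean. The variable names are made up for this illustration.

```python
# Invented data for illustration only: 1 = voted Democratic, 0 = did not.
votes = {
    "women": [1, 1, 1, 0, 1, 0, 1, 1],
    "men":   [1, 0, 0, 1, 0, 0, 1, 0],
}

def mean(values):
    return sum(values) / len(values)

everyone = [v for group in votes.values() for v in group]
grand_mean = mean(everyone)

# Total variation: squared distance of every person from the overall mean.
total_ss = sum((v - grand_mean) ** 2 for v in everyone)

# Residual variation: squared distance of every person from their own group's mean.
residual_ss = sum((v - mean(group)) ** 2 for group in votes.values() for v in group)

# Whatever knowing the group removes from the total counts as "explained".
explained_ss = total_ss - residual_ss
share_explained = explained_ss / total_ss

print(f"explained: {share_explained:.1%}, residual: {1 - share_explained:.1%}")
```

With these made-up numbers, sex "explains" about 14% of the variation in vote and about 86% is left over. The residual might be free will or might be other causes we haven't found yet; the arithmetic can't tell us which.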

2. INDEPENDENT AND DEPENDENT VARIABLES

In explanations, we are trying to explain different units' values on one variable by referring to their value on another variable. Thus if we are trying to explain how people voted in the 1992 Presidential election, then:

* What are the units of analysis? [People; those eligible to vote]

* What is the variable we are interested in explaining? ["Vote in the 1992 Presidential election"]

* What are the values for this variable? ["Voted for Bush", "voted for Clinton", "voted for Perot", "voted for some third party candidate", "did not vote"]

* How might we explain people's choice? For example, I’m thinking of a certain person. [Sara] How would you explain (or predict) that person’s vote? If you had to predict and could ask that person any question except how s/he voted, what would you ask? [Point to another variable: age, race, sex, wealth, occupation, party i.d., etc.]

Note that you are explaining one variable as caused by another. The variable caused is called the dependent variable, because its value is thought to depend on the value of the other variable for each case. The other variable is called the independent variable.

It is YOUR CHOICE which variable you want to be dependent. We don't REQUIRE causality, but if causality is present, you'll look pretty silly saying that the effect causes the cause.

Note that independent and dependent variables can reverse: although my hypothesis above was that my party i.d. caused my vote, we could argue that vote causes party i.d. (as I shift loyalties).

It can be unclear which is INHERENTLY the variable being caused. For example: "In October, researchers at Auburn University and Wayne State University, surveying 49 metropolitan areas' prevalence of country and western music on radio, found that the more C&W, the higher the suicide rate." It's not clear which came first; one could make a case for either causing the other. (This is where a controlled experiment would help.)

[CLASS EXERCISE: MAKE UP THREE DIFFERENT THEORIES FOR THE CONNECTION, THE FIRST ARGUING THAT C&W MUSIC CAUSES A HIGH SUICIDE RATE, THE SECOND THAT A HIGH SUICIDE RATE CAUSES RADIO STATIONS TO PLAY COUNTRY AND WESTERN MUSIC, AND THE THIRD THAT SOME OTHER FACTOR {e.g., working class oppression} CAUSES BOTH C&W AND SUICIDE.]

[This is not too important:] Note that a dependent variable can be independent in another explanation. (For example, my vote is one independent variable in an explanation of who wins the election.) An independent variable can itself be explained, in which case it will be the dependent variable. (For example, my party i.d. is caused by my parents' party i.d.)

3. CAUSALITY AND ITS CRITERIA

David Hume's three criteria for determining causality:

a. Effects Succeed Causes; Causes Precede Effects

In any causal claim, that which is caused (i.e., the dependent variable) must in fact have come later than that which is claimed to have caused it (i.e., the independent variable).

Note that we must phrase the hypothesis clearly (in particular, specifying when the cause and effect occur) to see if the time order criterion is satisfied. Example: “Taking medicine causes illness.”

WRONG ANSWERS: A number of answers gave examples in which they argued that the cause should have come before the effect, or usually comes before the effect, etc. This is all irrelevant. The only issue is, did the claimed cause IN FACT come before the effect. Note that the above answer doesn't bother arguing that rain is needed to get ice, or usually comes before ice, or whatever; it just says that in this instance the rain did in fact come after the ice and so could not be its cause.

      A frequent, wrong example: "Getting a good grade on a test can't cause studying, because you need to study before a test." What you NEED isn't the issue of this criterion; the only issue is, DID the studying that you are claiming as a cause occur before or after the test? (After all, you might have studied after the test!) If the latter, the criterion is violated.

Here's an example where the causality isn't clear: "According to a recent study by University of California at Irvine researchers, violent criminals have five times as much of the metal manganese in their hair as do law-abiding citizens. The researchers have no explanation but seem confident that the metal is a symptom rather than a cause of the violent behavior." It's not clear which came first; one could make a case for either causing the other. (This is where a controlled experiment would help.) [CLASS EXERCISE: MAKE UP TWO OPPOSING THEORIES FOR THE CONNECTION, ONE ARGUING THAT MANGANESE IN THE HAIR CAUSES VIOLENT CRIME, THE OTHER THAT VIOLENT CRIME CAUSES MANGANESE IN THE HAIR.]

Here's an example where the causal direction isn't clear: "As people age, their pineal gland produces less melatonin." Does aging cause less melatonin? Does the absence of melatonin cause aging? (I heard this example on NPR's "Talk of the Nation" on 8/18/95. This raises the question, can giving people extra melatonin keep them from aging?)

b. Empirical Correlation (and perhaps "Plausibility")

Knowing the value of the independent variable must IN FACT, WHEN YOU MEASURE BOTH VARIABLES IN REAL CASES, help you predict the value of the dependent variable. (Note connection to PRE measures.)

If the claim is that the variable X causes the variable Y, the criterion of empirical association simply requires that X and Y "covary", i.e., that when X is large, Y tends to be large also, and when X is small, Y tends to be small also. For example, the higher our team's score at a basketball game, the louder our fans tend to cheer.

Babbie claims that a connection must be "plausible". This is related to physics's idea of "no action at a distance"; we need some MECHANISM for the relationship for us to put any faith in it.

      For example, it is factually ("empirically") true that Presidents elected in years ending in 0 are more likely to die in office than those not elected in years ending in 0. (See table below.) Still, there doesn't seem to be any plausible way that being thus elected could cause dying in office, and so we reject the idea of a causal connection.

 

                     | Elected in year ending in 0 | Elected in no year ending in 0 | Did not hold office based on an election
Died in office       | Kennedy, FDR, Harding, McKinley, Garfield, Lincoln, Harrison (7) | Taylor (1) | [no one] (0)
Didn't die in office | Reagan, Monroe, Jefferson (3) | Bush, Carter, Nixon, Johnson, Eisenhower, Truman, Hoover, Coolidge, Wilson, Taft, TR, etc. (11) | Ford (1)
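To see how strong the association in this table is, here is a short sketch that turns the cell counts into death-in-office rates and into one PRE measure. The counts are taken from the table as given. Using Goodman and Kruskal's lambda as the PRE measure is my choice for this illustration, and the function and variable names are made up.

```python
# Cell counts copied from the table above.
counts = {
    "elected in year ending in 0": {"died": 7, "did not die": 3},
    "elected in other year": {"died": 1, "did not die": 11},
    "not elected": {"died": 0, "did not die": 1},
}

def death_rate(cell):
    """Share of the Presidents in one column who died in office."""
    return cell["died"] / (cell["died"] + cell["did not die"])

rates = {group: death_rate(cell) for group, cell in counts.items()}

def lambda_pre(table):
    """Goodman-Kruskal lambda: proportional reduction in prediction errors."""
    total = sum(sum(cell.values()) for cell in table.values())
    outcomes = {outcome for cell in table.values() for outcome in cell}
    # Errors made by always guessing the overall most common outcome.
    overall = {o: sum(cell.get(o, 0) for cell in table.values()) for o in outcomes}
    errors_without = total - max(overall.values())
    # Errors made by guessing the most common outcome within each column.
    errors_with = sum(sum(cell.values()) - max(cell.values()) for cell in table.values())
    return (errors_without - errors_with) / errors_without

for group, rate in rates.items():
    print(f"{group}: {rate:.1%} died in office")
print(f"lambda = {lambda_pre(counts):.2f}")  # 0.50: knowing the column halves our errors
```

So the empirical association is quite strong (70% versus about 8%), and knowing the election year cuts our prediction errors in half. None of that supplies a mechanism, which is why we still reject the causal claim.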

EXAMPLES: "When the sun rises, it gets light." "Children tend to share their parents' party i.d." [See also the two examples mentioned in the Time Order section.]

Note that THE CORRELATION NEED NOT BE PERFECT; a single counter-example doesn't violate the criterion; remember that social scientists are usually concerned with probabilistic explanation: "The older people get, the more conservative they become." (Of course, if the hypothesis is that X ALWAYS, INVARIABLY causes Y, then that's another situation.)

Note also that A NEGATIVE ASSOCIATION IS STILL AN ASSOCIATION.

Take the book's bad example: "90% of AIDS cases occur in communities with fluoridated water." If 90% of all people live in communities with fluoridated water, then there's no association.

      Consider the following 2 x 2 table:

AIDS \ Fluoridation?  | Lives in fluoridated community | Lives in unfluoridated community | TOTAL
Has AIDS              | 9                              | 1                                | 10
Doesn't Have AIDS     | 891                            | 99                               | 990
TOTAL                 | 900                            | 100                              | 1000
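The point of the table can be checked with two divisions. This sketch uses the counts from the 2 x 2 table above; the names are made up for the illustration.

```python
# Counts from the 2 x 2 table above.
table = {
    "fluoridated": {"has_aids": 9, "no_aids": 891},
    "unfluoridated": {"has_aids": 1, "no_aids": 99},
}

def aids_rate(cell):
    """Share of the people in one kind of community who have AIDS."""
    return cell["has_aids"] / (cell["has_aids"] + cell["no_aids"])

rate_fluoridated = aids_rate(table["fluoridated"])
rate_unfluoridated = aids_rate(table["unfluoridated"])

# The misleading statistic: what share of all cases is in fluoridated communities?
total_cases = sum(cell["has_aids"] for cell in table.values())
share_of_cases_in_fluoridated = table["fluoridated"]["has_aids"] / total_cases

print(f"share of cases in fluoridated communities: {share_of_cases_in_fluoridated:.0%}")
print(f"AIDS rate, fluoridated:   {rate_fluoridated:.1%}")
print(f"AIDS rate, unfluoridated: {rate_unfluoridated:.1%}")
```

Ninety percent of the cases are in fluoridated communities, but the rate is 1% in both kinds of community. Knowing whether a person's water is fluoridated doesn't help predict whether that person has AIDS, so there is no association.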

But note also that a nonzero correlation in the data does not by itself establish an association: with a small sample size, considerable fluctuations will arise simply from sampling variability. If I call the flip of a coin correctly once and Roger calls it incorrectly once, then there is a perfect correlation but no special reason to think this justifies the statement, "Chilton makes perfect predictions and Roger makes all errors."
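To get a feel for how often luck alone produces such a "perfect" pattern, the sketch below lists every possible outcome for two callers. It assumes (my assumption, for illustration) that both callers are simply guessing, so each call is right half the time. The function name is made up.

```python
from itertools import product

def chance_of_perfect_split(n_calls):
    """Probability that one guesser is right every time and the other wrong every time."""
    # Every equally likely outcome: n_calls results per caller, True = correct call.
    outcomes = list(product([True, False], repeat=2 * n_calls))
    hits = 0
    for outcome in outcomes:
        first, second = outcome[:n_calls], outcome[n_calls:]
        if (all(first) and not any(second)) or (all(second) and not any(first)):
            hits += 1
    return hits / len(outcomes)

for n in (1, 2, 5):
    print(f"{n} call(s) each: {chance_of_perfect_split(n):.4f}")
```

With one call each, a perfect split shows up half the time by pure chance. With five calls each it happens about twice in a thousand tries, which is why larger samples deserve more of our faith.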

c. Non-spuriousness (vs. Specification): The Association/Correlation Cannot Be Explained Away by a Third Variable

SPURIOUSNESS:

An empirically existing relationship between the variables X and Y is said to be spurious if a third variable Z can be found such that after controlling for Z (i.e., holding Z constant), the relationship between X and Y disappears, changes sign, or otherwise changes its character. (This is often called, "Using Z to explain away the relationship between X and Y.") The criterion of nonspuriousness means that no such variable can be found. When no data on Z is available, the relationship between X and Y is said to be spurious if it is plausible that controlling for Z would change the character of the relationship between X and Y.

The third variable can't just be hypothesized; we have to measure it and show it doesn't eliminate the association. But advancing a plausible alternative explanation does cast doubt on the research; we suspend our conclusions until we can eliminate that plausible alternative.

Here is one example of a spurious association:

      A former student, Butch Johnson, tells me that in one state, an insurance company charged higher rates for cars painted red, because the company’s experience was that red cars were more likely to get into accidents than cars of other colors.

It turned out, of course, that this was a spurious correlation, since the problem was with sports cars, which were commonly painted red and which often got into accidents.

ACCIDENT HISTORY: ALL CARS

                  | Had an accident in last year | Had no accidents in last year | TOTAL
Red car           | 85 (6.07%)                   | 1315                          | 1400
Other color car   | 465 (5.41%)                  | 8135                          | 8600
TOTAL             | 550 (5.50%)                  | 9450                          | 10,000

Note that while the accident rate for all cars is 550/10,000 = 5.5%, the accident rate for red cars is 85/1400 = 6.07% and the accident rate for other-colored cars is 465/8600 = 5.41%. The accident rate ratio (6.07/5.41) = 1.123, meaning red cars have a 12.3% higher accident rate.

Now we claim that the association is spurious, explained away by the sports car/non-sports car distinction. To show this, we distinguish between the two types of cars.

ACCIDENT HISTORY: SPORTS CARS ONLY

                  | Accidents in last year | No accidents in last year | TOTAL
Red car           | 45 (9.00%)             | 455                       | 500
Other color car   | 55 (11.00%)            | 445                       | 500
TOTAL             | 100 (10.00%)           | 900                       | 1000

Note that while the accident rate for all sports cars is 100/1000 = 10.00%, the accident rate for red sports cars is 45/500 = 9.00%, and the accident rate for other-colored sports cars is 55/500 = 11.00%. The accident rate ratio (9.00/11.00) = .818, meaning red sports cars have an 18.2% lower accident rate.

ACCIDENT HISTORY: NON-SPORTS CARS ONLY

                  | Accidents in last year | No accidents in last year | TOTAL
Red car           | 40 (4.44%)             | 860                       | 900
Other color car   | 410 (5.06%)            | 7690                      | 8100
TOTAL             | 450 (5.00%)            | 8550                      | 9000

Note that while the overall accident rate for non-sports cars is 450/9000 = 5.00%, the accident rate for red non-sports cars is 40/900 = 4.44% and the accident rate for other colors is 410/8100 = 5.06%. The accident rate ratio (4.44/5.06) = .878, meaning red non-sports cars have a 12.2% lower accident rate.

We also note that sports cars are 50% likely to be red, while other cars are only 10% likely to be red, so "being a sports car?" is correlated with "being red?". Sports cars have a 10% chance of having had an accident, while other cars have only a 5% chance, so "being a sports car?" is correlated with "having had an accident?".

Our conclusion, then, is that red is negatively associated with accidents (probably due to visibility), but this relationship is disguised by the fact that a large proportion (50%, in fact) of the accident-prone sports cars are red.
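The whole red-car analysis can be reproduced in a few lines. The sketch below uses the counts from the three tables above; the function and variable names are made up for the illustration. It pools the two types of car to get the "all cars" table, computes the red/other accident rate ratio with and without controlling for type of car, and checks that type of car is related to both color and accidents.

```python
# (accidents, total cars) for each color, from the sports and non-sports tables above.
sports = {"red": (45, 500), "other": (55, 500)}
non_sports = {"red": (40, 900), "other": (410, 8100)}

def rate(accidents, total):
    return accidents / total

def rate_ratio(stratum):
    """Accident rate of red cars divided by accident rate of other-colored cars."""
    return rate(*stratum["red"]) / rate(*stratum["other"])

def pooled(*strata):
    """Add the strata together, i.e. ignore the type of car."""
    return {
        color: tuple(sum(s[color][i] for s in strata) for i in range(2))
        for color in ("red", "other")
    }

def totals(stratum):
    """(accidents, cars, red cars) for one type of car."""
    accidents = stratum["red"][0] + stratum["other"][0]
    cars = stratum["red"][1] + stratum["other"][1]
    return accidents, cars, stratum["red"][1]

all_cars = pooled(sports, non_sports)
print(f"all cars:        ratio = {rate_ratio(all_cars):.3f}")    # 1.123
print(f"sports cars:     ratio = {rate_ratio(sports):.3f}")      # 0.818
print(f"non-sports cars: ratio = {rate_ratio(non_sports):.3f}")  # 0.878

# The third variable must be related to both of the original variables.
for name, stratum in (("sports", sports), ("non-sports", non_sports)):
    accidents, cars, red = totals(stratum)
    print(f"{name}: {red / cars:.0%} red, {accidents / cars:.0%} had an accident")
```

Ignoring type of car, the ratio is above 1 (red looks dangerous). Holding type of car constant, the ratio is below 1 in both groups (red looks safer). The sign of the relationship changes when we control for the third variable.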

September 25, 1997

FREQUENT MISTAKES STUDENTS MAKE IN THINKING ABOUT SPURIOUSNESS

1.   “Different Reason, Same Relationship” This just changes the reason why the relationship exists, without altering the relationship at all. Here’s an example of this mistake: Hypothesis: “The higher the age, the more likely to vote Republican, because their increasing age brings increasing wealth with it, which the Republicans are thought to protect.” Erroneous response: “This is a spurious relationship, because the real reason is that as people get older they feel more vulnerable to shifts in the economy and society, and Republicans are thought to be the party of stability.” Note, however, that the person isn’t disputing the causal relationship. “Spuriousness” means the relationship doesn’t really exist, not that our reasons for its existence are wrong.

2.   “Denial of the Empirical Relationship” Hypothesis: “The higher the age, the higher the income”. Erroneous response: “That’s a spurious association because when people get old and go on social security, their income declines.” This is just denying the facts that give rise to the original causal claim. We have already looked at whether our data support the second (“empirical association”) criterion.

3.   “Denial of Time Order” Hypothesis: “Political participation causes increased voting rates.” Erroneous response: “That’s a spurious association because people start by voting and only later participate more widely in politics.” This isn’t spuriousness but a claim that the hypothesis violates the first (“time order”) criterion.

4.   “Additional / Alternative Causes” This just says there’s more than one cause of the dependent variable, without denying the existence of the original relationship. But we're NOT concerned with there being more than one cause of the dependent variable. We're looking for an explanation of the association, NOT additional causes of the dependent variable. Here’s an example: Hypothesis: “Political participation causes increased voting rates.” Erroneous response: “That’s a spurious association because there are lots of other things that could cause a certain turnout rate: the weather, who is running for office, etc.”

I think this error arises from students’ confusion between idiographic and nomothetic explanations. It may be true that for this single election the (low, say) turnout rate was caused by there being a hurricane. But we can find unique explanations for any particular election; what we are looking at is what factors cause voter turnout for elections taken in general.

5.   “Specification” Specification is finding an INTERMEDIATE variable Y such that when it is controlled, the original relationship between X and Z is changed. For example, there is a strong association between being liberal (X) and voting for the Democratic candidate (Z), so we have a causal model X➔Z. We introduce a third variable, "evaluation of the Democratic candidate's platform" (Y), with our new causal model being X➔Y➔Z. When we control for the evaluation, we find that there is very little DIRECT association left. We have thus SPECIFIED the mechanism by which the causal link occurs: liberals evaluate the platform highly and then make their voting choice based on that evaluation. We don't think of that as spuriousness but as specification.

However, specification can be a form of spuriousness if the mechanism implied by the third variable changes our interpretation of the causal link. (The book is not clear on this.) Consider, for example: A study explored whether sweets caused uncontrollable behavior in children. Children were divided into two groups. Children in the "treatment group" (i.e., treated with sugar) ate whatever they wanted, and parents were instructed to say nothing to them about their diet. Children in the "control group" were given a diet that had no sugar. (This was done in consultation with the parents.) The treatment group turned out to be less controllable than the control group. Later research called this into question, though, and it was eventually found that it was parental attention, not sugar, which was responsible for the controllability. Thus the diet condition (X) led to differences in parental attention (Y), which led to differences in controllability (Z).

Here's another example: Hypothesis: “A dog’s fierceness is caused in part by its breed. For example, pit bulls and Doberman Pinschers are more fierce than other breeds.” Erroneous response: “This is a spurious relationship because if we control for attack school enrollment, the relationship between breed and fierceness changes/disappears.” But this is an erroneous response because the original causal relationship is still admitted, even if now we have specified a somewhat more indirect reason than genetics for this connection.

Another example is the causal claim that being African-American (X) causes one to be able to jump high (Z) because blacks have different genetic codes than whites. (Some coach said that and got blasted by Sports Illustrated.) What we challenge here is any DIRECT connection between race and jumping. We look at the intermediate factor of "taking sports seriously" (Y), and if we hold that constant, it is plausible that the DIRECT relationship between race and jumping would disappear. In other words, we find that the mechanism is not genetic but sociological. The causal connection remains, but the original causal mechanism is shown to be spurious.

6.   “Explaining the cause” This is finding a variable that causes X, the cause in the original hypothesis. Here is an example of this mistake: Hypothesis: “Having handgun control laws causes a lower rate of handgun murders.” Erroneous response: “This relationship is spurious, because having handgun control laws is really just a result of the liberalness of the state.” Here, X = “whether the state has a handgun control law”, Y = “rate of handgun murders in the state”, and Z = “liberalness of the state”. The answer says that instead of X➔Y, the real picture is Z➔X➔Y. But this does not deny the original contention that X causes Y!

OTHER EXAMPLES OF SPURIOUS CORRELATIONS

* Ice cream sales are associated with numbers of rapes. This relationship disappears when weather is held constant. Alternative explanation: hot weather brings more people out, including both rapists and targets, and it also induces people to buy more ice cream.

* The Westinghouse Evaluation of Head Start showed that Head Start attendance has a negative association with performance in school. Do you believe this? Alternative explanation: the relationship is confounded with the prior relationships with ability / poor home environment / wealth / class.

* "As education increases, so does Republican party i.d. Therefore education causes people to become Republicans." Do you believe this? Alternative explanation: both education and Republican party i.d. are dependent on wealth.

September 20, 1997

ARE RED CARS A BIGGER INSURANCE RISK THAN OTHER COLORS?

ACCIDENT HISTORY: A SURVEY OF 10,000 CARS

                  | Had an accident in last year | Had no accidents in last year | TOTAL
Red car           | 85 (6.07%)                   | 1315                          | 1400
Other color car   | 465 (5.41%)                  | 8135                          | 8600
TOTAL             | 550 (5.50%)                  | 9450                          | 10,000

The accident rate for all cars is 550/10,000 = 5.5%. The accident rate for red cars is 85/1400 = 6.07%. The accident rate for other-colored cars is 465/8600 = 5.41%.

ACCIDENT HISTORY: THE SURVEY'S 1,000 SPORTS CARS

                  | Accidents in last year | No accidents in last year | TOTAL
Red car           | 45 (9.00%)             | 455                       | 500
Other color car   | 55 (11.00%)            | 445                       | 500
TOTAL             | 100 (10.00%)           | 900                       | 1000

The accident rate for all sports cars is 100/1000 = 10.00%. The accident rate for red sports cars is 45/500 = 9.00%. The accident rate for other colored sports cars is 55/500 = 11.00%.

ACCIDENT HISTORY: THE SURVEY'S 9,000 NON-SPORTS CARS

                  | Accidents in last year | No accidents in last year | TOTAL
Red car           | 40 (4.44%)             | 860                       | 900
Other color car   | 410 (5.06%)            | 7690                      | 8100
TOTAL             | 450 (5.00%)            | 8550                      | 9000

The overall accident rate for non-sports cars is 450/9000 = 5.00%. The accident rate for red non-sports cars is 40/900 = 4.44%. The accident rate for other colors is 410/8100 = 5.06%.

September 19, 1997

WORKSHEET ON SPURIOUS CORRELATIONS

Here are a number of supposed examples of spuriousness, taken from papers given by previous students. Judge which of them are correct and which incorrect, and if incorrect, explain why they are wrong.

1. Nonspuriousness is when you can prove that one thing hasn't falsely led you to believe another. For example, the link between medicine and illness. You take medicine when you're ill rather than taking medicine MAKES you ill--that is spurious.

2. If you were doing an experiment that hypothesized people would be less apt to support testing on animals if they saw a graphic film on such testing being done, as opposed to reading written information, this has problems with spuriousness. If a person had pets, was an animal lover, or saw an animal on the film that looked just like the one they had, this would be spurious, because they would perhaps not at all pay attention to the animal testing and simply pay attention to the animals and their resemblance.

3. Having high insurance rates means that you are a bad driver. Or there could be a 3rd variable such as age influencing the insurance rates. Being young (Z) can also cause higher rates. Being a bad driver cannot be solely accounted for & this must be taken into consideration.

4. The Washington Post says religious people are gullible & easily manipulated. This is spurious, because several third causes--individualism, anti-intellectualism, & revivalism have helped bring both about in our day.

5. Hypothesis that is spurious: "Black people vote less often than whites." This is spurious because it implies that there is something inherent in "being black" that simply causes people not to vote--it does not take into factors such as how SES affects voting habits.

6. If you were asking people what their feelings were on medicaid & medicare--and failed to take into consideration their age. Age plays a big part in their feelings, and when you do not include it you violate nonspuriousness.

7. A relationship that would violate the spuriousness criterion would be the following: "The reason that 60,000 people go to the Metrodome is because it's Sunday." Of course, the reason people go to the Metrodome on Sundays is because the Vikes play on Sundays, and people go to watch them play, not just because of the day of the week.

8. Another example of spuriousness:   I (a student) fail this class. Did I fail it because I didn't understand the book or was it because I had a heart transplant and wasn't "into" the class?  The third factor, heart transplant, aided in me failing.

9. Expensive cars create fewer accidents.   There could be another factor, such as fewer expensive cars on the road & that they are driven less frequently.  If there is a controlling factor Z, the relationship is spurious.

10. A researcher hypothesized that using a hairpick causes one to have an Afro....   Racial characteristics or having a permanent caused one to have an Afro, not owning a hairpick & thus the empirical relationship was spurious.

11. Going back to the cancer example [SPC: about a treatment for cancer]. Say there is an empirical relationship between the treatment and being cured of cancer. Say it is positive. Perhaps a third variable such as genetics or environment the person lives in is causing the empirical relationship. Then the results would be spurious.

12. Other factors that influence a person's stance on gays in the military could be whether or not he/she is in the military themselves or how they stand on other issues dealing with gays or just general liberal / conservative biases. We cannot rule out spuriousness.

13. "Canes are used most often by old people because they work best." An experiment showed this. This does pass the time order criterion and it passes the empirical relationship criterion, but it doesn't pass nonspuriousness. They don't control for a 3rd variable which can be cost. "When elderly can't walk w/out aid, a cane is the cheapest thing they can buy."

14. Nonspuriousness is when we control for third factors and are able to prove that X does cause Y instead of some other factor. An example here that would violate nonspuriousness, could be drawn from the one above. We hypothesize that giving a person information on democratic candidates would cause them to vote democrat. There could be other third factors involved such as "ideology" or "party identification". These third factors would cause a spurious relationship if we didn't control for them ahead of time.

15. When more people buy ice cream, car thefts increase.... When it is warmer, more people buy ice cream & more people [tend?] to steal cars (windows left down etc...)

16. Suppose we want to compare socioeconomic status with religiosity.... Say that the correlation is .5. Is it the socioeconomic status ($) or is it something else. It could be that rich parents cause both X & Y. Maybe having rich parents causes your chances of being well off to increase and also they may influence religious beliefs. There is a relationship in the real world for parents' religiosity and children's religiosity. In order to prove the original hypothesis, these 3rd variables must be considered. If they are proven to have an effect on causation then the original hypothesis must be discarded.

17. An example that violates the nonspuriousness rule would be: "People with high auto insurance payments tend to vote Democratic." This would violate the nonspuriousness rule if in reality a person's young age were causing them to have high payments and if young people tended to be more liberal and hence, vote Democratic.

18. Nonspuriousness could arise from the causal hypothesis that the temperature in a given area affects murder w/ guns. --i.e. the hotter the state the more murders occur w/ guns--people are more agitated. Well a "Z" variable could be that these hotter states have more lax gun laws i.e. concealed weapons laws easier--easier to own a gun, more people own guns so more murders will be w/ guns? --third variable could be real caused not heat.

19. If we want to say that the experience of living abroad causes people to be more active, we have to make sure there's no third criterion which can be a reason of both of them. In this case, I can say maybe the people who go overseas are active from the first, so it violates the criterion.

QUESTIONS AND ANSWERS FROM THE EXAMS

[5 mins; 20 pts] Explain/define the "empirical correlation" criterion for proving that a relationship between two variables is causal or not. Give an example (other than one from the class, review session, or texts) of a relationship that violates that criterion, explaining how it does so.

If the claim is that variable X causes variable Y, the criterion of empirical association simply requires that X and Y "co-vary", a methodologist’s term meaning that when X is large, Y tends to be large also, and when X is small, Y tends to be small also. For example, it is a fact that the higher our team's score at a basketball game, the louder our fans tend to cheer.

Of course, all the above is for a positive correlation. For a negative association, large values of X tend to go with small values of Y, and vice versa. For example, the higher the opposing team's score in a basketball game, the less loud our fans cheer. EITHER ONE--either positive or negative--is an association, however.

A relationship that violates this criterion is simply one where there is, empirically, neither a positive nor a negative association between the two; in other words, there is a zero association. For example, if we actually take daily measurements of my hair, we find (let's say) that there is no relationship between the amount of gray in it and whether it is Tuesday: we see the same amount of gray on Tuesdays as any other day.
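The three cases in this answer (positive, negative, and zero association) can be illustrated with a correlation coefficient. The numbers below are invented purely for illustration; they are not measurements of any real game or of anyone's hair. Pearson's r is one common way to summarize covariation, and using it here is my choice for the sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 none."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / sqrt(spread_x * spread_y)

# Invented data: one entry per game, or per day for the hair example.
our_score = [60, 70, 80, 90]
cheering_vs_our_score = [3, 5, 6, 9]      # loudness on a made-up scale
their_score = [60, 70, 80, 90]
cheering_vs_their_score = [9, 6, 5, 3]
is_tuesday = [1, 0, 0, 1, 0, 0]           # 1 = Tuesday, 0 = another day
amount_of_gray = [4, 3, 5, 4, 4, 4]       # same average on Tuesdays as on other days

r_positive = pearson_r(our_score, cheering_vs_our_score)
r_negative = pearson_r(their_score, cheering_vs_their_score)
r_zero = pearson_r(is_tuesday, amount_of_gray)

print(f"our score vs. cheering:   r = {r_positive:+.2f}")
print(f"their score vs. cheering: r = {r_negative:+.2f}")
print(f"Tuesday vs. gray hair:    r = {r_zero:+.2f}")
```

The first two both satisfy the criterion, since a negative association is still an association. Only the third, with r equal to zero, violates it.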

I thought this question would be twenty free points, but as it turned out, few people got it right. Here are some common errors:

>> People thought that one exception to a relationship disproves the relationship. Not so! Even if one sports car owner doesn't have any accidents, there is still an (overall) relationship between owning a sports car and having more frequent accidents.

>> People confused non-correlations with negative correlations.

QUESTIONS AND ANSWERS FROM THE ONE-MINUTE ESSAYS

* The graphs [tables, actually] on the board were difficult to understand and to follow. I think we should go over this again on Wednesday.

I know it is a little confusing now. I will be setting up a study session outside of class where we can discuss these issues.

I also recommend that you read them over outside of class and with a fellow student.

* I learned about causation, but big whoop--what do we use causation for?

How about being able to deal with an insurance company that is charging you twice the standard rate because your station wagon is red? How about dealing with the City Councilors who try to cut the fire department on the grounds that fire fighters cause fire damage?

If these points don't convince you, I'll be discussing (in the study session mentioned above) an actual program (Head Start) whose funding rose and fell based on the problem of figuring out whether Head Start actually caused any improvements in children.

* [In saying what s/he had learned one student made the following statement:] I learned that there can be a third factor that can disprove a seemingly empirical correlation.

I'm sure this student understands what s/he means, but I want to be very careful in the language. Spuriousness does not disprove an empirical correlation. The correlation exists, it's based on the data, and it isn't subject to proof or disproof. What IS disproved is the CLAIM that this empirical correlation between two variables (a factual, justified claim) comes from a causal connection between them (the theoretical, unjustified, falsified claim).

* Shouldn't we look at all colors separately rather than red sports cars vs. all other sports cars because individually the red sports cars could be involved in more accidents?

I'm not sure what you mean by the final clause (after "because"), but the first part of the question is a good one. Indeed, if our interest was in studying the effect of colors upon accidents, then we would do the analysis with each color distinguished, as you suggest. However, our interest is in analyzing the insurance company's claim that red, as opposed to any other color, causes accidents. In other words, it's the nature of the insurance company's claim that determines the form of our analysis, not a general, theoretical interest in color.

* You are not considering all of the other colors of sports cars it is safer to have a color that is not as popular as red. The only way other would work is if you had a conglomeration of other colors!

I'm afraid I don't understand this. Could the answer to the above question apply?

* What does spuriousness actually mean? I am unclear about spuriousness vs. nonspuriousness.

An association/relationship is said to be "spurious" if the connection between the two variables can be explained by the causal connection of a third variable to each of the first two. The association/relationship is said to be "non-spurious" if no such third variable can be found.

Sometimes a relationship is said to be spurious if the third variable simply reduces the size of the relationship. For example, suppose (as one student suggested) that red cars get into more accidents than non-red cars in part because their color makes their brake lights more difficult to see. In that case, the original, strong correlation between red color and accident rate would still be reduced by the sports car/other car variable, but there would remain some residual, causal connection between redness and accident rate, even controlling for type of car. The original claim, then, would be said to be spurious in part.
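
To make the fully spurious case concrete, here is a minimal sketch in Python. The counts are invented for this illustration (they are not the figures from the board), and the names `counts`, `accident_rate`, and `rate_by_color` are hypothetical. In these made-up numbers, red cars have a higher accident rate overall, but once car type is held constant the difference between red and non-red disappears.

```python
# Hypothetical counts, invented for illustration only.
# Key: (car type, color). Value: (number of cars, number of accidents).
counts = {
    ("sports", "red"): (800, 80),
    ("sports", "not red"): (200, 20),
    ("other", "red"): (200, 8),
    ("other", "not red"): (800, 32),
}


def accident_rate(cells):
    """Accidents per car, pooled over the given (cars, accidents) cells."""
    cars = sum(c for c, _ in cells)
    accidents = sum(a for _, a in cells)
    return accidents / cars


def rate_by_color(color, car_type=None):
    """Accident rate for one color, optionally holding car type constant."""
    cells = [
        cell
        for (kind, col), cell in counts.items()
        if col == color and (car_type is None or kind == car_type)
    ]
    return accident_rate(cells)


# Without a control variable: red looks more dangerous.
overall_red = rate_by_color("red")
overall_not_red = rate_by_color("not red")
print("All cars:", overall_red, "vs", overall_not_red)

# Controlling for car type: compare red and non-red within each type.
for car_type in ("sports", "other"):
    print(car_type, rate_by_color("red", car_type),
          "vs", rate_by_color("not red", car_type))
```

In this sketch the third variable (type of car) is related to both color and accident rate, which is what lets it account for the original association. A partly spurious relationship would show a smaller, but still nonzero, difference within each car type.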

* I learned how the non-spuriousness criterion works. The question is how.

I'm afraid I don't understand your question.

* Was the firefighter example spurious even though there was a (+) all around? I didn't quite understand that. What is actually non-spurious?

First, let's clarify the language: correlations/relationships are spurious or non-spurious; examples (as in, "the firefighter example") aren't.

The initial, positive relationship between "# of firefighters" and "amount of fire damage" was shown to be spurious, because when we controlled for the size of the fire (that is, held the size of the fire constant), "# of firefighters" became negatively correlated with "amount of fire damage". This final, negative association is probably a genuine causal connection.
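
The reversal described above can be sketched with a few lines of Python. The fire records below are made up purely for illustration, and `pearson_r`, `fires`, and `r_for` are hypothetical names. The sketch computes the correlation between number of firefighters and damage over all fires, and then again within each fire size.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical fires: (size of fire, number of firefighters, damage in $1000s).
fires = [
    ("small", 1, 12), ("small", 2, 10), ("small", 3, 8),
    ("large", 8, 60), ("large", 10, 50), ("large", 12, 40),
]


def r_for(rows):
    """Correlation between firefighters and damage for the given fires."""
    return pearson_r([f for _, f, _ in rows], [d for _, _, d in rows])


# All fires lumped together: more firefighters go with more damage.
overall_r = r_for(fires)

# Holding size of fire constant: more firefighters go with less damage.
within_r = {
    size: r_for([row for row in fires if row[0] == size])
    for size in ("small", "large")
}

print("overall:", overall_r)
print("within each size:", within_r)
```

With these invented numbers the overall correlation is positive while the correlation within each fire size is negative, because big fires both draw more firefighters and do more damage.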

* Is gender a spurious variable for the '92 vote? One of the injustices of our society is if a man and woman do the same job the woman is paid less. So if you adjust income levels, did women still tend to vote more for Bill Clinton?

First, let's clarify the language: variables aren't spurious or non-spurious; correlations are. So your question is really, "Is the empirical correlation between gender and the '92 vote spurious?"

Your subsequent explanation--that women vote for Clinton because they get paid less--is a reasonable one. (Of course there are issues besides economic ones which might lead women to vote Democratic.) But let's assume that your statement is perfectly correct. (I don't know if it is or not.) It still doesn't invalidate our conclusion that gender causes the vote; rather, it SPECIFIES exactly what the nature of that causal connection is: gender causes income disparity, which causes vote disparity. Read Chapter 16 in Babbie for a description of this sort of analysis.

* (a) Is the information you gave us on accidents right, and if so, (b) why charge more for red?

(a) I made up the data, but assume it's true. It's certainly plausible enough. (b) The insurance companies charged more for red because there was a higher accident rate for red cars and they assumed that correlation meant causality.

* What makes a variable relationship positive or negative?

If greater values of one variable are associated with greater values of another, the relationship is positive; if greater values of one variable are associated with lesser values of another, then it's negative. If greater values of one variable don't make any difference one way or the other to the values of the other variable, then there is a zero relationship.

In the particular example of the sports cars, a positive association meant that being red was associated with a greater (higher) accident rate.
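
As a rough illustration of the three cases, the sketch below computes a correlation coefficient for three small made-up data sets and reports its direction. The data and the names `pearson_r` and `direction` are assumptions for this example only. Note that a zero relationship (no correlation) is not the same thing as a negative one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def direction(r, tolerance=1e-9):
    """Label a correlation as positive, negative, or zero."""
    if r > tolerance:
        return "positive"
    if r < -tolerance:
        return "negative"
    return "zero"


# Hypothetical values of one variable and three candidate partners.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]     # greater x goes with greater y
falling = [9, 7, 5, 4, 2]    # greater x goes with lesser y
unrelated = [5, 3, 4, 3, 5]  # greater x makes no difference either way

for label, y in (("rising", rising), ("falling", falling),
                 ("unrelated", unrelated)):
    print(label, direction(pearson_r(x, y)))
```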

* (a) I still don't totally understand the last criterion. Can't all relationships be then disproved?

* (b) I would like an example of a non-spurious causal relationship. It seems we could always find some variable that would result in a spurious causality.

* (b, duplicated:) Is it possible to have an unexplainable [I assume you mean "non-spurious"] relationship?

(a) No. You have to FIND a third variable that causes the original association to wash out. If the connection is truly a causal one, you won't be able to find such a variable.

(b) Here's an example: "The application of fertilizer causes increased plant growth." Here's another: "Watching commercials for a product/candidate increases a consumer's/voter's likelihood of purchasing/voting for that product/candidate."

* If red sports cars have 9% accidents and 11% or [of?] all other accidents then red sports cars have almost as many accidents as all other sports cars combined?

I'm afraid I don't understand the question.

* I think there might actually be more accidents by red cars in real life, which voids your whole example.

(1) If you imagine I'm trying to use made-up data to prove that red cars are less dangerous IN REAL LIFE, then you're in deep intellectual trouble. My examples are just to illustrate an intellectual concept.

(2) Anyway, the made-up data in my example shows that red cars DO have more accidents. The issue is whether there is a CAUSAL connection, not a mere association. Once again, if you don't understand the distinction between these two, you're going to have real trouble on the test.

* Nonspuriousness seems much clearer now. I know I made the mistakes you suspected on Exercise 3.3. Now I know that Z must explain BOTH X and Y.

Right! Good for you!

* What exactly is nonspuriousness? I still don't get it.

If a variable Z can be found that explains away the observed association between X and Y, then that association is said to be "spurious", i.e., false or misleading as evidence that X causes Y. Nonspuriousness is when no such variable Z can be found.

In my experience, students have a hard time with this concept. All I can suggest is that you review the book's discussion, look over your notes and my handout, and come to the review session.


URL: http://www.d.umn.edu/~schilton/2700/2700.Causality.html
Author:  Stephen Chilton [email]  |  Last Modified:  2003-11-21

Copyright © 2003 Regents of the University of Minnesota. All rights reserved.