Sociology 2155: Outline--Week Eleven
I. Return exams.
II. Evaluation research. (Notice: this is one of our four purposes for research--description, exploration, explanation, evaluation--and not a particular method... it could use surveys, experiments, or qualitative methods such as focus groups, depending on the purpose of the evaluation.)
A. We have to start by determining the goals (broad purposes, not directly measurable) and objectives (measurable indicators that goals are being reached) of the program.
B. Ultimately much of evaluation research involves trying to determine the effectiveness of programs, but there are some other types of evaluation as well, covered in more detail in your reading assignment.
1. Evaluation of need. E.g. Rachel Kincaid worked with members of the sociology department to determine the amount of teen homelessness in Duluth in the early 1990s
2. Evaluation of process:
a. How well was the program implemented? If it's a job training program being evaluated, for example, how many people showed up for the training sessions? What kind of training and background did the trainers have, and was it of the sort that the program's creators envisioned?
b. Who were the participants? If the program is designed to promote work readiness in a population that hasn't had much job experience, what if many of the participants have years and years of previous work experience?
3. Evaluation of impact or outcomes: this is what I'll concentrate on in today's lecture. To what extent does a particular program accomplish the goals it was designed to promote?
4. Evaluation of efficiency or cost-effectiveness
C. Design for evaluating outcomes: The closer we come to our model of experimental research, the better our evaluation. That is, we want a treatment group and a control group, or multiple treatment groups, with the treatment group(s) exposed to a carefully monitored intervention of some sort, under carefully controlled conditions that are the same for all groups. Ideally we'd like pre- and post-testing. IDEALLY WE ALSO WANT RANDOM ASSIGNMENT TO THE COMPARISON GROUPS.
Often we have to settle for a quasi-experimental design. For example, a panel study, where we get before and after measurements, but there's no control group. Or we have a control group, but people aren't randomly assigned. E.g., the Aronson et al. study of jigsaw groups in Austin: which teachers were willing to implement the new method? Their classes became the experimental group, and though the researchers tried to match them with other classes taught by teachers of good reputation, there may have been systematic differences between the experimental and control groups.
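To make the idea of random assignment in C concrete, here is a minimal sketch in Python. The participant IDs, the function name assign_groups, and the seed are all hypothetical and chosen only for illustration; a real evaluation would draw participants from the program's actual enrollment list.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into treatment and control groups.

    Every participant has the same chance of landing in either group,
    which is what protects against systematic differences between them.
    """
    rng = random.Random(seed)          # seeded only so the split can be reproduced
    shuffled = list(participant_ids)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {"treatment": shuffled[:midpoint], "control": shuffled[midpoint:]}


# Hypothetical roster of 20 participants (illustrative IDs, not real data).
participants = [f"P{n:03d}" for n in range(1, 21)]
groups = assign_groups(participants, seed=2155)

print(len(groups["treatment"]), len(groups["control"]))  # 10 10
```

Contrast this with the quasi-experimental case, where group membership depends on something like a teacher's willingness to volunteer rather than on the shuffle.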
D. "Truth and DARE: Tracking Drug Education to Graduation and as Symbolic Politics" by Earl Wysong, Richard Aniskiewicz and David Wright
1. Expensive national program: $700 million per year (as of 1993), with virtually no long-term program evaluation... program delivered in the public schools by specially trained police officers (How many in class were part of a DARE program?)
2. Their evaluation from 1987-1992 in Kokomo, Indiana, with a focus on the five-year effects on high school seniors who had first been exposed to the program as seventh graders (graduating class of 1992). Comparison group consisted of people from the previous year's class (graduating class of 1991) who had not been exposed and of students in the class of 1992 who had moved in from places that didn't use DARE.
Survey of a random sample of seniors from both of those graduating classes, equally divided male and female, some who had been part of DARE and some who had not, asking about drug use and drug attitudes. Assured participants that survey would be both anonymous and confidential.
a. Goal of DARE: To shape attitudes and social skills against the use of harmful drugs and ultimately to reduce the use of harmful drugs by American young people
b. Measurement:
1) Attitudes: Used a DARE scale developed in the original DARE program in Los Angeles to measure attitudinal changes
2) Behaviors: Used a Drug Use scale, which looked at five different dimensions of drug use.... notice that they're depending on self-report and they're interested in: lifetime prevalence, recency of use, grade level at first use, frequency of use.
3) Also a small focus group of seniors, recollecting their DARE experiences and giving their judgments about its long-term effects... qualitative research: what purpose could it have in this context?
3. RESULTS: Ultimately both those who had been in DARE and those who had not tended to have the "right" answers on the DARE scale. BUT both groups showed relatively high rates of drug use, with no significant difference between the groups. (We'll talk about "significance testing" next week.)
4. Symbolic politics of DARE. Why has DARE been so popular, and why has it been difficult to get nationally funded evaluation research? Why didn't the evaluation make any difference in Kokomo, where school officials didn't want to hear about it?
a. "Drug crisis" beginning in the mid-1980s; Reagan declared a war on drugs.... interesting concept that can be applied here: "moral panic"
b. "... expanding the DARE program offered public reassurance in the face of the socially constructed drug threat."
c. Mutual support and reinforcement among direct stakeholders (staff, etc.) and indirect stakeholders (politicians and the like).
W.I. THOMAS: "What people believe is real is real in its consequences." We live in a world of interpretation, and for better or worse, social science research is not one of the really major factors in what people believe.
5. When programs involve high hopes and big budgets, obligation to consider complicating factors. E.g., in the Kokomo study:
a. "National Drug Control Strategy" came only after these Kokomo 7th graders were in DARE, and therefore they may have missed some of the reinforcers of what they'd learned.
b. Kokomo study involved 7th graders, who were exposed to DARE two years later than recommended, and for eleven weeks rather than seventeen
Nevertheless, they believe their evaluation represents a fair and accurate assessment of the program's long-range effects.
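A finding of "no significant difference between the groups," as in the Kokomo results above, rests on a significance test. The sketch below shows one common approach, a two-proportion z-test, using only the Python standard library. The counts are made up for illustration and are NOT the Kokomo data, and the outline does not say which test the authors actually used.

```python
import math


def two_proportion_z_test(count_a, n_a, count_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (count_a + count_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 60 of 150 program students and 66 of 150 comparison
# students report having used a given drug (illustrative numbers only).
z, p_value = two_proportion_z_test(60, 150, 66, 150)

# By the usual convention, p >= 0.05 means the difference is not "significant".
print("significant" if p_value < 0.05 else "not significant")
```

With these invented counts the gap between 40% and 44% is well within what chance alone could produce, so the test reports no significant difference.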
E. The first carefully designed evaluation research in criminology: The Cambridge-Somerville Project: 1937-1944, near Boston... 650 boys, average age of 11, divided AT RANDOM into two groups... those in the treatment group received free health care, tutoring, summer camps, field trips, a recreational program, and individual counseling... program continued until boys were 18. Those in the control group received a little bit in the way of referrals, but the families and boys were mostly on their own.
What were the goals and objectives?
1. Independent variable: were they part of the delinquency prevention program or not?
2. Dependent variable: delinquency records--both whether they had been adjudicated, and for how much crime. (Is this the best approach to measuring delinquency? What about self-report? )
3. By the end of the program, 40% of boys in the control group had been adjudicated for a crime, and so had 40% of the boys in the treatment group... what's more, the total number of crimes committed by each group was closely similar
4. Staff members insisted they KNEW for certain that some boys' lives were turned around... what do you make of this?
Notice both the random assignment to treatment and control groups and the fact that the boys were tracked over an extended period of time. It would have been a plus to also track them in the years following the program's completion.
F. "Scared Straight": the Rahway Lifer's Program.
1. The initial television presentation and the public reaction
2. Rutgers University Evaluation project: James Finckenauer (Again, what were the goals and objectives of the Rahway Lifer's Program?)
a. Evaluation of Process: What kinds of kids were most frequently taken through the Rahway program? How many serious offenders?
b. Identify experimental and control groups of kids who had serious problems with the juvenile justice system, who did and didn't go through the Rahway program, and trace their subsequent criminal involvement over an extended period of time. Ideally you would randomly choose the ones going through Rahway and the ones that were not, but the Finckenauer team didn't have that degree of cooperation from the Rahway Lifer's program. (Why might organizations resist having their programs evaluated?)
c. Result: No difference in their subsequent criminal histories
3. Why were people so ready to believe in this program? How does it fit with what you've learned about crime and delinquency in terms of criminological theories?
IV. Working on survey reports
V. More Evaluation Research
A. "Success for All?"
1. What is Success for All? "More than 1200 schools, mostly high poverty Title I schools, in 46 states are currently implementing the program with external assistance provided by the not-for-profit Success for All Foundation."
a. Students spend most of their day in traditional age-grouped classes, but are regrouped across grades for reading lessons (90 minutes a day, very scripted). Assessed and regrouped as necessary every 8 weeks. One on one tutoring for those who need additional help.
HOW WOULD YOU EVALUATE THIS PROGRAM? TPS
b. Developed by Robert Slavin and Nancy Madden at the request of the Baltimore school system; piloted in one elementary school, 1987-1988. Four more Baltimore schools implemented the program in 1988-1989.
c. Other elements include: emphasis on cooperative learning methods, Family Support Team to increase parents' participation, full-time Program Facilitator
d. Cost per school from $261,060 to $646,500 per year, depending on size of school
2. Evaluation research: Borman (U of Wisconsin, Madison) and Hewes (Johns Hopkins University). "The Long-Term Effects and Cost-Effectiveness of Success for All"
Press release on the Success for All Foundation Web Page: "Randomized Research Proves Success for All Raises Reading Achievement" "What sets this study apart is its use of rigorous evaluation methods common in large-scale medical research but rare in education. Such research, referred to as a 'randomized control trial,' assigns schools by the flip of the coin to either use a specific intervention or to serve as a control group."
a. Is it a randomized control trial? NO, absolutely not. The five original Baltimore elementary schools using the system are paired with other low-income, high-minority schools with similar student characteristics. The researchers rightly term this a "quasi-experimental" research design. The Baltimore schools were particularly low-performing schools, and at least 80% of their teachers voted to support the use of Success for All.
b. Measurement issues.
1) Experimental and control groups. All the students who were in each school for first grade are considered part of the sample (or rather those of these students whose eighth-grade scores are available from the same school system). Four independent cohorts of first-grade students from 1987-1988 and the next three years, yielding a total sample of 1388 Success for All and 1849 control students. Students who transferred out--and there are usually a lot of transfers among low-income students--do not get the full Success for All intervention, but are included in the final results, which the researchers rightly see as conservative. The eighth-graders remaining in the Baltimore school system and with full data available: 581 Success for All students and 729 control group students.
2) Reading and math performance are measured by standardized tests already used in this school system at the beginning of first grade (CAT) and during eighth grade (CTBS/4). At the start, the Success for All students actually had lower math and reading test scores than the students in the control group schools, though not by a lot. The researchers see this as reassuring in relation to issues of "quasi" rather than full experimental design.
3) For cost analysis, the researchers used cost figures for each school from the Baltimore school system
c. The results
1) The Success for All students scored in the 20th percentile in reading in eighth grade and the 17th percentile in math. The control group students scored in the 14th and 15th percentiles respectively. Adjusting the scores to represent the lower beginning point in the Success for All schools, this represents a 6-month advantage in reading and a 3-month advantage in math.
2) Success for All cost no more overall, because of the extra costs in the control schools of more students assigned to special education and more students held back a grade. Success for All students averaged 0.55 years in special education, compared with 0.82 in the control group. 91% of Success for All students avoided being held back a grade, compared with 77% of the control group.
d. Discussion: If test scores are the best predictors of educational success, and if test scores plus degrees are the best predictors of occupational success and income, "Success for All" takes us only a small part of the way. In addition, an aspect that hasn't been examined at all is teacher retention under Success for All, nor how popular or effective it would be with middle class kids (if we anticipate more social class integration of our schools).
The authors point to three other programs, the Perry Preschool, the Abecedarian Project, and the Tennessee class-size experiment, which have results in a similar range, based on well-executed evaluation research. Abecedarian begins with small babies, while the Perry Program is for 3- and 4-year-olds, and the Tennessee experiment was aimed at kids in the early grades. No research on whether they might have an additive effect on poor and minority kids and what the net effect might be. Are we willing as a society to invest this kind of money in disadvantaged children?
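The comparisons in the results above can be restated as simple gaps between the two groups. The sketch below uses only the figures reported in this outline; the variable names are made up for illustration. It does not attempt a dollar figure, since the outline gives no per-year cost for special education or for repeating a grade.

```python
# Figures as reported in the outline for the eighth-grade follow-up.
special_ed_years = {"success_for_all": 0.55, "control": 0.82}
avoided_retention_pct = {"success_for_all": 91, "control": 77}

# Absolute gap in average years spent in special education.
special_ed_gap = round(
    special_ed_years["control"] - special_ed_years["success_for_all"], 2
)

# The same gap expressed as a percentage of the control-group average.
special_ed_relative = round(100 * special_ed_gap / special_ed_years["control"], 1)

# Percent held back is the complement of the percent who avoided retention.
held_back_pct = {group: 100 - pct for group, pct in avoided_retention_pct.items()}
retention_gap_points = held_back_pct["control"] - held_back_pct["success_for_all"]

print(special_ed_gap)        # 0.27 fewer years in special education
print(special_ed_relative)   # 32.9 percent less than the control average
print(held_back_pct)         # {'success_for_all': 9, 'control': 23}
print(retention_gap_points)  # 14 percentage points
```

These gaps are what drive the cost argument: the control schools pay for more special education time and more repeated grades.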
Also, implementation issues. Can we expect the same high quality of implementation if these programs are implemented in large-scale public preschool and school programs? Head Start, for example, is informed by some aspects of Perry but not implemented as well in terms of class size, training of teachers, and the like, and its long-term effects are more dubious.
Researchers' summary statement: "This study suggests that the replicable educational practices of prevention and early intervention, as modeled by Success for All, are more educationally effective, and equally expensive, relative to the traditional remedial practices of retention and special education."
B. Pre-school programs: MPR midday program, featuring Art Rolnick, senior vice-president at the Federal Reserve Bank of Minneapolis, and Ron Haskins, senior fellow at the Brookings Institution.
C. Politics, culture, and "common sense"
1. Why is a careful evaluation research plan, including controls, NOT built into every major social policy initiative?
For example, April 14, 2006, issue of Today's Professor, "Increasing Access to College." Review of pre-college programs for "underrepresented and underserved students," such as GEAR UP and TRIO. "The data on their ultimate effectiveness in producing college graduates is limited. College going seems to increase but data on degree attainment seems lacking... His (Cliff Adelman's) startlingly obvious suggestion of a clinical trial where schools try some of the variety of programs and measure the results over time seems long overdue."
a. Sometimes research precedes social policy changes, as we might hope, but even then there's often a discrepancy between the needs and timelines of policy makers and researchers.
b. Remember the Sherman-Berk project with the Minneapolis police, with its flaws and complications, and their warning that police departments should wait for replication
c. The welfare reform pilot project in Minnesota from 1993-1997, and then the differences in actual implementation because of budgetary considerations... in Minnesota we knew how to create an effective program, but we didn't do it.
d. Sometimes at the same time a new policy is implemented, provisions are made for ongoing evaluation. E.g., Minnesota Family Investment Program. 5-year research plan:
e. Sometimes research is applied after the implementation, often by outside researchers. Researchers may not get much cooperation from those implementing the program, who have a vested interest in its being perceived as successful. Such research is often ignored: e.g. DARE... although the Minneapolis school system has replaced the DARE program with something else (I wonder if it's been carefully evaluated?).
f. Maybe policy-makers don't want to know. Would rather be able to make grand claims--e.g., "Success for All" above
2. Why are political leaders, and often citizens, skeptical about the results of evaluation research? Why do they often feel that major new policies can be designed and implemented in the absence of systematic research?
a. "Common sense"
b. the impact of stories told by influential people ... social psychology research shows these are more influential than statistics
c. The impact of the mass media and of politicians
d. Vested interest... e.g. how many police jobs depend on the continuation of DARE? More generally, how many people's jobs are tied up with a particular program by the time we learn that it really doesn't work?