Assignment bias can threaten the internal validity of between-subjects experiments.

28 Chapter I The Logic of Experimental Design

Internal Validity. Statistical tests allow one to make conclusions about whether the mean of the dependent variable (typically referred to as variable Y) is the same in different treatment populations. If the statistical conclusion is that the means are different, one can then move to the question of what caused the difference, with one of the candidates being the independent variable (call it variable X) as it was implemented in the study. The issue of internal validity is, Is there a causal relationship between variable X and variable Y, regardless of what X and Y are theoretically supposed to represent? If variable X is a true independent variable and the statistical conclusion is valid, then internal validity is to a large extent assured (appropriate caveats follow). By a true independent variable, we mean one for which the experimenter can and does independently determine the level of the variable that each subject will experience; that is, assignment to conditions is carried out independently of any other characteristic of the subject or of other variables under investigation. Internal validity is, however, a serious issue in quasi-experimental designs where this condition is not met. Most commonly the problem is using intact groups of subjects. For example, in an educational psychology study, one might select the fifth-grade class in one school to receive an experimental curriculum and use the fifth-grade class from another school as a control group. Any differences observed on a common posttest might be attributed to preexisting differences between students in the two schools rather than to the educational treatment. This threat to internal validity is termed selection bias because subjects were selected from different intact groups. Perhaps less obvious is the case where an attribute of the subjects is investigated as one of the factors in an experiment.
Assume that depressed and nondepressed groups of subjects were formed by scores on an instrument like the Beck Depression Inventory; then, it is observed that the depressed group performs significantly worse on a memory task. One might like to claim that the difference in memory performance was due to the difference in level of depression; however, one encounters the same logical difficulty here as in the study with intact classrooms. Depressed subjects may differ from nondepressed subjects in many ways besides depression that are relevant to performance on the memory task.
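
The logic of selection bias in the intact-classroom example can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function name, the normally distributed scores, the sample size, and the 5-point preexisting gap between schools are all hypothetical, and the true treatment effect is set to zero. It compares the posttest gap obtained with intact groups against the gap obtained when the same students are randomly assigned.

```python
import random
from statistics import mean


def simulate_posttest_gap(n_per_group=2000, baseline_gap=5.0, sd=10.0, seed=0):
    """Return (intact_gap, randomized_gap) in posttest means.

    The treatment is assumed to have NO effect, so any gap reflects
    who ended up in which group, not the curriculum itself.
    """
    rng = random.Random(seed)

    # Hypothetical schools: students in school A score higher on average
    # before any treatment is applied (a preexisting difference).
    school_a = [rng.gauss(50.0 + baseline_gap, sd) for _ in range(n_per_group)]
    school_b = [rng.gauss(50.0, sd) for _ in range(n_per_group)]

    # Intact groups: school A receives the curriculum, school B is the control.
    intact_gap = mean(school_a) - mean(school_b)

    # Random assignment: pool all students, shuffle, and split in half, so
    # assignment is independent of school (and of every other characteristic).
    pooled = school_a + school_b
    rng.shuffle(pooled)
    treated, control = pooled[:n_per_group], pooled[n_per_group:]
    randomized_gap = mean(treated) - mean(control)

    return intact_gap, randomized_gap


intact_gap, randomized_gap = simulate_posttest_gap()
print(f"intact groups: {intact_gap:.2f}, random assignment: {randomized_gap:.2f}")
```

With intact groups the gap stays close to the preexisting 5-point difference even though the treatment does nothing, whereas under random assignment the gap hovers near zero.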

Threats to internal validity are thus typically "third-variable" problems. Another variable besides X and Y may be responsible for either an apparent relationship or an apparent lack of a relationship between X and Y.

Other threats to internal validity include mortality (the problem that arises when possibly different types of people drop out of various conditions of a study) and a number of other issues that arise when subjects are assessed repeatedly over time. This latter class of threats includes possible maturation of participants over time and "history," that is, events taking place between a pretest and posttest in addition to the treatment. Finally, other threats to the internal validity of a study occur when there is the possibility of communication during the course of a study among subjects from different treatment conditions. Thus, the mixture of effects of portions of different treatments that subjects functionally receive, filtered through their talkative friends, can be quite different from the single treatment they were nominally supposed to receive. This type of threat can be a particularly serious problem in long-term studies such as those comparing alternative treatment programs for clinical populations. For example, a waiting list control group may be demoralized by learning that others are receiving effective treatments while they are receiving nothing. Further, in a variety of other areas of psychology where studies tend to involve brief treatment interventions but where different people may participate over the course of an academic semester, the character of a treatment can be affected greatly by dissemination of information over time. Students who learn from previous participants the nature of the deception involved in the critical condition of a social psychology study may experience a considerably different condition than naive subjects would experience. These participants may well perform differently than those in other conditions, but the cause may have more to do with the possibly distorted information they received from their peers than the nominal treatment to which they were assigned.

Threats to the Validity of Inferences from Experiments 29

Estimating the internal validity of a study is largely a thought problem in which you attempt to systematically think through the plausibility of various threats relevant to your situation. On occasion, one can anticipate a given threat and gather information in the course of a study relevant to it. For example, questionnaires or other attempts to measure the exact nature of the treatment and control conditions experienced by subjects may be useful in determining whether extraexperimental factors differentially affected subjects in different conditions. Similarly, in the case of subject mortality, measures of the characteristics of individuals dropping out can be analyzed in an attempt to assess the strength of the threat of a selection bias.
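
One simple way to examine the characteristics of individuals dropping out is to compare a baseline measure for dropouts and completers within each condition. The following is a minimal sketch, not a prescribed procedure: the function name, the record layout, and the data are all made up for illustration.

```python
from statistics import mean


def dropout_baseline_means(records):
    """Mean baseline score of dropouts and completers within each condition.

    records: iterable of (condition, baseline_score, dropped_out) tuples.
    Returns {condition: {"dropout": mean, "completer": mean}}.
    """
    groups = {}
    for condition, baseline, dropped_out in records:
        status = "dropout" if dropped_out else "completer"
        groups.setdefault(condition, {}).setdefault(status, []).append(baseline)
    return {
        condition: {status: mean(scores) for status, scores in by_status.items()}
        for condition, by_status in groups.items()
    }


# Made-up illustrative data, not taken from any study.
records = [
    ("treatment", 12, True), ("treatment", 14, True),
    ("treatment", 25, False), ("treatment", 27, False),
    ("control", 20, True), ("control", 22, False), ("control", 18, False),
]
summary = dropout_baseline_means(records)
print(summary)
```

In these invented data, dropouts from the treatment condition had much lower baseline scores than treatment completers, while control dropouts resembled control completers. A pattern of that kind would suggest that the groups remaining at posttest are no longer comparable.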

Finally, a term from Campbell (1986) is useful for distinguishing internal validity from the other types remaining to be considered. Campbell suggests it might be clearer to call internal validity "local molar (pragmatic, atheoretical) causal validity" (p. 69). Although a complex phrase, this focuses attention on points deserving of emphasis. The concern of internal validity is causal in that you are asking what was responsible for the change in the dependent variable. The view of causes is molar (that is, at the level of a treatment package, or viewing the treatment condition as a complex hodgepodge of all that went on in that part of the study), thus emphasizing that the question is not what the "active ingredient" of the treatment is. Rather, the concern is pragmatic, atheoretical: did the treatment for whatever reason cause a change, did it work? Finally, the concern is local: did it work here? In internal validity, one is not concerned with generalization.

Construct Validity. The issue regarding construct validity is, Given there is a valid causal relationship, is the interpretation of the constructs involved in that relationship correct? Construct validity pertains to both causes and effects. That is, the question for both the independent and dependent variables as implemented in the study is, Can I generalize from this one set of operations to a referent construct? What one investigator labels as construct A causing a change in construct C, another may interpret as an effect of construct B on construct C, or of construct A on construct D, or even of B on D. Showing a person photographs of a dying person may arouse what one investigator interprets as death anxiety and another interprets as compassion. Threats to construct validity are a pervasive and difficult problem in psychological research. We have implicitly addressed this issue earlier in this chapter in commenting on the meaning of theoretical terms. Since Cronbach and Meehl's (1955) seminal paper on construct validity in the area of assessment, something approaching a general consensus has been achieved that the specification of constructs in psychology is limited by the richness, generality, and precision of our theories. Given the current state of psychological theorizing, it is understandable why a minority continue to argue for strategies such as adopting a strict operationalism or attempting to avoid theorizing altogether. However, the potential for greater explanatory power offered by theoretical constructs places most investigators in the position of having to meet the problem of construct validity head on rather than sidestepping it by abandoning theoretical constructs.

The basic problem in construct validity is the possibility "that the operations which are meant to represent a particular cause or effect construct can be construed in terms of more than one construct, each of which is stated at the same level of reduction" (Cook & Campbell, 1979, p. 59). The qualifier regarding the level of reduction refers to the fact that alternative explanations of a phenomenon can be made at different levels of analysis and that sort of multiplicity of explanation does not threaten construct validity. This is most clearly true across disciplines. One's support for a political position could be explained at either a sociological level or by invoking a psychological analysis, for example, of attitude formation. Similarly, showing there is a physiological correlate of some behavior does not mean the behavioral phenomenon is to be understood as nothing but the outworking of physiological causes.

Some examples of specific types of artifacts serve to illustrate the confounding that can threaten construct validity. (Confounding means the inadvertent manipulation or assessment of other theoretically relevant variables besides the variable the investigator intended to study.) For example, a famous series of studies begun at the Western Electric plant at Hawthorne, Illinois, in 1927 investigated the effects of various changes in the physical environment on the productivity of workers in the plant (Roethlisberger & Dickson, 1939). When the brightness of the lights above a group of workers was increased, their performance improved. However, it was found that when the lighting for another selected group of workers was darkened somewhat, their performance also improved. In fact, it seemed that no matter what small change was made in the working environment of a group of workers, the result was an increase in their productivity. Although the investigators initially viewed the independent-variable construct merely as changes in level of illumination, the fact that performance seemed to be affected similarly for the groups of workers being studied, regardless of which feature of the physical environment was manipulated, led eventually to the conclusion that other constructs were being manipulated as well. The "Hawthorne effect" eventually came to be identified with the effect of psychological variables such as the perception of concern by management over working conditions or, more generally, the effects of awareness that one is participating in a research study.

Another example of a threat to construct validity is the experimenter-bias effect demonstrated by Rosenthal (1976). This effect involves the impact of the researcher's expectancies and in particular the transmission of that expectancy to the subject in such a way that performance on the dependent variable is affected. Thus, when the experimenter is not blind to the hypothesis under investigation, the role of experimenter bias must be considered as well as the nominal treatment variable in helping to determine the magnitude of the differences between groups.

Two major pitfalls to avoid to minimize threats to construct validity can be cited: inadequate preoperational explication of the construct and mono-operation bias, or using only one set of operations to implement the construct (Cook & Campbell, 1979, p. 64ff.). First, regarding explication, the question is, What are the essential features of the construct for your theoretical purposes? For example, if you wish to study social support, does your conceptual definition include the perceptions and feelings of the recipient of the support or simply the actions of the provider of the support? Explicating a construct involves consideration not only of the construct you want to assess but also the other similar constructs from which you hope to distinguish your construct (see Campbell & Fiske, 1959; Judd & Kenny, 1981). Second, regarding mono-operation bias, using only a single dependent variable to assess a psychological construct typically runs the risk of both underrepresenting the construct and containing irrelevancies. For example, anxiety is typically regarded as a multidimensional construct subsuming behavioral, cognitive, and physiological components. Because measures of these dimensions will be much less than perfectly correlated, if one's concern is with anxiety in general, then using only a single measure is likely to be misleading.
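
The claim that measures of the different anxiety dimensions will be much less than perfectly correlated can be checked directly once several measures are in hand. The sketch below is a hypothetical illustration: the measure names and scores are invented, and the two helper functions are simply one way to compute pairwise Pearson correlations.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def pairwise_correlations(measures):
    """Correlation for every pair of measures; measures maps name -> scores."""
    return {
        (name_a, name_b): pearson_r(measures[name_a], measures[name_b])
        for name_a, name_b in combinations(measures, 2)
    }


# Invented scores for five subjects on three hypothetical anxiety measures.
measures = {
    "behavioral": [3, 5, 2, 8, 6],
    "cognitive": [4, 4, 3, 7, 8],
    "physiological": [6, 3, 5, 7, 4],
}
correlations = pairwise_correlations(measures)
for pair, r in correlations.items():
    print(pair, round(r, 2))
```

When the correlations among measures are low, as for the physiological measure in these invented data, a conclusion about anxiety in general based on any single measure could differ markedly from one based on another.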

External Validity. The final type of validity we consider refers to the stability across other contexts of the causal relationship observed in a given study. The issue in external validity is, Can I generalize this finding across populations, or settings, or time? As mentioned in our discussion of the uniformity of nature, this is more of an issue in psychology than in the physical sciences.

A central concern with regard to external validity is typically the heterogeneity and representativeness of the sample of people participating in the study. Unfortunately, most research in the human sciences is carried out using the sample of subjects that happens to be conveniently available at the time. Thus, there is no assurance that the sample is representative of the initial target population, not to mention some other population to which another researcher may want to generalize. In Chapter 2, we consider one perspective on analyzing data from convenience samples, which, unlike most statistical procedures, does not rely on the assumption of random sampling from a population.

For now it is sufficient to note that the concern with external validity is that the effects of a treatment observed in a particular study may not consistently be obtained. For example, one of the authors found that a classroom demonstration of a mnemonic technique that had repeatedly shown the mnemonic method superior to a control condition in a sophomore-level class actually resulted in worse performance than the control group in a class of students taking a remedial instruction course. Freshmen had been assigned to take the remedial course in part on the basis of their poor reading comprehension and apparently failed to understand the somewhat complicated written instructions given to the students in the mnemonic condition.

One partial solution to the problem of external validity is, where possible, to take steps to assure that the study will use a heterogeneous group of persons, settings, and times. Note that this is at odds with one of the recommendations we made regarding statistical conclusion validity. In fact, what is good for the precision of a study, such as standardizing conditions and working with a homogeneous sample of subjects, is often detrimental to the generality of the findings. The other side of the coin is that although heterogeneity makes it more difficult to obtain statistically significant findings, once they are obtained it allows generalization of these findings with greater confidence to other situations. In the absence of such heterogeneity or with a lack of observations with the people, settings, or times to which you wish to apply a finding, your generalization must rest on your ideas of what is theoretically important about these differences from the initial study (Campbell, 1986).
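
The tension between precision and generality can be seen in the standard error of a difference between two independent group means, which for equal group sizes n and a common within-group standard deviation is that standard deviation times the square root of 2/n. The sketch below simply evaluates this formula; the function name and the particular values (standard deviations of 10 and 20, 50 subjects per group) are hypothetical choices for illustration.

```python
from math import sqrt


def se_mean_difference(within_group_sd, n_per_group):
    """Standard error of the difference between two independent group means,
    assuming equal group sizes and a common within-group standard deviation."""
    return within_group_sd * sqrt(2.0 / n_per_group)


# Hypothetical values: a homogeneous sample versus a heterogeneous one.
se_homogeneous = se_mean_difference(within_group_sd=10.0, n_per_group=50)
se_heterogeneous = se_mean_difference(within_group_sd=20.0, n_per_group=50)
print(se_homogeneous, se_heterogeneous)
```

Doubling the within-group variability doubles the standard error, so the same mean difference is harder to detect in the heterogeneous sample; that is the price paid for the broader basis for generalization.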

What are threats to internal validity of experimental studies?

There are eight threats to internal validity: history, maturation, instrumentation, testing, selection bias, regression to the mean, social interaction and attrition.
Correct answer: The major time-related factors that threaten within-subjects designs are history, instrumentation, maturation, and regression toward the mean. In each case, an outside factor can cause the participants' scores to change from one treatment to the next.

Is random assignment a threat to internal validity?

No. Random assignment increases internal validity by reducing the risk of systematic pre-existing differences between the levels of the independent variable. Studies that use random assignment are called experiments.

What are the three most common threats to internal validity?

Threats to Internal Validity.
Attrition: Attrition is bad for your research because it leads to a bias. ...
Confounding variables: When your research has an extra variable related to the treatment you applied to your sample group that affects your results, then that leads to confusion. ...
Diffusion: This is a tricky one. ...