Hawthorne effect


The Hawthorne effect is a phenomenon in industrial psychology, first observed in the 1920s, in which productivity or quality improves merely because workers know they are being studied or observed. For fifty years, the studies underlying this phenomenon were highly influential in the study of organizational behavior. Many later studies failed to find evidence for the effect, and in the 1970s substantial flaws were revealed in the original studies [1]. The Hawthorne effect is nevertheless still widely invoked, even though the original evidence for it has been discredited.

Related is the Pygmalion effect (or Rosenthal effect), which refers to situations in which students perform better than other students simply because they are expected to do so. Also related is the placebo effect, the phenomenon that a patient's symptoms can be alleviated by an otherwise ineffective treatment, apparently because the individual expects or believes that it will work.


Definitions

The "Hawthorne effect" was not named after a researcher, but refers to the factory where the effect was first observed and described: the Hawthorne works of the Western Electric Company in Chicago, 1924-1933.

One definition of the Hawthorne effect is:

  • An experimental effect in the direction expected but not for the reason expected; i.e. a significant positive effect that turns out to have no causal basis in the theoretical motivation for the intervention, but is apparently due to the effect on the participants of knowing themselves to be studied in connection with the outcomes measured.

Parsons (1974) described it as:

  • "Generalizing from the particular situation at Hawthorne, I would define the Hawthorne Effect as the confounding that occurs if experimenters fail to realize how the consequences of subjects' performance affect what subjects do".

Case studies

Studies were done between 1924 and around 1933. Fritz J. Roethlisberger and William J. Dickson give a great amount of detail but little interpretation. Elton Mayo of Harvard Business School gives a shorter account, including the interpretation which has been so influential: that it was the feeling of being closely attended to that caused the improvement in performance.

The original Hawthorne effect research was a series of studies of worker productivity that manipulated various conditions (pay, light levels, rest breaks, etc.), but each change resulted, on average over time, in productivity rising, including eventually a return to the original conditions. This was true of each of the individual workers as well as of the group mean.

Clearly the variables the experimenters manipulated were neither the only nor the dominant causes of the productivity changes. One interpretation, mainly due to Mayo, was that the important effect here was the feeling of being studied: it is this that is now referred to as "the Hawthorne effect".

Illumination experiments

From 1924 to 1927 there were two and a half years of illumination-level experiments. In 1927 four studies began on selected small groups. In 1932 a questionnaire and interview study of over 20,000 employees was carried out.

Of the following cases, the first two were carried out on whole departments:

  • No control group; experimental groups in three different departments. All showed an increase in productivity (from an initial base period) that did not decrease with illumination.
  • Experimental and control groups. The experimental group got a sequence of decreasing light levels. Both groups steadily increased production, until finally the light in the experimental group was so low that the workers protested and production fell off.
  • Two groups. The control group got stable illumination; the other got a sequence of increasing levels. There was a substantial rise in production in both, but no difference between the groups.
  • Two workers only. Their production stayed constant under widely varying light levels, but their stated preferences tracked the experimenter: (1) if the experimenter said bright was good, then the brighter they believed the light to be, the more they liked it; (2) likewise when he said dimmer was good. And if they were deceived about a change, they said they preferred it. That is, what mattered was their belief about the light level, not the actual light level, and what they thought the experimenter expected to be good, not what was materially good.

Relay assembly experiments

The relay assembly experiments were conducted on a group of one plus five female operators. There were four different cases:

  1. Rest pauses and hours of work (in a separate room), with small-group piecework the only other changed variable.
  2. A piecework payment system (on a separate bench, but in the normal room).
  3. The mica splitting test room. Like the first case, a separate room, but the workers were already and constantly on piecework rates.
  4. Bank wiring: pure observation of a 14-man team on group piecework, who could always easily see their own rate.

In the first of the four cases, a group of six experienced female workers was segregated (one serving, five assembling telephone relays); assembling a relay was roughly a one-minute task carried out in good conditions. Output was carefully measured over a 5-year study. Output (the time for every relay produced) was secretly measured for two weeks before the workers were moved to the experimental room. After that, the study took the following steps:

  • five weeks of measurement with no changes;
  • then manipulations of the pay rules (group piecework for the 5-person group);
  • then two 5-minute breaks (after a discussion with the workers on the best length of time);
  • then two 10-minute breaks (not their preference), which again produced improvement;
  • then six 5-minute rests (which they disliked), which reduced output;
  • then food provided in the breaks;
  • then the day shortened by 30 minutes (output up);
  • then shortened further (output per hour up, but overall output down);
  • then a return to the earlier conditions (output peaked).

Attitudes as well as behavior and output were measured.

H. McIlvaine Parsons (1974) argues that in 2a (the first case) and 2d (the fourth case) the workers had feedback on their work rates, but in 2b they did not. He argues that in studies 2a-d there is at least some evidence that the following factors were potent:

  1. Rest periods
  2. Learning, given feedback i.e. skill acquisition
  3. Piecework pay where an individual does get more pay for more work, without counter-pressures (e.g. believing that management will just lower pay rates).

Parsons redefines "the Hawthorne effect as the confounding that occurs if experimenters fail to realize how the consequences of subjects' performance affect what subjects do" [i.e. learning effects, both permanent skill improvement and feedback-enabled adjustments to suit current goals]. So he is saying it is not attention or warm regard from experimenters, but either a) actual change in rewards b) change in provision of feedback on performance. His key argument is that in 2a the "girls" had access to the counters of their work rate, which they didn't previously know at all well.

It is notable, however, that he declines to analyze the illumination experiments, which do not fit his analysis, on the grounds that they were never properly published and so he cannot get at the details, whereas he had extensive personal communication with Roethlisberger and Dickson.

It is possible that the illumination experiments are explained by a longitudinal learning effect. But Mayo says it is to do with the fact that the workers felt better in the situation, because of the sympathy and interest of the observers. He does say that this experiment is about testing the overall effect, not testing factors separately. He also discusses it not really as an experimenter effect but as a management effect: how management can make workers perform differently because they feel differently; a lot to do with feeling free, not feeling supervised but more in control as a group. The experimental manipulations were important in convincing the workers that conditions were really different. The experiment was repeated with similar effects on mica splitting workers.

When we refer to "the Hawthorne effect" we are pretty much referring to Mayo's interpretation in terms of workers' perceptions, but the data show strikingly continuous improvement. It seems quite a different interpretation might be possible: learning, expertise, reflection -- all processes independent of the experimental intervention? However the usual Mayo interpretation is certainly a real possible issue in designing studies in education and other areas, regardless of the truth of the original Hawthorne study.

Recently the issue of "implicit social cognition" i.e. how much weight we actually give to what is implied by others' behavior towards us (as opposed to what they say e.g. flattery) has been discussed: this must be an element here too.

Richard E. Clark and Timothy F. Sugrue (1991, p.333), in a review of educational research, say that uncontrolled novelty effects (cf. the halo effect) cause on average a rise of 30% of a standard deviation (SD) (i.e., a shift from the 50th to roughly the 62nd percentile), which decays to a small level after 8 weeks. In more detail: 50% of an SD for up to 4 weeks; 30% of an SD for 5-8 weeks; and 20% of an SD for more than 8 weeks (which corresponds to less than 1% of the variance).
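These effect-size figures can be converted to percentile shifts under the usual assumption of normally distributed scores; a quick sketch using only the Python standard library (the variance conversion r = d / sqrt(d^2 + 4) is the standard approximation for equal-sized groups):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, sd 1

# A novelty effect of d standard deviations moves the average treated
# participant from the 50th percentile up to the percentile below.
for d in (0.5, 0.3, 0.2):
    pct = nd.cdf(d) * 100
    print(f"{d:.1f} SD -> percentile {pct:.0f}")

# Share of variance explained by a d = 0.2 effect:
d = 0.2
r = d / (d**2 + 4) ** 0.5
print(f"variance explained: {r**2:.1%}")  # about 1%, as the review notes
```

The 0.3 SD figure comes out at about the 62nd percentile, close to the 63% quoted above, and 0.2 SD indeed explains roughly 1% of the variance.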

Can we trust the research?

Candice Gleim says:

Broad experimental effects and their classifications can be found in Campbell, D. T., & Stanley, J. C. (1966), Experimental and Quasi-Experimental Designs for Research, Chicago: Rand McNally; and Cook, T. D., & Campbell, D. T. (1979), Quasi-Experimentation: Design and Analysis Issues, Houghton Mifflin.

Michael L. Kamil says: You might want to be a bit careful about the scientific basis for the Hawthorne effect. Lee Ross has brought the concept into some question. There was a popular news story about this in the New York Times a couple of years ago.

David Carter-Tod says: "A psychology professor at the University of Michigan, Dr. Richard Nisbett, calls the Hawthorne effect 'a glorified anecdote.' 'Once you've got the anecdote,' he said, 'you can throw away the data.'" A dismissive comment which back-handedly tells you something about the power of anecdote and narrative. There is, however, no doubt that there is a Hawthorne effect in education in particular.

Don Smith says: I recall studying the Hawthorne Effect as an undergraduate on a management degree years ago. At that time the message was that if a group knew they were being studied, the results might be biased.

Harry Braverman says in "Labor and Monopoly Capital": The Hawthorne tests were based on behaviorist psychology and were supposed to confirm that workers' performance could be predicted by pre-hire testing. However, the Hawthorne study showed "that the performance of workers had little relation to ability and in fact often bore a reverse relation to test scores...". What the studies really showed was that the workplace was not "a system of bureaucratic formal organization on the Weberian model, nor a system of informal group relations, as in the interpretation of Mayo and his followers, but rather a system of power, of class antagonisms". This discovery was a blow to those hoping to apply the behavioral sciences to manipulate workers in the interest of management.

What may be wrong about the quoted dismissiveness is that there was not one study, but three illumination experiments and four other experiments: only one of these seven is alluded to. What is right is that a) there certainly are significant criticisms of the method that can be made, and b) most subsequent writing shows a predisposition to believe in the Hawthorne effect, and a failure to read the actual original studies.

Can we trust the literature?

The experiments were quite well enough done to establish that there were large effects due to causal factors other than the simple physical ones the experiments had originally been designed to study. The output ("dependent") variables were human work, and we can expect educational effects to be similar (though it is not so obvious that medical effects would be). The experiments stand as a warning against running simple experiments on human participants as if they were only material systems. There is less certainty about the nature of the surprise factor, other than that it certainly depended on the mental states of the participants: their knowledge, beliefs, etc.

Candidate causes are:

  1. Material factors, as originally studied e.g. illumination, ...
  2. Motivation or goals e.g. piecework, ...
  3. Feedback: a skill cannot be learned without good feedback, and simply providing proper feedback can be a big factor. Feedback can often be a side effect of an experiment, and good ethical practice promotes this further. Perhaps providing the feedback, and nothing else, may be a powerful factor.
  4. The attention of experimenters.

Parsons implies that (4) might be a "factor" as a major heading in our thinking, but as a cause can be reduced to a mixture of (2) and (3). That is: people might take on pleasing the experimenter as a goal, at least if it doesn't conflict with any other motive; but also, improving their performance by improving their skill will be dependent on getting feedback on their performance, and an experiment may give them this for the first time. So you often won't see any Hawthorne effect -- only when it turns out that with the attention came either usable feedback or a change in motivation.

Adair (1984) warns of gross factual inaccuracy in most secondary publications on the Hawthorne effect, and notes that many studies failed to find it, though some did. He argues that we should look at it as a variant of Orne's (1973) experimental demand characteristics. For Adair, the issue is that an experimental effect depends on the participants' interpretation of the situation, which may not be at all like the experimenter's interpretation; the right method is to conduct post-experimental interviews, in depth and with care, to discover participants' interpretations. So he thinks it is not awareness per se, nor special attention per se: you have to investigate participants' interpretations in order to discover if and how the experimental conditions interact with the participants' goals (in the participants' view). This can affect whether participants believe something, whether they act on it, whether they see it as in their interest, and so on.

Rosenthal and Jacobson (1992) ch.11 also reviews and discusses the Hawthorne effect.

Its interpretation in management research

The research was and is relevant firstly to the 'Human Resources Management' movement. The discovery of the effect was most immediately a blow to those hoping to apply the behavioral sciences to manipulate workers in the interest of management.

Other interpretations it has been linked to are: Durkheim's 'anomie' concept; the Weberian model of a system of bureaucratic formal organization; a system of informal group relations, as in the interpretation of Mayo and his followers; a system of power, of class antagonisms.

Summary view of Hawthorne

In the light of the various critiques, I think we could see the Hawthorne effect at several levels.

At the top level, it seems clear that in some cases there is a large effect, unanticipated by the experimenters, that is due to participants' reactions to the experiment itself. This is the analogue of the observer effect in physics, BUT (unlike in quantum mechanics) it only happens sometimes. So as a methodological heuristic (that you should always think about this issue) it is useful, but as an exact predictor of effects it is not: often there is no Hawthorne effect of any kind. To understand when and why we will see a Hawthorne or experimenter effect, we need more detailed considerations.

At a middle level, I would go with Adair (1984), and say that the most important (though not the only) aspect of this is how the participants interpret the situation. Interviewing them (after the "experiment" part) would be the way to investigate this.

This is important because factory workers, students, and most experimental participants are doing things at the request of the experimenter. What they do depends on what their personal goals are, how they understand the task requested, whether they want to please the experimenter and/or whether they see this task as impinging on other interests and goals they hold, what they think the experimenter really wants. Besides all those issues that determine their goals and intentions in the experiment, further aspects of how they understand the situation can be important by affecting what they believe about the effects of their actions. Thus the experimenter effect is really not one of interference, but of a possible difference in the meaning of the situation for participants and experimenter. Since all voluntary action (i.e. actions in most experiments) depends upon the actor's goals AND on their beliefs about the effects of their actions, differences in understanding of the situation can have big effects.

At the lowest level is the question of what the direct causal factors might be. These could include:

  • Material ones that are intended by the experimenter
  • Feedback that an experiment might make available to the participants
  • Changes to goals, motivation, and beliefs about action effects induced by the experimental situation.

Jastrow's effect of expectancy on punched card workers

According to Rosenthal & Jacobson (1968), Jastrow (1900) reported another striking effect on workers being trained on the then-new Hollerith punch card machines at the US Census Bureau. The first group were expected by the machine's inventor to produce 550 cards per day, and did so, but had great difficulty improving on that. However, a second group, who were isolated from this expectation, were soon producing 2,100 per day.

Teacher effects

Although not of central importance here, of huge importance in educational research in general is the issue of teacher effects. Tim O'Shea once told me that in all studies where one of the variables was the teacher, the effect of different teachers was always bigger than the effect of different treatments (usually what was meant to be being studied). Basically, teachers have a huge effect but one we don't understand at all.

If we did, we could train teachers to use best practice in the sense of getting the best effects; but we have no idea how to do that. Assuming this is true, this is the most important effect in the whole field of education. (Consider: if this were true in medicine, then it wouldn't matter much what treatment you gave a patient; the most important thing would be to get the best doctor, regardless of drugs, surgery, or other treatments.)

It also implies that the professionalisation of teaching does not entail improvement in learning, or any rational basis for treating learners, though it may be justified from a social viewpoint, or of course from the viewpoint of the benefits to practitioners of restrictive practices and of regulation to exclude the worst practitioners. However, we shouldn't be surprised. Medicine was organized into its current professional form before there was a single scientifically justified treatment available: in the UK, the governing professional body, the General Medical Council, was established by law in essentially its present form by the 1858 Medical Act. On an optimistic view, Pasteur's rabies vaccination, introduced in 1885, was the first medical treatment based on scientific evidence; and it has been estimated that 1911 is the first year when a patient was objectively likely to benefit from being treated by a doctor. (L. J. Henderson: "somewhere between 1910 and 1912 in this country, a random patient with a random disease, consulting a doctor at random had, for the first time in the history of mankind, a better than a fifty-fifty chance of profiting from the encounter", as quoted in John Bunker (2001), "Medicine Matters After All: Measuring the benefits of medical care, a healthy lifestyle, and a just social environment", Nuffield Trust.)

Note too that all this casts doubt on the value of training teachers, apart from giving them practice to learn for themselves: if we don't know what it is about teachers' behavior that has such large effects on learning, how can we usefully train them? In the absence of this knowledge, the only measure of a teacher's worth is the comparative learning outcomes of their students. However neither teachers nor teacher training is usually assessed by this. So while it is quite possible that teachers learn either by unaided practice, or by unconscious imitation of other teachers (apprenticeship learning), there is almost no evidence on whether that training makes a difference.

The empirical observation of the importance of teachers has major implications for theory. Because they are of such large importance, I prefer Laurillard's theory of the learning and teaching process to others since it gives equal weight to learners and to teachers, and I regard slogans such as "learner-centered" and theories such as neo-constructivism to be flawed because they do not acknowledge or give a place to teachers of the prominence that they in fact have in the causation of learning.

So given the importance of teacher effects, what is the evidence? I need to do a proper review of this. But the Pygmalion effect is one big demonstration of the effect of teachers, showing they can double the amount of pupil progress in a year. Rosenthal & Jacobson (1992) also mention briefly research that showed that 10 secs of video without sound of a teacher allows students to predict the ratings they will get as a teacher. Similarly hearing the sound without vision AND without content (rhythm and tone of voice only) were enough too. This is powerful evidence that teachers differ in ways they cannot easily or normally control, but which are very quickly perceptible, and which at least in students' minds, determine their value as a teacher. (And Marsh's (1987) work shows that student ratings of teachers do relate to learning outcomes.)

This also brings out an essential difference between medicine and education. In education, the teacher is supposed (except by radicals) to be a major cause of learning; while in medicine it is supposed to be the "treatment", regardless of who administers it.

The placebo effect: does it really exist?

Placebos are things like sugar pills, that look like real treatments but in fact have no physical effect. They are used to create "blind" trials in which the participants do not know whether they are getting the active treatment or not, so that physical effects can be measured independently of the participants' expectations. There are various effects of expectations, and blind trials control all of these together by making whatever expectations there are equal for all cases. Placebos aren't the only possible technique for creating blindness (unawareness of the intervention): to test the effectiveness of prayer by others, you just don't tell the participants who has and has not had prayers said for them. To test the effect of changing the frequency of fluorescent lights on headaches, you just change the light fittings at night in the absence of the office workers (this is a real case).

Related to this is the widespread opinion that placebo effects exist, where belief in the presence of a promising treatment (even though it is in fact an inert placebo) creates a real result e.g. recovery from disease. Placebos as a technique for blinding will remain important even if there is no placebo effect, but obviously it is in itself interesting to discover whether placebo effects exist, how common they are, and how large they are. After all, if they cure people then we probably want to employ them for that.

Claims that placebo effects are large and widespread go back to at least Beecher (1955). However, Kienle and Kiene (1997) reanalyzed his reported work and concluded that his claims had no basis in his evidence; Hrobjartsson & Gotzsche (2001) then did a meta-analysis of the evidence and concluded that most of these claims have no basis in the clinical trials published to date. The chief points of their skeptical argument are:

  • Only trials that compare a group that gets no treatment with another group that gets a placebo can test the effect.
  • Most claims are based on looking at the size of the improvement measured in placebo groups in trials comparing only placebo and experimental (active) treatments. This is misleading since (for instance) most diseases have a substantial clearup rate with no treatment: seeing improvements doesn't mean the placebo had an effect. (Put more technically, comparing with the baseline (pretest measure) is vulnerable to regression to the mean.)
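The regression-to-the-mean point can be illustrated with a small simulation (all numbers hypothetical). Patients are enrolled only when a noisy baseline measurement makes them look severe enough; at follow-up the measurement noise is redrawn, so the enrolled group "improves" on average even though nothing whatsoever was done to them:

```python
import random

random.seed(1)

apparent_gains = []
for _ in range(100_000):
    severity = random.gauss(50, 10)            # stable true severity: never changes
    baseline = severity + random.gauss(0, 10)  # noisy baseline measurement
    if baseline > 60:                          # trial enrolls only "severe" cases
        followup = severity + random.gauss(0, 10)  # same severity, fresh noise
        apparent_gains.append(baseline - followup)

mean_gain = sum(apparent_gains) / len(apparent_gains)
print(f"apparent improvement with no treatment: {mean_gain:.1f} points")
```

The enrolled group shows several points of apparent improvement purely from selection on a noisy measure. A baseline-vs-followup comparison inside a placebo arm would show the same "improvement" whether or not the placebo did anything, which is why the skeptics insist on a genuine no-treatment comparison group.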

Nevertheless, even they conclude that there is a real placebo effect for pain (not surprising, since this is partly understood theoretically: Wall, 1999), and for some other continuously-valued, subjectively-assessed effects. A recent experimental demonstration was reported by Zubieta et al. (2005), "Endogenous Opiates and the Placebo Effect", The Journal of Neuroscience, 25(34), 7754-7762.

This seems to show that the psychological cause (belief that the placebo treatment might be effective in reducing pain) causes opioid release in the brain, which then presumably operates in an analogous way to externally administered morphine.

A recent and more extensive review of the overall dispute is M. Nimmo (2005), Placebo: Real, Imagined or Expected? A Critical Experimental Exploration, final-year undergraduate Critical Review, Dept. of Psychology, University of Glasgow.

Summary

  • Placebos are used to create blind trials. They are not the only technique for this, but are a very common and important one.
  • Whether or not there is a placebo effect, placebos will remain an important technique for this.
  • Recent skeptical meta-analyses of placebo effects suggest that the effect does exist, but only in very limited contexts; the widespread claims are mainly misplaced, being based on faulty inferences.
  • Placebos are often seen as posing ethical difficulties. Essentially the issues are of two kinds, neither about placebos alone.
    • Deceiving experimental participants, or at least withholding information. This is potentially in tension with the principle of informed consent. This is most acute for experiments that wish not just to achieve blinding, but to measure the effect of expectancies, and so wish to induce expectancies by misinforming (some) participants.
    • Withholding treatment from patients (or education from students). The tension here is between the greater certainty a controlled experiment will give, versus the prior guesses of people and experts. After all, you probably wouldn't do an experiment unless you had some reason to hope a treatment worked; but if you do have such grounds, then your opinion of the best treatment should be given to all patients rather than give some a placebo.

Ways of classifying and comparing such effects

Can we organize these (and other) various reported effects in some useful way? What are the effects that might be related?

  • The placebo effect in medicine, where getting an inert (e.g. sugar) pill has a large positive effect. Many believe that there are often large positive effects arising simply from the expectation created in the patient: if true, this is the placebo effect, where the intervention in fact has no material effect, but the belief of the participant does. Although often transmitted from the doctor's expectancies, it may be independent of the doctor. It may show particularly strongly in side-effects, where the number and severity of side-effects may be three times larger when patients are warned about the possibility, in both the active-treatment and placebo groups. However, as noted above, some do not believe any such effect exists.
  • The Hawthorne effect: simply of being studied. Aspects of this suggest that the effect did not depend on the particular expectation of the researchers, but that being studied caused the improved performance. This might be because attention made the workers feel better; or because it caused them to reflect on their work and reflection caused performance improvements, or because the experimental situation provided them with performance feedback they didn't otherwise have and this extra information allowed improvements.
  • The John Henry effect (Zdep & Irvine; 1970) is the opposite of the Hawthorne effect. The John Henry effect occurs when the intended control group, that gets no intervention, compares themselves to the experimental group and through extra effort gets the same results. [2]
  • The halo effect of uncontrolled novelty: the participant performs differently at first because of the novelty of the "treatment", which may change their expectations or simply cause them to be more alert or otherwise perform differently. The experimenter is not important, but a materially unjustified belief, perhaps acquired from media or other social sources, may be (e.g. participants think the technology or educational intervention is wonderful, and that belief is the real cause of raised outcomes); or else simply the novelty rather than the belief matters, if it operates through (say) attention rather than through expectancies.
  • Experimenter effects. Specific expectations acquired, consciously or not, from the researcher. Some experimenter effects have been demonstrated equally in positive and negative directions. Rosenthal (1966) describes experimentally tested experimenter effects in behavioral research, which is summarized by Rosenthal & Jacobson (1992). Prophesying a difference caused research assistants to create an effect, and this could be done equally in either direction (i.e. can create a positive or negative effect this way). This was done where the experimental task being manipulated required judgments by the nominal participants. However this was about one tenth the size of the effect prophesied, so it would be quite wrong to describe this as "seeing what you expect": it would be more accurate to suggest that experimenters could influence subjects on marginal cases and so systematically bias (only) within the range of experimental "noise". However if stooges acting as the first subjects behaved differently, this overrode and created a more effective expectation (and consequent effect on real subjects). Such effects have also been demonstrated in animal experiments and on learning and IQ tests/tasks at least sometimes.
  • Jastrow's effect on factory work was much bigger: here an explicit expectation about performance was transmitted and turned out to change output by a factor of three.
  • The Pygmalion effect or "expectancy advantage" is that of a self-fulfilling prophecy. Teachers' expectations of pupils can strongly affect (by about a factor of two over a year) the amount of development they show.

Placebo vs. Hawthorne effects

The placebo and Hawthorne effects compare and contrast in these ways:

  • Both are psychological effects of the participants, causing an effect when the material intervention has no effect.
  • Both are effects produced by the participants' perceptions and reactions; but the former emphasizes their response to new equipment or methods, while the latter emphasizes their response simply to being studied.
  • The leading suspected cause in the placebo effect is the participants' false belief in the material efficacy of the intervention. The leading suspected cause in the Hawthorne effect is the participants' response to being studied i.e. to the human attention.
  • In both cases, the experimenter may be deceiving the participants, or may be mistakenly sincere, or neutral with respect to the effects of the technology or intervention. In general however, the experimenter appearing to the participant to believe in the efficacy of the intervention, while not essential, may be more or more often important to the placebo effect than to the Hawthorne effect.

The fields where such effects apply

  • Management, particularly the management of factory labor
  • Medicine
  • Education
  • Experimental psychology

Blind trials

In the medical field, a strong adherence to the method of double- and triple-blind trials, at least of drugs, has developed. These levels of blinding also offer a practical, applied, behavioristic way of classifying these effects.

  • Single blind: concealing from the patient (subject, participant) what "treatment" they are getting, and hence what result to expect. Some placebo effects are "pure" in that they demonstrate that the patient's belief, independently of the doctor's, can be enough to induce effects.
  • Double blind: also concealing from the researcher (the person administering the treatments and measurements). Many placebo effects and the Pygmalion effect demonstrate how researchers' expectations can have a large effect. It is possible too that expectancies may cause a teacher or doctor actually to treat the pupil or patient differently in a way that benefits or disadvantages them; i.e. to have a direct effect independent of the primary participants' expectancies.
  • Triple blind: concealing treatment and hence expectations from a person making measurement judgments e.g. a lab technician classifying cells as cancerous, a doctor assessing a patient's degree of mobility or pain as an output (dependent) variable.
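One way to summarize the three levels above is as a mapping from blinding level to the parties kept unaware of the treatment assignment. The following sketch is purely illustrative; the party labels are shorthand for this article, not standard terminology:

```python
# Illustrative mapping of blinding levels to the parties kept unaware
# of the treatment assignment (labels are this article's shorthand).
BLINDING = {
    "single": {"participant"},
    "double": {"participant", "experimenter"},
    "triple": {"participant", "experimenter", "assessor"},
}

def is_blinded(level: str, party: str) -> bool:
    """True if the given party is kept unaware at this level of blinding."""
    return party in BLINDING[level]

# In a double-blind trial, the person making measurement judgments
# (the "assessor") can still be biased by their expectations:
print(is_blinded("double", "assessor"))   # False
print(is_blinded("triple", "assessor"))   # True
```

The point the classification makes is that each added level removes one more class of human expectancy from the measurement chain.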

Thus from a practical point of view, there are three classes of humans to be managed in an experimental trial, each of whose expectations has been shown sometimes to affect its outcome.

Why the effects may be quasi-rational

No one knows the mechanisms behind these effects. However, it is not hard to generate speculations on how they might be advantageous, and so quasi-rational.

Note that not all conceivable effects are in fact observed. For instance, placebo and other expectancy effects have been shown to operate on pain and on effects like nausea, but not to heal broken bones. This makes sense because, contrary to common sense, cognitive expectations have been shown to have a large effect on the experience of pain (Wall, 1999), and on perceptions of fatigue, e.g. when running (Lovett, 2004), but not on bone growth. In education, however, learning depends almost entirely on the learner's actions, so a learner who believes they cannot learn is as unlikely to learn as a walker is to be found at the top of a mountain they believed they could not climb.

  • John Henry effect: participants act to maintain respect and status.
  • Hawthorne: attention from others makes you feel like working harder. Aspects of the Hawthorne studies suggest that it was not that the researchers expected a better result in every case, but that being studied itself caused improved performance. This is similar to job interviews and competitive events, where participants often perform much better than their average because they are being observed and assessed, not because they are expected to do better.
  • Hawthorne: being studied prompts a person to reflect more on the task i.e. to study their performance and themselves as well; and reflection often causes them to improve their performance gradually because they are thinking about it, and about improving it.
  • Hawthorne: Parsons's argument, primarily about feedback provision, is that learning (improving a skill) requires plenty of feedback on performance. If an experiment provides that feedback (as a side effect of making experimental measurements) where it was not readily available before, performance may improve for that reason alone. He argues convincingly that the original Hawthorne studies provided performance feedback that workers did not previously get.
  • Jastrow, Pygmalion, experimenter effects: participants are strongly affected by the expectations expressed by their "bosses". In work, a worker donates their time to an employer in return for pay but remains responsible for how much effort they make. We know from injuries in sport and in the home (e.g. when lifting things) that unrealistic estimates of one's own capabilities are dangerous, so it is rational, and in both the worker's and the manager's interests, to get these estimates right. Expectations are thus rightly central in regulating, and so limiting, output; and naturally we are very likely to attend to others' informed opinions of them.
  • Halo and placebo effects: conversely, if a participant normally has a zero expectancy about something, then something new may prompt them to review it. In education, if a student does not believe they can improve at something, they will not try (e.g. "I can't do math", "Perfect pitch is an innate ability, so there is no point in my practicing"), but an experiment might make them change this assumption and so start making an effort to learn.

Research methods implications (Shayer): pure and applied research

These are some notes stimulated by a valuable chapter by Shayer (1992).

There are two different aims for research:

  • [Science]: finding the causes, testing a (causal) model (one cause, all effects).
  • [Engineering]: discovering and proving the generalizability of the effect (one desired effect, all necessary and sufficient causes).

Science studies

If you want just to find causes and laws, not to achieve any useful practical effect, then the focus is on isolating causes by controlling experiments and avoiding things such as the Hawthorne effect. Hence, in medical research, double blind trials etc.

Note that double blind trials (where neither experimenter nor patient know which intervention/treatment they are getting during the trial) are quite practicable for testing pills (where a dummy sugar pill can easily be made that the patient cannot tell apart from other pills); but not for major surgery, nor usually for educational interventions that require actions by the learner: in these cases participants necessarily know which treatment they have been given.

Double (or triple) blind trials "control for" all four of the above effects in the sense of making them equal for all groups by removing the ability of both experimenter and participants to even know which treatment they are getting, much less to believe they know which is more effective.

They may tend to reduce the placebo effect, since the patient knows there is only a 50% chance that they are getting the active treatment. However, they do NOT remove the Hawthorne effect (they only make it equal for all groups in the trial): on the contrary, the experiment almost certainly makes participants very aware of receiving special attention. This could mean that the effect sizes measured are misleading, and would not be seen later in normal practice. The trial would be a fair comparison between groups, but the size of effect measured would not predict the effect seen in non-experimental conditions, because a similar "error" (i.e. an effect due to the Hawthorne effect) applies to both groups.

This could, at least in theory, matter. A case in point is comparing homeopathic and conventional medicine: generally a patient gets about 50 minutes of the practitioner's attention in the former case and 5 minutes in the latter, and it is not hard to imagine that this difference could have a significant effect on recovery. A standard double-blind experiment would be most seriously misleading where both a drug and the Hawthorne effect of attention were of similar size but not additive (i.e. either one alone was effective, but getting both gave no extra benefit): a conventional trial would then see similar, useful effect sizes in all groups, but would not be able to tell that either giving the drug or giving an hour's attention to the patient were alternative effective therapies.
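The non-additive scenario just described can be made concrete with a toy calculation. The effect sizes below are invented purely for illustration:

```python
# Illustrative only: hypothetical effect sizes (arbitrary units) for the
# scenario where a drug and the attention given to trial participants
# have similar-sized effects but are NOT additive.

def outcome(drug: bool, attention: bool, additive: bool) -> float:
    """Improvement over baseline under a toy model of the two effects."""
    drug_effect, attention_effect = 10.0, 10.0  # assumed equal sizes
    if additive:
        return drug_effect * drug + attention_effect * attention
    # Non-additive: either factor alone produces the full benefit.
    return max(drug_effect * drug, attention_effect * attention)

# In a conventional blind trial, BOTH arms receive the experimenter's attention.
treated = outcome(drug=True, attention=True, additive=False)
control = outcome(drug=False, attention=True, additive=False)
print(treated, control, treated - control)   # prints: 10.0 10.0 0.0

# The trial concludes "no effect beyond placebo" -- yet the drug alone,
# outside the trial's attention, would also have produced the full benefit:
print(outcome(drug=True, attention=False, additive=False))   # prints: 10.0
```

Both arms improve equally, so the between-group comparison shows nothing, even though the drug and the attention are each an effective therapy on its own.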

Finally, neither medicine nor education habitually employs counter-balanced experimental designs, where all participants get both treatments: one group gets A then B, and the other gets B then A. This is because of the possibility of asymmetric transfer effects, i.e. the effect of B (say) differs depending on whether or not the participant had A first. For instance, learning French vocabulary first and then reading French literature is not likely to have the same effect as doing them the other way round.

Applied or engineering studies (Shayer)

Shayer thinks there are distinct questions and stages to address in applied as opposed to "scientific" research, i.e. in research on being able to generalize the creation of a desired effect:

  1. Study primary effect: Is there an effect (whatever the cause), what effect, what size of effect?
  2. Replication: can it be done by other enthusiasts (not only by the original researcher)?
  3. Generalizability: can it be done by non-enthusiasts? i.e. can it be transferred via training to the general population of teachers? i.e. without special enthusiasm or skills. This is actually a test of the training procedure, not of the effect -- but that is a vital part of whether the effect can be of practical use.

One danger is the Hawthorne effect: you get an effect, but not due to the theory. The opposite is to get a null effect even though the theory is correct because transfer/training didn't work. So you need to do projects in several stages, showing effects at each.

In stage (1) you do an experiment and show there really is an effect, defensible against all worries. But you still haven't shown what causes it: the factors described in your theory, or the experimenter; i.e. there is no defense against the Hawthorne effect. Use 1 or 2 teachers, and control like crazy. In stage (2) you show it can be done by others, so it is at least not just a Papert-style charisma effect, though it still might be a learner-enthusiasm (halo) effect. Use, say, 12 teachers. In stage (3) you are testing whether training can be done.

Note that if what you care about is improving learning and the learners' experience, then you may want to maximize rather than avoid halo and Hawthorne effects. If you can improve learning by changing things every year and telling students this is the latest thing, then that is the ethical and practically effective thing to do.

Rosenthal's suggestions on method

Rosenthal and Jacobson (1992) have a brief chapter proposing methods to address these effects, at least for "science" studies of primary effects.

They say firstly that we should have Hawthorne controls, i.e. three groups: a control group (no treatment); an experimental group (the one we are interested in); and a Hawthorne control, which receives a change or treatment manifest to participants, but not one that could be effective in the same way as the experimental intervention. [This answers the wish to run triple-blind trials when participants cannot be prevented from knowing that something is being done, AND it allows measuring the size of the placebo effect as well as of the experimental effect.]

Secondly, have "expectancy control designs": a 2 × 2 of control/experimental × with/without secondary participants expecting a result. [Hawthorne-effect control groups are about subject expectancies; expectancy controls are about the Pygmalion effect, i.e. teachers' expectancies.]

So, combining these, they suggest a 2 × 3 design of {teacher expects effect or not} × {control, experimental, Hawthorne control i.e. placebo treatment}. The point is not merely to avoid confounding factors but to measure their existence and size in the case being studied.
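As a sketch, the six cells of this 2 × 3 design can simply be enumerated (the group labels below are illustrative paraphrases, not Rosenthal and Jacobson's own terms):

```python
from itertools import product

# Sketch of the suggested 2 x 3 design: crossing the teacher's induced
# expectancy with the treatment actually given to the pupils.
teacher_expectancy = ["expects an effect", "expects no effect"]
treatment = ["experimental", "control (no treatment)",
             "Hawthorne control (placebo treatment)"]

groups = list(product(teacher_expectancy, treatment))
for i, (expectancy, given) in enumerate(groups, 1):
    print(f"Group {i}: teacher {expectancy}; pupils get {given}")
```

Six groups allow the material, placebo/Hawthorne, and expectancy effects, and their interactions, to be estimated rather than merely held constant.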

N.B. A medical trial with drug and placebo groups is most like having experimental and Hawthorne-control groups but no pure control group. Adding the latter would additionally require a matched group that was monitored but given no treatment. However participants are normally told it is a blind trial, rather than fully expecting both treatment and placebo to be effective, so this is not an exact parallel.

Adair (1984) suggests that the important (though not the only) aspect of these effects is how the participants interpret the situation. Interviewing them (after the "experiment" part) would be the way to investigate this. This is also essential in "blind" trials to check whether the blinding is in fact effective. Some trials which are conducted and probably published as blind are in fact not. If the active treatment has a readily perceptible side effect on most patients (e.g. hair falls out, urine changes color, pronounced dry mouth) both doctors and patients will quickly know who does and does not have the active drug. Blinding depends on human perception, and so these perceptions should be measured.

Summary of recommended method

First party (cf. "single blind"): the pupil or patient. Second party (cf. "double blind"): the teacher, doctor, or researcher. The six groups are:

  • Second party (mis)led to expect a positive result: experimental group; control group (no treatment); Hawthorne control (irrelevant treatment / placebo).
  • Second party (mis)led to expect no effect: experimental group; control group (no treatment); Hawthorne control (irrelevant treatment / placebo).

Plus interview both first and second parties on how they see (interpret) the situation.

We know that all the above effects can be important and unexpected, so we cannot trust results that do not at least try to control for them. A double- or triple-blind procedure allows a two-group experiment to control for them. Rosenthal's recommended six-group approach is three times more costly; however, it does not merely control for the three effects (placebo, Hawthorne, and the material effect) but measures their sizes separately AND their interactions. If the effects are absent, that might be grounds for doing it more simply and cheaply in future. But if they are present, then without the larger design we cannot know what size of effect to expect in real life, only that there is an effect independent of expectations. Thus a blind trial is somewhat like Shayer's stage 1 (establishing the existence of an effect), while the larger designs also address aspects of the later, practical stages.

Because placebo effects are so large and so prevalent in medicine, blind trials have become the standard there. Nevertheless they do not give information about the size of benefit to be expected in real-life use. In fact the benefit may initially be greater than in the trials, because the placebo effect will be unfettered (everyone will expect the treatment to work after the trials), but it may decline to lower levels later. Another way of looking at it is that blind trials test the effect of (say) the drug, but resolutely refuse to investigate the placebo and Hawthorne benefits, even though these may be of similar size and benefit to the patient. Drug companies may reasonably stick to research that informs only their own concerns, but researchers who claim to investigate all the causes that benefit patients or pupils have much less excuse.

Currently we do not understand how any of these effects work. Understanding could probably be achieved, but would require concentrated research, e.g. on uncovering how expectancies are communicated unconsciously, or at any rate implicitly (cf. "Clever Hans"), and what expectancies are in fact generated.

References

  • Mayo, E. (1933) The human problems of an industrial civilization (New York: MacMillan) ch. 3.
  • Roethlisberger, F. J. & Dickson, W. J. (1939) Management and the Worker (Cambridge, Mass.: Harvard University Press).
  • Landsberger, Henry A. (1958) Hawthorne Revisited, (Ithaca, NY: Cornell University)
  • Gillespie, Richard (1991) Manufacturing knowledge : a history of the Hawthorne experiments (Cambridge : Cambridge University Press)
  • Jones, Stephen R. G. (1992) "Was There a Hawthorne Effect?" The American Journal of Sociology 98(3), pp. 451-468. From the abstract: "the main conclusion is that these data show slender to no evidence of the Hawthorne Effect".
  • "Persistence of a Flawed Theory", Psychology Today.
  • Franke, R.H. & Kaul, J.D. "The Hawthorne experiments: First statistical interpretation." American Sociological Review, 1978, 43, 623-643.
  • Steve Draper, university professor of the UK.

Further reading

  • G. Adair (1984) "The Hawthorne effect: A reconsideration of the methodological artifact" Journal of Appl. Psychology 69 (2), 334-345 [Reviews references to Hawthorne in the psychology methodology literature.]
  • Clark, R. E. & Sugrue, B. M. (1991) "Research on instructional media, 1978-1988" in G. J. Anglin (ed.) Instructional technology: past, present, and future, ch.30, pp.327-343. Libraries unlimited: Englewood, Colorado.
  • Jastrow, J. (1900) Fact and fable in psychology. Boston: Houghton Mifflin.
  • Lovett, R. "Running on empty" New Scientist 20 March 2004 181 no.2439 pp.42-45
  • Marsh, H.W. (1987) "Student's evaluations of university teaching: research findings, methodological issues, and directions for future research" Int. Journal of Educational Research 11 (3) pp.253-388.
  • Orne, M. T. (1973) "Communication by the total experimental situation: Why is it important, how it is evaluated, and its significance for the ecological validity of findings" in P. Pliner, L. Krames & T. Alloway (Eds.) Communication and affect pp.157-191. New York: Academic Press.
  • H. M. Parsons (1974) "What happened at Hawthorne?" Science 183, 922-932 [A very detailed description, in a more accessible source, of some of the experiments; used to argue that the effect was due to feedback-promoted learning.]
  • Rosenthal, R. (1966) Experimenter effects in behavioral research (New York: Appleton).
  • Rosenthal, R. & Jacobson, L. (1968, 1992) Pygmalion in the classroom: Teacher expectation and pupils' intellectual development. Irvington publishers: New York.
  • Rhem, J. (1999) "Pygmalion in the classroom" in The national teaching and learning forum 8 (2) pp. 1-4.
  • Schön, D. A. (1983) The reflective practitioner: How professionals think in action (Temple Smith: London) (Basic books?)
  • Shayer, M. (1992) "Problems and issues in intervention studies" in Demetriou, A., Shayer, M. & Efklides, A. (eds.) Neo-Piagetian theories of cognitive development: implications and applications for education ch. 6, pp.107-121. London: Routledge.
  • Wall, P. D. (1999) Pain: the science of suffering. Weidenfeld & Nicolson.
  • Zdep, S. M. & Irvine, S. H. (1970) "A reverse Hawthorne effect in educational evaluation." Journal of School Psychology 8, pp. 89-95.
