Operant conditioning

Operant conditioning, so named by psychologist B. F. Skinner, is the modification of behavior brought about over time by the consequences of said behavior. Operant conditioning is distinguished from Pavlovian conditioning in that operant conditioning deals with voluntary behavior explained by its consequences, while Pavlovian conditioning deals with involuntary behavior triggered by its antecedents.

Operant conditioning, sometimes called instrumental conditioning or instrumental learning, was first extensively studied by Edward L. Thorndike (1874-1949), who observed the behavior of cats trying to escape from home-made puzzle boxes. When first constrained in the boxes, the cats took a long time to escape. With experience, ineffective responses occurred less frequently and successful responses occurred more frequently, enabling the cats to escape in less time over successive trials. In his Law of Effect, Thorndike theorized that successful responses, those producing satisfying consequences, were "stamped in" by the experience and thus occurred more frequently. Unsuccessful responses, those producing annoying consequences, were stamped out and subsequently occurred less frequently. In short, some consequences strengthened behavior and some consequences weakened behavior. B.F. Skinner (1904-1990) built upon Thorndike's ideas to construct a more detailed theory of operant conditioning based on reinforcement and punishment.

Reinforcement and punishment

Reinforcement and punishment, the core ideas of operant conditioning, are either positive (a stimulus is added to the organism's environment) or negative (a stimulus is removed from it). This yields four basic consequences, alongside a fifth possibility: no consequence at all (i.e. nothing happens). It is important to note that organisms are not reinforced or punished; behavior is reinforced or punished.

  • Reinforcement is a consequence that causes a behavior to occur with greater frequency.
  • Punishment is a consequence that causes a behavior to occur with less frequency. According to Skinner's theory of operant conditioning, there are two methods of decreasing a behavior or response: punishment and extinction.

Four contexts of operant conditioning: Here the terms "positive" and "negative" are not used in their popular sense, but rather: "positive" refers to addition, and "negative" refers to subtraction. What is added or subtracted may be either reinforcement or punishment. Hence positive punishment is sometimes a confusing term, as it denotes the addition of punishment (such as spanking or an electric shock), a context that may seem very negative in the lay sense. The four situations are:

  1. Positive reinforcement occurs when a behavior (response) is followed by an appetitive (commonly seen as pleasant) stimulus that increases that behavior. In the Skinner box experiment, a stimulus such as food or sugar solution is delivered when the rat presses the lever.
  2. Negative reinforcement occurs when a behavior (response) is followed by the removal of an aversive (commonly seen as unpleasant) stimulus thereby increasing that behavior. In the Skinner box experiment, negative reinforcement is a loud noise continuously sounding inside the rat's cage until it presses the lever, when the noise ceases.
  3. Positive punishment occurs when a behavior (response) is followed by an aversive stimulus, such as introducing a shock or loud noise, resulting in a decrease in that behavior.
  4. Negative punishment occurs when a behavior (response) is followed by the removal of an appetitive stimulus, such as taking away a child's toy, resulting in a decrease in that behavior.
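
The four situations above form a simple 2x2 grid: whether a stimulus is added or removed, crossed with whether the stimulus is appetitive or aversive. This can be sketched as a small lookup function (the function name and labels below are illustrative, not standard terminology):

```python
# The four consequences form a 2x2 grid: whether a stimulus is added or
# removed, crossed with whether it is appetitive or aversive.
def classify_consequence(operation, stimulus):
    """operation: 'add' or 'remove'; stimulus: 'appetitive' or 'aversive'."""
    table = {
        ("add", "appetitive"): "positive reinforcement",   # behavior increases
        ("remove", "aversive"): "negative reinforcement",  # behavior increases
        ("add", "aversive"): "positive punishment",        # behavior decreases
        ("remove", "appetitive"): "negative punishment",   # behavior decreases
    }
    return table[(operation, stimulus)]

print(classify_consequence("remove", "aversive"))  # negative reinforcement
```

Note that "reinforcement" always means the behavior increases and "punishment" always means it decreases; "positive" and "negative" refer only to the add/remove axis.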

Also:

  • Avoidance learning is a type of learning in which a behavior is performed in order to prevent an aversive stimulus from occurring.
  • Extinction is a related term that occurs when a behavior (response) that had previously been reinforced is no longer followed by the reinforcing stimulus. In the Skinner box experiment, this would be a rat that has repeatedly been rewarded with a food pellet for pressing the lever, and then finds that pressing the lever no longer produces a pellet. Eventually the rat ceases to press the lever.
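
The course of acquisition followed by extinction can be caricatured with a toy model. The update rules and rates below are illustrative assumptions, not quantitative laws from the literature:

```python
# Toy model of acquisition and extinction (illustrative assumptions only):
# response strength rises toward a ceiling while presses are reinforced,
# then decays geometrically once reinforcement stops.
def response_strengths(trials_reinforced, trials_extinction,
                       strength=0.1, learn=0.2, decay=0.1):
    history = []
    for _ in range(trials_reinforced):
        strength += learn * (1.0 - strength)  # reinforced press strengthens
        history.append(strength)
    for _ in range(trials_extinction):
        strength -= decay * strength          # unreinforced press weakens
        history.append(strength)
    return history

curve = response_strengths(20, 30)  # 20 reinforced trials, then extinction
```

The resulting curve rises during the reinforced trials and falls back toward zero once reinforcement is withheld, mirroring the rat's declining lever pressing.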

Drawbacks and limitations to operant conditioning

Skinner's construct of learning did not include what Nobel Prize winning biologist Konrad Lorenz termed "fixed action patterns," or reflexive, impulsive, or instinctive behaviors. These behaviors were said by Skinner and others to exist outside the parameters of operant conditioning.

In dog training, particularly the training of working dogs, detection dogs, and the like, the stimulation of these fixed action patterns relative to the dog's predatory instincts, through the use of the prey drive, is the key to producing very difficult yet consistent behaviors; in most cases this does not involve operant, classical, or any other kind of conditioning.

The key to understanding this is that, according to the laws of operant conditioning, any behavior that is rewarded every single time tends to be produced only intermittently and will not be reliable. In detection dogs, however, every correct behavior of indicating a "find" must be rewarded with a tug toy or a ball throw. This is because the prey drive, once started, follows an inevitable sequence: the search, the eye-stalk, the chase, the grab-bite, and the kill-bite. This is why dogs trained for detection work through the prey drive work well only if they are reinforced every single time they behave correctly, which contradicts one of the laws of operant conditioning.

Some trainers now use the prey drive to train pet dogs and report far better results than when they use the principles of operant conditioning alone, which, according to Skinner and his student Keller Breland (a pioneer of clicker training), break down when strong instincts are at play.

Avoidance learning

Avoidance training is a form of negative reinforcement: performing the instrumental response terminates or prevents an aversive stimulus. There are two kinds of commonly used experimental settings: discriminated and free-operant avoidance learning.

Discriminated avoidance learning

In discriminated avoidance learning, a novel stimulus such as a light or a tone is followed by an aversive stimulus such as a shock (a CS-US pairing, as in classical conditioning). Whenever the animal performs the instrumental response, the CS (conditioned stimulus) or the US (unconditioned stimulus), respectively, is removed. During the first trials, called escape trials, the animal usually experiences both the CS and the US, performing the instrumental response to terminate the aversive US. Over time, the animal learns to perform the response already during the presentation of the CS, thus preventing the aversive US from occurring. Such trials are called avoidance trials.
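
The distinction between escape and avoidance trials can be sketched as a simple classifier; `trial_type` and its parameters are illustrative names, not standard terminology:

```python
# Classify a discriminated-avoidance trial: responding during the CS, before
# the US is due, prevents the US (an avoidance trial); responding only after
# US onset merely terminates it (an escape trial).
def trial_type(response_time, cs_onset, us_onset):
    if response_time is None:
        return "no response"
    if cs_onset <= response_time < us_onset:
        return "avoidance"   # US never presented
    return "escape"          # US experienced, then terminated by the response
```

Early in training most trials come out as "escape"; as learning proceeds, responses shift into the CS-US gap and trials become "avoidance".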

Free-operant avoidance learning

In this experimental setting, no discrete stimulus signals the occurrence of the aversive stimulus. Rather, the aversive stimuli (usually shocks) are presented without explicit warning stimuli.

Two crucial time intervals determine the rate of avoidance learning. The first is the S-S interval (shock-shock interval): the amount of time that passes between successive presentations of the shock when no instrumental response is performed. The second is the R-S interval (response-shock interval): the length of time following an instrumental response during which no shocks will be delivered. Each time the organism performs the instrumental response, the R-S interval begins anew.
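
The interaction of the two intervals can be sketched as a small shock scheduler; `shock_times` and its parameters are our own illustrative names for this sketch:

```python
# Toy free-operant (Sidman) avoidance schedule.  Shocks recur every
# `ss` seconds (the S-S interval); each response postpones the next
# shock to `rs` seconds after that response (the R-S interval restarts).
def shock_times(responses, ss, rs, session_end):
    """responses: sorted response times; returns the times at which shocks occur."""
    shocks = []
    next_shock = ss          # first shock after one S-S interval
    i = 0
    while next_shock <= session_end:
        if i < len(responses) and responses[i] < next_shock:
            next_shock = responses[i] + rs   # R-S interval begins anew
            i += 1
        else:
            shocks.append(next_shock)
            next_shock += ss                 # next S-S interval
    return shocks

# With no responses, shocks simply recur every S-S interval; a single
# response at t=3 (ss=5, rs=10) postpones the next shock to t=13.
```

An organism that responds at least once per R-S interval never receives a shock at all, which is exactly what well-trained subjects on such schedules learn to do.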

Two-process theory of avoidance

This theory was originally established to explain learning in discriminated avoidance learning. It assumes that two processes take place.

(a) Classical conditioning of fear. During the first trials of training (the escape trials), the organism experiences both the CS and the aversive US. The theory assumes that during these trials classical conditioning takes place through the pairing of the CS with the US. Because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER): fear. In classical conditioning, presenting a CS conditioned with an aversive US disrupts the organism's ongoing behavior.

(b) Reinforcement of the instrumental response by fear reduction. Because, through the first process, the CS signaling the aversive US has itself become aversive by eliciting fear, reducing this unpleasant emotional reaction serves to reinforce the instrumental response. The organism learns to make the response during the CS, thus terminating the aversive internal reaction the CS elicits.

An important aspect of this theory is that the term "avoidance" does not really describe what the organism is doing: it does not "avoid" the aversive US in the sense of anticipating it. Rather, the organism escapes an aversive internal state caused by the CS.

  • One of the practical aspects of operant conditioning with relation to animal training is the use of shaping (reinforcing successive approximations to a desired behavior while withholding reinforcement from earlier, cruder approximations), as well as chaining.
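
Shaping can be sketched as a toy procedure in which responses that come closer to the target are reinforced and the criterion is gradually tightened. This is an illustrative model under our own assumptions (Gaussian response variability, a fixed tightening rate), not a standard algorithm:

```python
import random

# Toy shaping procedure (illustrative assumptions only): reinforce responses
# that come closer to the target than before and fall within a tolerance
# band, then tighten the band -- the "successive approximations".
def shape(target, tolerance, steps, seed=0):
    rng = random.Random(seed)
    behavior = 0.0                               # current typical response
    for _ in range(steps):
        response = behavior + rng.gauss(0, 1.0)  # responses vary around the mean
        if abs(response - target) < abs(behavior - target) \
                and abs(response - target) <= tolerance:
            behavior = response                  # reinforced response shifts behavior
            tolerance *= 0.95                    # demand a closer approximation next
    return behavior
```

Because only closer-than-before responses are ever reinforced, the modeled behavior drifts monotonically toward the target, much as a trainer gradually raises the criterion for a reward.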

References

  • Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Acton, MA: Copley.
  • Skinner, B. F. (1953). Science and human behavior. New York. Macmillan.
  • Skinner, B. F. (1957). Verbal behavior. Englewood Cliffs, NJ: Prentice Hall.
  • Thorndike, E. L. (1901). Animal intelligence: An experimental study of the associative processes in animals. Psychological Review Monograph Supplement, 2, 1-109.
  • Breland, K., & Breland, M. (1961). The misbehavior of organisms. American Psychologist, 16, 681-684.
