Friday, November 17, 2023

DSTD Notes

 Don’t Shoot The Dog Notes

I recently read Karen Pryor's Don't Shoot The Dog, a classic dog training book. I got a lot of value out of its fresh perspective and keen insight into practical behavior science.

- - -


Shaping Behavior And The Primacy of Positive Reinforcement


DSTD was introduced to me in this way: “No one should be allowed to have a kid before they can teach a chicken to dance.” A chicken (like dolphins, cats, and small babies) is an animal that cannot be trained by force or by punishment, leaving most to assume that they can't be trained at all. But using alternate methods (exclusively positive reinforcement), you can teach a chicken to dance in less than an hour.

DSTD looks at practical behavioral science through the lens of a pioneering animal trainer using exclusively positive reinforcement. Another trainer put it this way: “I don’t use positive reinforcement because of any moral implication. I use positive reinforcement because it gets results.”

A behavior is always shaped by way of reinforcement. In other words, a reinforcer or training choice that is effective reliably increases the likelihood of that behavior occurring again. If a reinforcer or training choice does not reliably increase the likelihood of a behavior occurring again then it is not effective.

Reinforcement can be positive or negative.

Positive Reinforcers: Praise, a smile, a treat, a nod of approval, petting, a token to be traded for a reward later, shine/pride of accomplishment, etc.
Positive reinforcement is reliably effective when the following are sufficient:
1) clarity of attribution (the learner clearly understands cause and effect)
2) timing of delivery (the closer to the moment of the behavior the better)
3) the reinforcer itself (if it’s a crowded bar you have to tip in $20s)
Assuming the behavior itself is well within the realm of the possible, if the behavior frequency does not change after reinforcement then those attributes need to be tweaked.

Negative reinforcement is NOT reliable. Negative reinforcement will usually stop a behavior in the moment, but will not shape that correction for the future. No one who gets a speeding ticket stops speeding for good; they just learn to scan for cops. Punishment only teaches to avoid the punisher. Contrast that with a hypothetical “observance ticket” where you got a cash reward for following the speed limit. We would all cruise around at exactly the speed limit looking for cops. Positive reinforcement teaches behavior.

In general, you want to positively reinforce anything that you want to see more often and let things you want to see less often extinguish by themselves in the light of the momentum of rapidly accumulating positive change.

Using nigh-exclusive positive reinforcement to influence behavior change is probably the single most important sub-skill of coaching, even and especially especially especially self-coaching.



The Nature Of an Effective Training Relationship

The objective of an effective training relationship or culture is the progressive realization of a worthwhile goal by way of encouraging full engagement, creative experimentation, and a rapid and accurate feedback loop.

Unfortunately, the teacher/student relationships that we are accustomed to are typically
a) hierarchical (what the teacher wants when, where, and how they want it)
b) in pursuit of results by way of avoiding mistakes
c) one program fits all

Insofar as the culture of the learning space demonstrates those traits, fruitful and sustainable learning is not feasible.

In contrast, effective training is a dynamic dialogue. An effective teacher/learner relationship is
a) mutually creative
b) experimental by nature
c) highly sensitized to the individuals involved

When the learning space slips out of those traits, it is important to (re)establish them, even to the extent of treating it like a fresh start.



On Praise and Stinginess

Some animal trainers reduce the amount of food they give to their animals in order to artificially increase the value of rewards. It doesn’t work. They end up with smaller animals that are worse equipped to learn. Despite the intention, stinginess is punishment. Don’t be stingy. Stinginess breeds contempt.

But neither do we want to be inaccurate. It’s important to be able to make accurate distinctions between care, encouragement, and praise. Love, attention, and care should be freely and liberally given. Encouragement is for when the learner is on the right track. Praise is either earned or meaningless. Inaccurate reinforcement is inaccurate feedback. That being said, accurate appreciation requires a high degree of presence. Having nothing at all to praise (i.e., no positive feedback to give) probably means the trainer isn’t paying attention. Any honest progress or honest effort merits praise. A “not enough” attitude is stinginess.

The purpose of training a behavior with reinforcement is to establish a behavior (or an aspect of a behavior), not to sustain something already well-trained. Once it is well-trained, reinforcement of a behavior should promptly transition to random delivery instead of consistent delivery.
You can think about it like this:
Encouragement is like saying “Keep going! You are on the right track!”
Praise/positive reinforcement is accurate feedback. “Good job!”
Random positive reinforcement is occasionally saying “Great!” as a way of communicating “Hey, thank you for continuing to stay engaged.”
If the reinforcement peters out completely, the behavior will peter out too.

In a training context, you can consider a behavior well-trained only when it is under complete stimulus control.



4 Targets for Complete Stimulus Control

1. The behavior always occurs immediately on cue.
2. The behavior never occurs in absence of cue.
3. The behavior never occurs on a different cue.
4. No other behavior happens on this cue.

Each target is distinct and can (read: should) be considered and trained independently. Systematically shaping these four targets is an incredibly efficient way to train.
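One way to make the four targets concrete is to log training trials and check each target separately. The following is only a rough sketch, not something from the book: the Trial record, the function name, and the idea of one observed behavior per trial are all assumptions made for illustration, and it ignores how quickly the behavior follows the cue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One logged observation (hypothetical record format)."""
    cue: Optional[str]       # cue that was given, or None if no cue
    behavior: Optional[str]  # behavior that was observed, or None if nothing


def stimulus_control_report(trials, cue, behavior):
    """Check each of the four targets independently for one cue/behavior pair.

    Note: a target with no relevant trials reports True, which only
    means "no evidence against it yet".
    """
    on_cue = [t for t in trials if t.cue == cue]
    no_cue = [t for t in trials if t.cue is None]
    other_cue = [t for t in trials if t.cue not in (None, cue)]
    return {
        # 1. The behavior always occurs on cue (latency is not modeled).
        "occurs_on_cue": all(t.behavior == behavior for t in on_cue),
        # 2. The behavior never occurs in absence of cue.
        "never_without_cue": all(t.behavior != behavior for t in no_cue),
        # 3. The behavior never occurs on a different cue.
        "never_on_other_cue": all(t.behavior != behavior for t in other_cue),
        # 4. No other behavior happens on this cue.
        "no_other_behavior_on_cue": all(
            t.behavior in (None, behavior) for t in on_cue
        ),
    }
```

Because each target gets its own flag, a failing one points at what to train next without touching the others.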



The 10 Laws of Shaping Behavior


1. Raise criteria only in increments small enough that the learner always has a realistic chance for reinforcement.
Within reach means well within the range already demonstrated in training.
Every time you change criteria you are changing the rules of the game and the learner has to feel that it is still fair and good to play.
Easy progress is fast progress; let the learner set the pace. Their body is the best judge of what “enough” looks like right now.

2. Train one aspect of a particular behavior at a time; do not shape for two criteria simultaneously.

You can train multiple things per session, just not at the same moment.
Training direction then distance is faster and more effective than training direction and distance.
Usually ineffective practice happens because you're training multiple things simultaneously, coming up with bad form/habits, and ingraining those (and bad form is a pain in the ass to solve later).

3. During shaping, put the current level of response onto a random schedule of reinforcement before adding or raising the criteria.
When the 2ft jump is reliable (a new baseline), you can stop reinforcing it every time and instead reinforce it randomly.
After the learner is over the initial shock of the rule change, they usually show more vigor as if to say "look at me!" and that dovetails with reinforcing all jumps at the next goal (2.5ft) until that constitutes a new baseline (which could be right away, who knows?).
***Raising criteria is about making it clear that the rules of the game are changeable, but always in a way that is understood, fair, fun, and builds learning momentum.***

4. When introducing a new criterion or aspect, temporarily relax the old ones.
New criteria, aspects, environment, etc. will cause mistakes to pop up. They will go away by themselves as the learner adjusts. If you draw attention to these mistakes with a reprimand, the mistakes might stick around.

5. Stay ahead of your learner: plan the program completely so that if the subject makes sudden progress you are aware of what to reinforce next.
Breakthroughs can happen at any time. Insight can happen at any time. The trainer being unprepared for breakthrough and thus holding back the learner's progression feels like a punishment.

6. Don't change trainers midstream; one shaper per behavior.

Changing trainers for the same behavior constitutes environment change.

7. If one shaping procedure is not eliciting progress, find another. Be creative.

The same method won't work equally well, or maybe at all, for every learner.
If a particular method isn't getting results, more of the same will simply not work, though it can wreck the relationship.

8. Gratuitous interruption constitutes punishment.

A training session is an intimate, vulnerable space. The trainer promises to give their attention, care, expertise, and reinforcement. Unexpectedly revoking those through no fault of the learner is a breach of trust.

9. If behavior deteriorates, quickly review the entire shaping process with a series of easily earned reinforcers.
Often it's not even helpful to figure out why behavior deteriorates when you can just review from the beginning and rapidly build momentum back up that way.

10. If at all possible, end each session on a high note.
Any real progress is a potential ending place. It's way way way better to have multiple short sessions each ending on a high than one longer session that contained multiple highs. It's better to introduce a break for the highlight to get internalized than to introduce the stress of needing to reproduce it right away.
For this reason it's important not to end a session with something hard or new. If a highlight didn't present itself earlier, end with something familiar and easy.
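Laws 1 and 3 describe a small procedure, so here is a loose sketch of it using the jump example. Everything specific in it is an assumption for illustration: the function names, the 0.5ft step, the 30% chance of reinforcing the old baseline, and the 80% hit rate used to decide that the new goal is reliable. None of those numbers come from the book.

```python
import random


def should_reinforce(height, baseline, goal, rng, p_baseline=0.3):
    """Decide whether to reinforce a single jump.

    - Jumps that reach the new goal are reinforced every time.
    - Jumps that reach only the old baseline are reinforced at random
      (p_baseline is a made-up value, not a recommendation).
    - Anything lower is simply not reinforced; nothing is punished.
    """
    if height >= goal:
        return True
    if height >= baseline:
        return rng.random() < p_baseline
    return False


def raise_criteria(baseline, goal, recent_heights, step=0.5, needed=0.8):
    """Raise criteria by one small step once the current goal is reliable.

    Returns the (baseline, goal) pair to use next. The step size and
    the reliability threshold are hypothetical placeholders.
    """
    if not recent_heights:
        return baseline, goal
    hits = sum(1 for h in recent_heights if h >= goal)
    if hits / len(recent_heights) >= needed:
        # The old goal becomes the new baseline (now on a random schedule).
        return goal, goal + step
    return baseline, goal


# Example: the 2ft jump is the baseline and 2.5ft is the current goal.
rng = random.Random()
reinforce_now = should_reinforce(2.5, baseline=2.0, goal=2.5, rng=rng)
next_baseline, next_goal = raise_criteria(2.0, 2.5, [2.5, 2.6, 2.5, 2.5, 2.7])
```

Keeping the step small is the point of the sketch: the learner should always have a realistic shot at the next reinforcer.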






8 Ways to Get Rid of a Behavior

If our objective is to get results, then we want to normalize using 5, 6, 7, and 8, and normalize avoiding 1, 2, and 3.

1. Elimination
Restrain or eliminate the subject or their ability to perform the behavior. Elimination always gets a result BUT no learning occurs.
Ex: If you kill the dog it will stop barking 100% of the time. A kid sent to their room cannot misbehave at the table.

2. Punishment (Negative Experience After)
A negative experience delivered after the behavior.
We intuitively think that escalating punishment will make it work, which is not at all true.
To the punisher, punishment feels great, like asserting dominance. To the subject, punishment only teaches to avoid the punisher.
Punishment can work well when it is extremely rare and clearly understood as a just consequence of a specific behavior. Otherwise it's just experienced as abuse.
Ex: Yelling. Reprimand. Threats. Criticism. Shaming. Hurt. Deprivation. Revenge.

3. Negative Experience During
Any negative experience that can be stopped by changing the behavior.
Like punishment, negative experience during a behavior only teaches avoidance, not correction. Rather than to correct the behavior, the learner will deceive or avoid the punisher or the situation.
As a parenting or self-talk style, negative reinforcement invariably breeds inhibition. Negative reinforcement that doesn't stop with a change in behavior teaches helplessness.
Ex: Seatbelt chime stops when you plug in your seatbelt. Nagging. Begging. Disapproving glances/words/tone.

4. Extinction
Behavior that produces no results at all (good or bad) gradually diminishes.
If you don't respond to complaining/whining/teasing it goes away. But if you ever respond to whining, it gets strongly reinforced. This can be tricky with whining or fighting because we usually want to respond with counterarguments, not recognizing that the words are just the means of expression for the bad behavior itself: the whining or fighting.
Similarly, focusing on mistakes/errors will often actually magnify their impact whereas focusing on repeating successes will result in the mistake working itself out over time. The baseball team that only studies replicating its successes gets much better but the baseball team that only studies avoiding its mistakes gets worse.

5. Train an Incompatible Behavior
Often the easiest way to change our own behavior.
This can look like doing a totally different activity, intentionally breaking the procedure chain, or Old Way -> New Way.

6. Put The Behavior On Cue (and then deny or compartmentalize the cue)
If a behavior is reinforced for being on a cue it will naturally extinguish elsewhere, especially when it is not reinforced elsewhere.

7. Reinforce Anything Else
Reinforce anything that isn’t the behavior that you want to see less of.
This is sort of like Training an Incompatible Behavior + Extinction.

8. Change the Motivation
Changing motivation often looks like gaining a more complete, mature, or sympathetic understanding of the situation.
If the cat is well-fed then jumping on the counter is less rewarding. A lot of bad behavior in animals and humans is just disguised hunger, loneliness, or unarticulated fear.