Alex's Puff Stuff

Friday, November 17, 2023

DSTD Notes

Don’t Shoot The Dog Notes

I recently read Karen Pryor's Don't Shoot The Dog, a classic dog training book. I got a lot of value out of its fresh perspective and keen insight into practical behavior science.

- - -

Shaping Behavior And The Primacy of Positive Reinforcement

DSTD was introduced to me in this way: “No one should be allowed to have a kid before they can teach a chicken to dance.” A chicken (like dolphins, cats, and small babies) is an animal that cannot be trained by force or by punishment, leading most people to assume that it can't be trained at all. But using alternate methods (exclusively positive reinforcement), you can teach a chicken to dance in less than an hour.

DSTD looks at practical behavioral science through the lens of a pioneering animal trainer using exclusively positive reinforcement. Another trainer put it this way: “I don’t use positive reinforcement because of any moral implication. I use positive reinforcement because it gets results.”

A behavior is always shaped by way of reinforcement. In other words, a reinforcer or training choice that is effective reliably increases the likelihood of that behavior occurring again. If a reinforcer or training choice does not reliably increase the likelihood of a behavior occurring again then it is not effective.

Reinforcement can be positive or negative.

Positive Reinforcers: Praise, a smile, a treat, a nod of approval, petting, a token to be traded for a reward later, shine/pride of accomplishment, etc etc.
Positive reinforcement is reliably effective when the following are sufficient:
1) clarity of attribution (the learner clearly understands cause and effect)
2) timing of delivery (the closer to the moment of the behavior the better)
3) the reinforcer itself (if it’s a crowded bar you have to tip in $20s)
Assuming the behavior itself is well within the realm of the possible, if the behavior frequency does not change after reinforcement then those attributes need tweaking.

Negative reinforcement is NOT reliable. Negative reinforcement will usually stop a behavior in the moment, but will not shape that correction for the future. Nobody who gets a speeding ticket stops speeding; they just learn to scan for cops. Punishment only teaches to avoid the punisher. Contrast that with an “observance ticket,” where you would get a cash reward for following the speed limit. We would all cruise around at exactly the speed limit looking for cops. Positive reinforcement teaches behavior.

In general, you want to positively reinforce anything that you want to see more often and let things you want to see less often extinguish by themselves in the light of the momentum of rapidly accumulating positive change.

Using nigh-exclusive positive reinforcement to influence behavior change is probably the single most important sub-skill to coaching, even and especially especially especially self-coaching.

The Nature Of an Effective Training Relationship

The objective of an effective training relationship or culture is the progressive realization of a worthwhile goal by way of encouraging full engagement, creative experimentation, and a rapid and accurate feedback loop.

Unfortunately, the teacher/student relationships that we are accustomed to are typically
a) hierarchical (what the teacher wants when, where, and how they want it)
b) pursuant of results by way of avoiding mistakes
c) one program fits all

Insofar as the culture of the learning space demonstrates those traits, fruitful and sustainable learning is not feasible.

In contrast, effective training is a dynamic dialogue. An effective teacher/learner relationship is
a) mutually creative
b) experimental by nature
c) highly sensitized to the individuals involved

When the learning space slips out of those traits, it is important to (re)establish them, even to the extent of treating it like a fresh start.

On Praise and Stinginess

Some animal trainers reduce the amount of food they give to their animals in order to artificially increase the value of rewards. It doesn’t work. They end up with smaller animals that are worse equipped to learn. Despite the intention, stinginess is punishment. Don’t be stingy. Stinginess breeds contempt.

But neither do we want to be inaccurate. It’s important to be able to make accurate distinctions between care, encouragement, and praise. Love, attention, and care should be freely and liberally given. Encouragement should be given when the learner is on the right track. Praise is either earned or meaningless. Inaccurate reinforcement is inaccurate feedback. That being said, accurate appreciation requires a high degree of presence. Having nothing at all to praise (i.e. no positive feedback to give) probably means the trainer isn’t paying attention. Any honest progress or honest effort merits praise. A “not enough” attitude is stinginess.

The purpose of training a behavior with reinforcement is to establish a behavior (or an aspect of a behavior), not to sustain something already well-trained. Once it is well-trained, reinforcement of a behavior should promptly transition to random delivery instead of consistent delivery.
You can think about it like this:
Encouragement is like saying “Keep going! You are on the right track!”
Praise/positive reinforcement is accurate feedback. “Good job!”
Random positive reinforcement is occasionally saying “Great!” as a way of communicating “Hey, thank you for continuing to stay engaged.”
If the reinforcement peters out completely, the behavior will peter out too.

In a training context, you can consider a behavior well-trained only when it is under complete stimulus control.

4 Targets for Complete Stimulus Control

1. The behavior always occurs immediately on cue.
2. The behavior never occurs in absence of cue.
3. The behavior never occurs on a different cue.
4. No other behavior happens on this cue.

Each target is distinct and can (read: should) be considered and trained independently. Systematically shaping these four targets is an incredibly efficient way to train.
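One way to keep the four targets separate is to score each of them independently from a training log. This is only a minimal sketch: the `Trial` record, the `stimulus_control_report` function, and the result labels are hypothetical names for illustration, not anything from the book. It assumes one observed behavior per trial and does not measure how quickly the behavior follows the cue, so "immediately" in target 1 is not checked.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    cue: Optional[str]       # cue given on this trial, or None if no cue was given
    behavior: Optional[str]  # behavior observed, or None if the learner did nothing


def stimulus_control_report(trials, cue, behavior):
    """Score the four targets separately for one cue/behavior pair.

    A target with no relevant trials in the log comes back True (nothing
    contradicts it yet), so an empty log proves nothing.
    """
    on_cue = [t for t in trials if t.cue == cue]
    no_cue = [t for t in trials if t.cue is None]
    other_cue = [t for t in trials if t.cue not in (None, cue)]
    return {
        # 1. The behavior always occurs on cue.
        "occurs_on_cue": all(t.behavior == behavior for t in on_cue),
        # 2. The behavior never occurs in absence of cue.
        "never_without_cue": not any(t.behavior == behavior for t in no_cue),
        # 3. The behavior never occurs on a different cue.
        "never_on_other_cue": not any(t.behavior == behavior for t in other_cue),
        # 4. No other behavior happens on this cue.
        "no_other_behavior_on_cue": not any(
            t.behavior not in (None, behavior) for t in on_cue
        ),
    }
```

A report with one False entry points at the single target to train next, which fits the idea of shaping each target on its own.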

The 10 Laws of Shaping Behavior

1. Raise criteria only in increments small enough that the learner always has a realistic chance for reinforcement.
Within reach means well within the range already demonstrated in training.
Every time you change criteria you are changing the rules of the game and the learner has to feel that it is still fair and good to play.
Easy progress is fast progress; let the learner set the pace. Their body is the best judge of what “enough” looks like right now.

2. Train one aspect of a particular behavior at a time; do not shape for two criteria simultaneously.
You can train multiple things per session, just not at the same moment.
Training direction then distance is faster and more effective than training direction and distance.
Usually ineffective practice happens because you're training multiple things simultaneously, coming up with bad form/habits, and ingraining those (and bad form is a pain in the ass to solve later).

3. During shaping, put the current level of response onto a random schedule of reinforcement before adding or raising the criteria.
When the 2ft jump is reliable (a new baseline), you can stop reinforcing it every time and instead reinforce it randomly.
After the learner is over the initial shock of rule change they usually show more vigor as if to say "look at me!" and that dovetails with reinforcing all jumps at next goal (2.5ft) until that constitutes a new baseline (which could be right away, who knows?).
***Raising criteria is about making it clear that the rules of the game are changeable, but always in a way that is understood, fair, fun, and builds learning momentum.***

4. When introducing a new criterion or aspect, temporarily relax the old ones.
New criteria, aspects, environment, etc will cause mistakes to pop up. They will go away by themselves as the learner adjusts. If you draw attention to these mistakes with reprimand the mistakes might stick around.

5. Stay ahead of your learner: plan the program completely so that if the subject makes sudden progress you are aware of what to reinforce next.
Breakthroughs can happen at any time. Insight can happen at any time. The trainer being unprepared for breakthrough and thus holding back the learner's progression feels like a punishment.

6. Don't change trainers midstream, one shaper per behavior.
Changing trainers for the same behavior constitutes environment change.

7. If one shaping procedure is not eliciting progress, find another. Be creative.
The same method won't work equally well for every learner, and for some it may not work at all.
If a particular method isn't getting results, more of the same will simply not work, though it can wreck the relationship.

8. Gratuitous interruption constitutes punishment.
A training session is an intimate, vulnerable space. The trainer promises to give their attention, care, expertise, and reinforcement. Unexpectedly revoking those at no fault of the learner is a breach of trust.

9. If behavior deteriorates, quickly review the entire shaping process with a series of easily earned reinforcers.
Often it's not even helpful to figure out why behavior deteriorates when you can just review from the beginning and rapidly build momentum back up that way.

10. If at all possible, end each session on a high note.
Any real progress is a potential ending place. It's way way way better to have multiple short sessions each ending on a high than one longer session that contained multiple highs. It's better to introduce a break for the highlight to get internalized than to introduce the stress of needing to reproduce it right away.
For this reason it's important not to end a session with something hard or new. If a highlight didn't present itself earlier, end with something familiar and easy.
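Laws 1 and 3 together describe a simple rule for when to raise criteria, and the 2ft to 2.5ft jump example can be sketched in a few lines. This is an illustration only: the function name `next_criterion`, the 5-trial window, and the "every recent trial succeeded" definition of reliable are all assumptions, not numbers from the book.

```python
def next_criterion(current, recent_results, step=0.5, window=5):
    """Return the criterion to reinforce next.

    recent_results is a list of booleans, True when the learner met `current`.
    The criterion is raised by one small `step` only once the last `window`
    trials were all successes (an assumed stand-in for "reliable");
    otherwise it stays where it is.
    """
    recent = recent_results[-window:]
    reliable = len(recent) == window and all(recent)
    return current + step if reliable else current
```

For example, `next_criterion(2.0, [True] * 5)` moves the goal to 2.5, while a single recent miss keeps it at 2.0.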

8 Ways to Get Rid of a Behavior

If our objective is to get results, then we want to normalize using 5, 6, 7, and 8, and normalize avoiding 1, 2, and 3.

1. Elimination
Restrain or eliminate the subject or their ability to perform the behavior. Elimination always gets a result BUT no learning occurs.
Ex: If you kill the dog it will stop barking 100% of the time. A kid sent to their room cannot misbehave at the table.

2. Punishment (Negative Experience After)
Negative reinforcement that happens after the behavior.
We intuitively think that escalating punishment will make it work, which is not at all true.
To the punisher, punishment feels great, like asserting dominance. To the subject, punishment only teaches to avoid the punisher.
Punishment can work well when it is extremely rare and clearly understood as a just consequence of a specific behavior. Otherwise it's just experienced as abuse.
Ex: Yelling. Reprimand. Threats. Criticism. Shaming. Hurt. Deprivation. Revenge.

3. Negative Experience During
Any negative experience that can be stopped by changing the behavior.
Like punishment, negative experience during a behavior only teaches avoidance, not correction. Rather than correcting the behavior, the learner will deceive or avoid the punisher or the situation.
As a parenting or self-talk style, negative reinforcement invariably breeds inhibition. Negative reinforcement that doesn't stop with a change in behavior teaches helplessness.
Ex: Seatbelt chime stops when you plug in your seatbelt. Nagging. Begging. Disapproving glances/words/tone.

4. Extinction
Behavior that produces no results at all (good or bad) gradually diminishes.
If you don't respond to complaining/whining/teasing it goes away. But if you ever respond to whining, it gets strongly reinforced. This can be tricky with whining or fighting because we usually want to respond with counterarguments, not recognizing that their words are just the means of expression for a bad behavior: whining or fighting.
Similarly, focusing on mistakes/errors will often actually magnify their impact whereas focusing on repeating successes will result in the mistake working itself out over time. The baseball team that only studies replicating its successes gets much better but the baseball team that only studies avoiding its mistakes gets worse.

5. Train an Incompatible Behavior
Often the easiest way to change our own behavior.
This can look like doing a totally different activity, intentionally breaking the procedure chain, or Old Way -> New Way

6. Put The Behavior On Cue (and then deny or compartmentalize the cue)
If a behavior is reinforced for being on a cue it will naturally extinguish elsewhere, especially when it is not reinforced elsewhere.

7. Reinforce Anything Else
Reinforce anything that isn’t the behavior that you want to see less of.
This is sort of like Training an Incompatible Behavior + Extinction.

8. Change the Motivation
Changing motivation often looks like gaining a more complete, mature, or sympathetic understanding of the situation.
If the cat is well-fed then jumping on the counter is less rewarding. A lot of bad behavior in animals and humans is just disguised hunger, loneliness, or unarticulated fear.

Tuesday, July 19, 2022

An Inner Scorecard

Jim Loehr talks about Goldman’s dilemma. Starting in the 1970s, Robert Goldman ran a long-term study in which he asked elite athletes the following hypothetical question: “Let’s say I had a magic drug that was so fantastic that if you took it once you would win every competition you would enter from that point on, with the one drawback that it would cause you to die after five years. It is cheating, but I would guarantee that you wouldn’t be caught. Would you take the drug?” Without fail, from the very beginning until he stopped the study in 1995, more than half of respondents answered “Yes, I would take the drug.”

As Jim Loehr was speaking, I expected him to finish the story “without fail, from the very beginning until he stopped the study, the top performers answered ‘no’.” The actual ending caught me off guard.

Why the surprise? What caught me off guard?

If chasing a dream really really matters, if we’re willing to put our whole heart and soul into the pursuit, then we’re relentless. We do whatever it takes. In a whatever-it-takes way the story would make sense. “Whatever it takes” means you take the drug. Maybe the language or framing we use traps us in the perspective we take and ultimately the answer we give.

At the same time, I saw myself caught in a similar trap. I’m so used to prioritizing pursuit, to listening to researchers and coaches for hints and clues for higher performance, that I was waiting with bated breath for Jim Loehr to finish his anecdote with a golden performance-differentiating nugget. Maybe the perspective I take limits the answers I’m ready to hear.

Maybe the gap in expectation I experienced was precisely Loehr’s point: that we’re totally caught up in a dominant culture that could be fairly described as “blind to what it was for in the first place: building character and serving your community.”

We might not always be able to see or articulate it, but we’re never more than one click or thought away from putting the proverbial cart radically before the horse. Putting profit radically before value. Putting results radically before impact. Putting stress (or recovery) radically over development. Etc. Etc.

This isn’t even a moralistic chiding, it’s just a straight description of what's been called scarcity culture. Loehr describes how in sport, in school, in business, in hustle, in drive, in winningness — insofar as pursuit is primarily of "more," it is exactly a race to nowhere.

Loehr asks: can you take character-building as seriously as results pursuit? Could you imagine yourself as a coach, as your own coach, dedicated with the same dogged intensity that we imagine in the greatest coach in his greatest season? Researching, tweaking, optimizing, consulting, creating, augmenting, obsessively working on whatever aspects of the game could make the difference?

Loehr answers: Yes, it is possible to coach ourselves on an inner, character-based scorecard instead of an external results or process-oriented one. The kicker is that we already do. We aren’t immune to the impact and/or lapse of our own character; we’re just habitually blind to it.

A great character scorecard does meaningfully drive (or sabotage) performance. But to jump back to the prioritization of external results is to miss the point. You don’t engage in character work to drive results. You work on the character that can drive results because capacity of character is what matters first and last. For as long as we propagate a culture that doesn’t even know that, even and especially in our own mind/voice/body, we do ourselves and those around us an immense disservice.

Saturday, September 11, 2021

Circadian Clock

The most powerful influence over your circadian clock is light viewing, not sleep times.

how to rapidly set circadian clock:

- Within 30 min of waking up, 2+ min exposure to outdoor sunlight
Optional: add exercise, food or social exposure
- Around sunset, take 2+ min walk and view low solar angle
- After sunset, use dim and horizontal (not overhead) lights

- Lowest body temp occurs 2 hours before natural wake time (ex: 2 hours before 7:00AM is 5:00AM). Light exposure in the few hours before lowest body temp (ex: 3:00AM) will make you want to go to sleep later and later on the following days; light exposure soon after lowest body temp (ex: 6:00AM) will make you want to go to sleep earlier on the following days.

- Inconsistent clock/light/activity signals don’t just screw up sleep; they screw up all chemical systems, leading to cascading performance and mood disorders.

- You don't need 7-8hrs of sleep per night so much as 70-80hrs over any 10 day period.

Friday, November 27, 2020

Marth vs Puff: alternate kill setups

vs grounded puff at 45-60%, low dair combos to pivot fsmash.
If puff DI/SDIs too high then you can react with dash JC tipper upsmash.
If puff DIs over the ledge you can FH dair.

From 60%+ a smash attack isn’t guaranteed depending on her DI but you can FH/DJ ff dair then techchase.
On platform stages, over this % range whether she will hit the platform depends on her DI. As such, it's best to assume that she won't and simply switch to a techchase in the (somewhat rare) case that you see she will clearly hit plat.

Fox Ditto upthrow flowchart

this is what I do

0-50%:
any DI: JC grab

50-65%:
No/slight DI: uptilt->hard aerial->techchase (see end of chain grab)

65-85%:
any DI: falling upair. If hard hit, FHDJ upair kills. // If soft hit, grab.

86%+:
upthrow upsmash kills. You might have to shieldstop to get strong hit. (NO DI) (DI IN) (DI OUT)

115+:
any DI: upthrow upair

if at any point they DI off stage:
SH off (double)shine
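For drilling, the main flowchart above can be written as a lookup. This is a rough sketch with some assumptions baked in: `upthrow_followup` is a made-up name, exactly 50% is put in the 50-65 bracket, anything below 86% that isn't covered earlier falls in the 65-85 bracket, 115%+ takes over from the 86%+ line, and the 50-65 answer only applies to no/slight DI since the notes don't list other DI there.

```python
def upthrow_followup(percent, di_offstage=False):
    """Return the followup from the main flowchart for a given percent."""
    if di_offstage:
        # Applies at any percent.
        return "SH off (double)shine"
    if percent < 50:
        return "JC grab"
    if percent < 65:
        # Only listed for no/slight DI.
        return "uptilt -> hard aerial -> techchase"
    if percent < 86:
        return "falling upair (hard hit: FHDJ upair kills; soft hit: grab)"
    if percent < 115:
        # May need a shieldstop to get the strong hit.
        return "upthrow upsmash"
    return "upthrow upair"
```

For example, `upthrow_followup(90)` gives "upthrow upsmash" and `upthrow_followup(30, di_offstage=True)` gives "SH off (double)shine".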

Optional:
0-20%:
any DI: upsmash->techchase/regrab

30-50%:
slight/no DI: uptilt regrab/nair

40%+:
1 pummel before the throw is fine, 80%+ you can do 2.

50%+:
any DI: upthrow->hard aerial->techchase/regrab

Platform Stuff

0-30%:
50/50 FH upair in place // FH WL down JC grab techrolls.

30%+:
upair missedtech spot // if techroll then dash JC grab.

~60+:
The edgeguard from upthrow FH bair is easier than a plat techchase.

If you grab on side plat:
  no DI: refer to FD chaingrab (50+ FH bair)
  DI off towards the middle: run off->ff->JC upsmash
  DI to top plat: run off->DJ upair
  DI off stage:
     <50%: (run off/SD fair->)SH shine // 20% shine 20% fair // 50% shine
     50%+: FH bair

(general note, % ranges for followups off of upair are very similar to off of an upthrow)

If you grab on the top plat:
any DI: upthrow soft nair->fair or shine // (possible DI mixup low% backthrow fair uptilt / high % back throw upsmash)

Marth vs Puff: some Fair combo variations

Soft Fair -> Dair:
From about 30-40%, Marth can true combo FH soft fair to dair. After the dair, if puff DIs to the stage, marth can techchase grab, and then pivot tipper fsmash puff for a kill. That means that with good execution, a FH soft fair at 30% can equal a kill! He can even combo a low tipper fair or a sideB to a soft fair.
After 40%, the dair will no longer connect vs combo DI but will connect if puff DIs badly, so depending on the position it might be a mixup with a raw dair. Puff can get out earlier with good sdi.

Dash SideB:
When puff combo DIs down and away, instead of reaching for a tipper aerial and ending the combo, it is better to dash a small extra distance and then sideB. After the sideB, marth doesn’t get a true followup without bad DI, but he does keep puff within reach of his disjoint and can reaction punish a commitment, including with a soft fair, so the expected value is significantly higher.

The heuristic that comes from this is:
when comboing puff, it is best to repeat SH soft fairs until she either DIs too far away, at which point you dash sideB FH aerial, or until she reaches FH height, at which point you FH soft fair dair.

Resting Getup Attack OoS

Because puff is light, CC makes you slide too far to rest confidently past low %s.

These numbers assume no shield DI so there is some jump travel time. If you shield DI in so that you are right next to them after shieldstun and can rest them right away it’ll be a little easier than what is listed.

Fox lying on his front
front hitbox ~8f window
back hitbox ~11f window (be careful to wait until he starts to stand up)

Fox lying on his back
front hitbox ~5f window (be careful to wait until he starts to stand up)
back hitbox ~4f window (be careful to wait until he starts to stand up)

Falco lying on his front
front hitbox ~14f window
back hitbox ~8f window

Falco lying on his back
front hitbox ~15f window
back hitbox ~5f window

Marth lying on his front
front hitbox ~8f window
back hitbox ~5f window (can dodge by crouching under it)

Marth lying on his back
front hitbox ~10f window but you have to shieldDI in
back hitbox ~7f window

Falcon lying on his front
front hitbox ~4f window
back hitbox ~9f window

Falcon lying on his back
hit1 have to wait for hit2
hit2 ~5f window

Sheik lying on her front
front hitbox ~11f window
back hitbox ~11f window (wait until she stands up)

Sheik lying on her back
front hitbox ~11f window (wait until she stands up)
back hitbox ~9f window

Peach lying on her front
front hitbox ~7f window (wait until she stands up)
back hitbox ~4f window

Peach lying on her back
front hitbox ~8f window
back hitbox ~10f window

Puff is too short to rest, should grab or dair.
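To compare the windows at a glance, the numbers above can be put in a table keyed by (character, side they are lying on, hitbox). A small sketch: the names `REST_WINDOWS`, `window_ms`, and `windows_at_most` are made up, the conversion assumes the usual 60 frames per second, and the caveats in parentheses above (waiting for the stand-up, shield DI in, Falcon's hit1) are not encoded.

```python
FPS = 60  # assumed frame rate

# (character, lying on, hitbox) -> approximate rest window in frames
REST_WINDOWS = {
    ("Fox", "front", "front"): 8, ("Fox", "front", "back"): 11,
    ("Fox", "back", "front"): 5, ("Fox", "back", "back"): 4,
    ("Falco", "front", "front"): 14, ("Falco", "front", "back"): 8,
    ("Falco", "back", "front"): 15, ("Falco", "back", "back"): 5,
    ("Marth", "front", "front"): 8, ("Marth", "front", "back"): 5,
    ("Marth", "back", "front"): 10, ("Marth", "back", "back"): 7,
    ("Falcon", "front", "front"): 4, ("Falcon", "front", "back"): 9,
    ("Falcon", "back", "hit2"): 5,
    ("Sheik", "front", "front"): 11, ("Sheik", "front", "back"): 11,
    ("Sheik", "back", "front"): 11, ("Sheik", "back", "back"): 9,
    ("Peach", "front", "front"): 7, ("Peach", "front", "back"): 4,
    ("Peach", "back", "front"): 8, ("Peach", "back", "back"): 10,
}


def window_ms(frames):
    """Convert a window in frames to milliseconds."""
    return frames * 1000 / FPS


def windows_at_most(max_frames):
    """List the situations whose window is `max_frames` or tighter, sorted by name."""
    return sorted(key for key, frames in REST_WINDOWS.items() if frames <= max_frames)
```

`windows_at_most(4)` picks out the tightest cases to practice first.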