Lily Tomlin’s Ernestine, and bad VUI confirmation/re-prompting

In the current issue of a bridge magazine, the ACBL Bulletin (of all places!), there is an article about badly-designed phone automation.

In her attempt to find an out-of-print book about the game, the caller tries to find the phone number of a bookstore.  The automated system gives her piles of conversational garbage and failed lookups…and then it still gives her the wrong phone number.

An excerpt:

“What city and state?”  “Fort Worth, Texas.”

“That’s Fort Worth, Texas, right?”  “Yes.”

“I’m sorry, I didn’t get that.  That’s Fort Worth, Texas, right?”  “Yes.”

“Okay, do you want residential or business?”  “Business.”

“I’m sorry, I didn’t….”  “BUSINESS.”

“Okay.  Please say the listing you want.”  “Half Price Books.”

“That’s Pentecostal Water of Life, right?”  ??????????? “No, it’s Half Price Books.”

“Please say the listing you want.”  “Okay.  HALF.  PRICE.  BOOKS.”

“What street?  It’s okay to say, ‘I don’t know.’”  “Hulen.”

“Okay.  You don’t know the street.”  (*#&@(*%&@#(*%&@#(*%&

“I’m sorry.  I didn’t get that.  What street again?”  “Hulen.”

“I think you said Cypress.  Is that correct?”  “Yes, Cypress.  That’s it.  Definitely Cypress.”

“I’m sorry.  I didn’t get that.  What street again?”  “I guess you DIDN’T get that, Miss Auto May Shun.  Hulen, but somehow it’s beginning to matter less and less.  I mean, half an hour ago I cared.  But it doesn’t seem important anymore, Hulen.  After all, the book I want is years old.  Bridge changes daily.  The basics, Hulen, might not be relevant in today’s hodgepodge of conventions and intricate twists and turns.  Clever insights are possibly being adopted as we speak, if you can call this speaking.  Hulen.”

“Okay, Hulen.  Is that right?”  “Yes.  Yes.  It is!  YES!”

“Okay, the number is 817-335-3902.”  (And the number was wrong; the author comments further….)

Let’s analyze that a bit.

The fundamental problem here is with the re-prompting strategy.  The computer apologizes “I’m sorry, I didn’t get that…” and the caller gets quickly agitated.  The agitation is not the caller’s fault.  It’s bad design.  The time wasted in the six words “I’m sorry, I didn’t get that”, along with the pretense of compassion, is enough to put a reasonable caller over the cliff on the second occurrence (or earlier!).

The system designer probably intended that the computer sound both deferential and polite, with such a phrase…but it’s counter-productive.  If every unparseable utterance leads to the computer acting sorry, the conversation falls apart.

When the computer tries to be too conversational, the caller (perceiving the thing as sort-of-human and using human speech/conversational patterns) volunteers extra words or sounds that a human would ignore.  The computer can’t ignore those extra sounds.  The human’s utterance is “out of grammar”…and the computer is “sorry” that it couldn’t figure it out.

And then, it immediately spirals into a feedback loop where the computer apologizes.   (But, IT’S NOT HUMAN!!!!!  IT’S NEVER SORRY!!!!!!!!!!  Cats don’t act sorry.  Why should computers?)   The human interrupts again with even more out-of-grammar speech (which is nonsense to the computer), and the conversation is dead.  The task never gets done accurately or efficiently in such situations.  All because the computer pretended to use, and to understand, human speech patterns.

The computer comes across as a bad human who is less capable of intelligent interaction than a one-year-old child.  Consequently, the caller gets understandably upset and then abusive.

And, in popular culture, automation ITSELF turns into the public enemy.  (“You *#$(%*& computer, why couldn’t you *&*#*&% hear me the first *#$*%& time?!?!?!?!?!?!??!?!  No!  Stop!  Stop *&$*%&#%$ apologizing and shut the *&*%#$% up and listen to me!  No no no no no no no!  Stop!  I called YOU for *&*&#% HELP, because I need HELP, not a run-around…”)

This bridge magazine article gives a great example.  The computer makes its wrong guesses at the caller’s request, tries to confirm things that are absolutely ludicrous (from the caller’s intelligent point of view), wastes its own speaking turns dwelling on the past, and that’s it. The conversation is dead.

There is one point in the conversation where this caller says something completely sarcastic, and the computer doesn’t get that either.  The caller’s “Yes, Cypress.  That’s it.  Definitely Cypress.” is delivered in a desperately disparaging and sarcastic tone, and it plainly means “No”…but the computer is not programmed to interpret it that way.

It’s not the caller’s fault that her emotions got riled up to that destructive point.  It’s the system’s fault, for encouraging uncooperative behavior by the caller.  The system didn’t keep the necessary control of the conversation.  It would rather be sorry than accurate or efficient, apparently.

And the caller’s perspective is: Couldn’t the company afford to hire an intelligent person to answer the *&*#&%#% phone?  The company would rather waste the customer’s time than its own?  The company evidently cares most about keeping its own customers OFF the phone, either by providing a pointless and time-wasting run-around, or by actively begging the caller to go use the web?  That’s the perception.  That’s what bad service says to the customer.  The company would rather stick a clueless and unhelpful computer onto the line than pay an intelligent operator; too bad for the customers.  The company is too busy, or too self-centered, to help real people.

Remember Lily Tomlin’s character of the telephone operator Ernestine (“one ringy dingy, two ringy dingys”; “Is this the party to whom I am speaking?”)?

Ernestine sketch #1

Ernestine sketch #2

Ernestine was snotty, belligerent, self-centered, and presumptuous…but she was still easier to deal with than badly-done automation is.

In automated systems, everything must be done to keep the callers calm and focused on the task.  The computer is never sorry.  The computer can never filter out extra noise or syllables as well as a toddler does.

More to the point: the confirmation/re-prompting strategy must keep the human saying easily understandable things (or pressing a small selection of buttons!), and volunteering NO extra sounds.

As soon as callers feel badly served, or not listened to, they’ll stop cooperating.  That’s human nature.  The computer doesn’t really care if the caller cooperates or not; it’s just cluelessly following its instructions.

The computer is not sorry.  An unparseable utterance, or even just a bunch of random noise or a digital phone dropout, happened in the “conversation”…and the computer couldn’t act on it.  Fine.  Time moves forward.  “The water is under the bridge.”  The computer must not apologize for being an inadequate conversational partner.  The computer must not speculate on the reason for the error, or blame anyone.  The past is the past.  The error is in the past.

The way out is very easy.  Errors will happen.  The way out is very easy.  The way out is very easy.

Initial statement of the question: “Fort Worth, Texas.  Is that right?”  “wekflkowhfpohf”   (error #1)

“Fort Worth, Texas.  Yes or No?”  “wejljlkwfhHJLhwekfhelsdkFJs”  (error #2, still didn’t get the Yes/No, or “Right”, or synonyms)

“Fort Worth, Texas.  Yes or No?”  “lwjelkfjh”  (error #3: give the caller a way around the side:)

“If that’s the city you want, press 1.  Otherwise, press 9.” 

Whether the error was an unrecognizable utterance or a timeout, the first two re-prompts are simply to say the question again as succinctly and directly as possible.  The third re-prompt gives the caller some unequivocal instructions NOT to speak the answer; for whatever reason, speaking wasn’t working.
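For readers who build these systems, the escalation described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not the code of any particular IVR platform: the names confirm, parse_spoken, listen, get_key, and say are all hypothetical, and the tiny Yes/No word lists stand in for a real recognizer grammar.  The point it shows is the shape of the loop: ask, restate twice without apology, then switch to buttons.

```python
from typing import Callable, Optional

# Hypothetical answer vocabularies; a real recognizer grammar would be richer.
YES_WORDS = {"yes", "right", "correct"}
NO_WORDS = {"no", "wrong"}


def parse_spoken(utterance: Optional[str]) -> Optional[bool]:
    """Return True/False for an in-grammar answer, None for noise or a timeout."""
    if utterance is None:  # timeout: the caller said nothing
        return None
    word = utterance.strip().lower().rstrip(".!")
    if word in YES_WORDS:
        return True
    if word in NO_WORDS:
        return False
    return None  # out of grammar: treated exactly like a timeout


def confirm(item: str,
            listen: Callable[[], Optional[str]],
            get_key: Callable[[], Optional[str]],
            say: Callable[[str], None],
            kind: str = "city") -> Optional[bool]:
    """Confirm `item` with the caller, never apologizing or explaining errors."""
    prompts = [
        f"{item}.  Is that right?",  # initial statement of the question
        f"{item}.  Yes or No?",      # re-prompt after error #1
        f"{item}.  Yes or No?",      # re-prompt after error #2
    ]
    for prompt in prompts:
        say(prompt)
        answer = parse_spoken(listen())
        if answer is not None:
            return answer
    # Error #3: speaking is not working, so stop asking for speech.
    say(f"If that's the {kind} you want, press 1.  Otherwise, press 9.")
    key = get_key()
    if key == "1":
        return True
    if key == "9":
        return False
    return None  # still no usable answer; the calling code decides what is next


if __name__ == "__main__":
    # Scripted caller: two garbled utterances, one timeout, then a key press.
    spoken = iter(["wekflkowhfpohf", None, "lwjelkfjh"])
    result = confirm("Fort Worth, Texas", lambda: next(spoken), lambda: "1", print)
    print(result)  # prints True
```

Note that a timeout and an unrecognizable utterance take exactly the same path, and that no prompt spends any words on what went wrong.  What happens when even the key press fails (transfer to an operator, for instance) is left to the calling code, since the article does not prescribe it.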

The conversation is dead unless the caller can get past this point successfully.  The computer must therefore encourage the caller to cooperate in any way it will be able to understand.  Move forward and try to get a useful answer.  The past is gone.  Steer the future.

Longer junk such as “I’m sorry, I didn’t get that” encourages the caller to jump in with an interruption, miss the instructions again, editorialize, or worse.  It also encourages the caller to try to figure out and (speculatively) fix the CAUSE of the miscommunication, which is a pointless waste of time.  That utterance, whatever it was, is over and gone forever.  Try a new one.  (It also doesn’t work to say ONLY “I’m sorry, I didn’t get that” and not continue to the question; sometimes the error was caused by noise or by a caller interruption, cutting off part of the initial question, and now the context is lost.  The computer didn’t get WHAT?  I didn’t say anything.  Why did the question cut off?  What was the question?  What am I supposed to do now?  Did I kill it?)

Incidentally, this simple re-prompting strategy works well with small children, too.  Just restate the question in a calm and measured manner, making it clear that an answer is required.

“Do you want a banana, an apple, or a cookie?”  “Blah blah blah indecision indecision blah blah.”

“Banana, apple, or cookie?”  “lhklehfawefkjlawj”

“Banana, apple, or cookie?”  “Ummm…cookie!”



About the Author: Brad Lehman is a Professional Services Consultant for Angel. In that role, he designs and develops customized systems to match the client’s business requirements. He brings more than 20 years of professional experience in developing data-driven user interfaces. Brad’s other background is in music, with a doctoral degree in harpsichord performance. Harpsichord players use carefully-controlled silence in the right places to clarify the music.  Using musical listening skills and audio-production skills, Brad brings that same care to the crafting of IVR prompts. The pace of the IVR’s recorded speech has to be exactly right, so the caller will have enough time to understand what was said, and will respond at the right moment with a confident decision.  That’s the obsession that Brad likes to write about in this forum: making it easy for the caller to get through the questions, through principles of well-organized wording and perfect pacing.
