Clifford Nass, seated, is shown with two student researchers, Amy Huang, left, and Seema Swamy, right, who were part of a quarter-long course in which students completed 10 experiments on how people respond to speech interfaces on computers. A web-based research tool, the CSLU Toolkit, developed at the Center for Spoken Language Understanding at Oregon Graduate Institute of Science and Technology (http://cslu.cse.ogi.edu/), made it possible to complete the experiments so quickly. Photo by Linda Cicero
Some of the graduate/undergraduate research teams investigated people's reactions to representations of human faces on computer screens, and one team looked at responses to a computer that "touches" its users through a joystick. The majority of the 11 teams chose to test virtual voices, or the hot new technology known as VUI -- voice user interface. VUIs range from poor-quality machine-generated voices, which many companies use on their telephone answering equipment, to high-quality human-recorded voices, which are more expensive to produce but which are beginning to crop up on commercial websites. These voices can be employed to respond to consumer questions about investing in the stock market or how to set up a recently purchased computer.
"Voice interface technology has improved incredibly rapidly, so that companies like Philips and Microsoft are talking about embedding speech into almost everything," says Nass, the co-author of The Media Equation, a 1996 book on people's social responses to communicating technology. Venture capitalists, he adds, also have been funding speech interface start-up companies with names like TellMe, BeVocal and Quack. "Yet there has been almost no research on the psychology of design of speech interfaces," he said. That gap is why about six dozen U.S. and European-based companies sent product designers or researchers to campus in June to hear the students present their sometimes surprising results.
So what will virtual people be like? First, their gender and ethnic "background" are not likely to be accidental or even representative of the human population. That is because designers are not likely to ignore what the students learned: gender and ethnic stereotyping, often subconscious, is pervasive when people encounter voice interfaces.
The experimenters also found they could manipulate people's attitudes toward the content of messages by changing the emotional tone of voice, as well as physical parameters such as pitch and speed.
Voice interfaces, the students also found, may not always be preferable to the text interfaces to which computer users are now accustomed.
Gender/ethnic stereotyping
The research subjects reacted more positively to virtual male voices than to virtual female voices in several experiments, as the researchers predicted.
When subjects saw these synthetic faces on computer screens coupled with a human-sounding voice, they gave less personal information about themselves than when they just heard the voice. They revealed the most information, however, to computers that simply presented questions in text. The experiment suggests that the more human-like the interface, the greater desire humans have to manage themselves. Courtesy U.C. Santa Cruz
"Gender is the first social attribute people recognize in a human voice, and it triggers stereotypical reactions, so that male voices are perceived as more assertive, ambitious and persuasive," said Eun-Ju Lee, a student.
Another student research team looked at what type of voice would prompt people to disclose the most personal information to a computerized interface. They found that their American male subjects -- Stanford students -- were willing to disclose more personal information to user interfaces that spoke in a female, foreign-accented voice -- in this case, Swedish. American females, on the other hand, revealed more personal data to an American-accented female voice than to either a male American voice or Swedish voices of either gender.
Seema Swamy, one of the researchers who conducted the experiment, concluded that men are more likely to disclose personal information to a voice that they feel "socially distant from and are not likely to meet again. Women may feel more comfortable sharing information with someone whom they consider more like themselves."
When voices and faces intimidate
In another experiment, students tried to see how much personal information they could get subjects to disclose when voices were combined with representations of faces on screens. They found that a synthetic face coupled with a human-sounding voice decreased people's willingness to respond "yes" to such invasive questions as "Do you sometimes tell lies if you have to?" Research subjects disclosed the most to text interfaces. Apparently, the more human-like the interface, "the greater desire humans have to manage themselves," said student researcher Li Gong. "The synthetic face made people spend less time answering the questions."
Yet another team found that voices obviously generated by machine probably should not try to claim they are human. Setting up an over-the-phone auction, the researchers tested human-recorded and machine-generated voices. Each voice offered the same items for sale using the same wording, except that it was delivered in one of two grammatical styles.
In one condition, the voice referred to itself with a personal pronoun; for example, "The next item I am offering for sale is a futon." In the other condition, the voice used an impersonal construction, "The next item for sale is a futon," to avoid referring to itself as if it were a person. Of the four conditions, the research subjects rated the recorded voice using personal pronouns as the most "sociable and spontaneous," researcher Francis Lee said. They rated the machine-generated voice using the impersonal construction as the most "formal and fair."
"It's nice if they feel good about the voice, but you really want them to buy from an auction site," said researcher Luke Swartz. The research subjects bid higher amounts for the items offered by the "formal and fair" voice, he said.
Can voice trump content?
Several experiments looked at how to manipulate people's perceptions of the content of messages. In one, researchers Kyu Hahn, Sylvia Loveda, Rob Baesman and Sandra Lui took four current-events stories that mingled factual and opinionated content. They told listeners they were hearing either "news" or an "editorial" on "NetRadio."
For stories labeled as news, the human-recorded voice made the content seem more factual and persuasive than the machine-generated voice did. For stories labeled as editorial, the reverse was true. The researchers said the results suggest that content labeling primes the audience and that listeners are seeking some sort of balance. News may seem "a little boring" in the machine-generated voice, but opinion probably seems less opinionated when spoken by the less human voice.
In a related experiment, another research team found that people ascribed emotions to machine voices; this influenced the credibility of the message. Research subjects liked happy news or movie reviews better when read by a happy voice and bad news and reviews better when read by a sad voice, but they gave more credibility to the report when the voice didn't match the content.
This might pose a difficult trade-off for a website like Schwab.com, said student researcher Michael Somoza. Investors may not like to hear a happy voice reporting a downturn in stock prices, but they would probably find the report more believable than if a sad voice delivered it.
"We think the mismatch conveys a lack of [self-]interest, and therefore people perceive it as less biased," said student researcher Ulla Foehr.
Ethical uses of technology
In presenting their material to industry representatives, several teams concluded that designers should take advantage of existing group stereotypes. They also concluded that companies seeking more information on their customers should choose interfaces that maximize the amount of information people will give. Those conclusions may raise ethical dilemmas for some.
"I think it's important for everyone to know that stereotypes are happening," Nass said. "One choice [for businesses] is to say 'OK, I'm going to play into the stereotypes' -- making all auto mechanic websites have male voices -- which has the advantage to me of selling more."
On the other hand, it is illegal and presumably culturally unacceptable in the United States for employers of real people to hire only males as auto mechanics. Nass said he often advises industry representatives to expect a backlash that could lead to consumer boycotts or government regulation if they use new media to expand on existing stereotypes.
"The television industry went through this. At first, when there were black characters on TV, they were Amos 'n' Andy, and the argument was that people liked seeing African Americans portrayed as idiots. But then the social climate changed, people protested, and the industry created standards boards out of fear they would be regulated if they didn't."
The technology also could be used to undermine social stereotypes, he said, particularly since "the average person has more exposure to most occupations through media than in real life. If it happens in real life, we draw conclusions about media, but the reverse is also true." Children's book authors, he notes, have decided to include some women characters in non-traditional roles and characters from diverse ethnic backgrounds. So far, he said, nearly all human characters on U.S. Internet sites are white, although both genders are represented.
How about using interfaces to manipulate people into giving more personal information?
"It is important for societies to know what is going on within them," Nass said. "We don't need to link answers with particular individuals" about such sensitive subjects as drug use or unsafe sex practices, he said, but the information can be critical for establishing social policies. Businesses, however, frequently want to match information with specific customers to aid marketing.
"This raises a question about informed consent," says Stanford philosophy Professor Debra Satz, who directs the university's Ethics in Society Program. "Do people know what they are contributing to? Is it made explicit so at least they have a choice?"
In the 1960s, long before VUIs were a reality, Satz points out, MIT artificial intelligence researcher Joseph Weizenbaum conducted an experiment in which people typed information about their personal problems on a computer keyboard and a response came back on the screen. "They thought a therapist was typing on the other side of a wall when, in fact, the machine was programmed to pick up keywords and respond," Satz said.
Participants evaluated the therapist afterward as very good, she said. "Weizenbaum concluded that we can build machines that mimic humans but we should not, because of the ethical deception involved."
Satz said she agreed with Nass that "it is useful to know the facts -- that people respond more to one type of voice than the other. There have been experiments that show people are more likely to respond to a call for help from a white person's voice than a black person's voice. Maybe that is a fact about us, but it should bother us. . . . The question becomes, what do we want to do with that knowledge? To what extent do we want to consciously contribute to existing stereotypes that we know are out there and that explain a lot of human behavior, or to what extent do we want to use this knowledge to open more doors of opportunity?" SR