One of the questions about language processing that we're interested in is how people provide information by emphasizing different words. For example, someone might say "Give me the RED one", and emphasize the word RED to contrast it with a different color. How does the talker indicate this in the way they pronounce the word RED? In particular, we would like to know the acoustic cues that talkers use to indicate aspects of prosody, such as the discourse status of a word (whether it was mentioned previously in a conversation or not).
However, one of the difficulties in answering this question is that we must run a controlled experiment and, at the same time, have people engage in natural conversation. In conventional laboratory tasks, participants are often not engaged, and as a result, we do not get robust estimates of the acoustic cues they use to indicate discourse status. To overcome this, we've developed a computer game approach using Minecraft.
Fig. 1. Screenshot from the game showing the two participants. The player on the left has a color sequence they must give to the player on the right to unlock the door.
We've designed maps in Minecraft in which two participants work together to solve puzzles. In the puzzles, one participant has information they have to convey to their partner, and we vary the discourse status of this information. For example, Fig. 1 shows the avatars of two participants in different rooms. The participant on the left has a sequence of colors (white, brown, green) that they must give to their partner as a code to unlock the door to the next room. We can then vary whether particular colors in a sequence are new or previously mentioned.
Running the experiment in Minecraft gives us a controlled environment while letting participants have more natural conversations. Using this approach, we've been able to more accurately identify several acoustic features that people use to emphasize words. For example, talkers used word duration to indicate discourse status (new words tended to be longer than previously mentioned words), and these differences were more distinct than in similar, but less engaging, laboratory tasks. Thus, examining conversations in the game gives us a new way of studying spoken language processing and a better understanding of the acoustic cues to prosody.
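As a rough illustration of the kind of duration comparison described above, the following Python sketch groups word tokens by discourse status and compares mean durations. It is a minimal sketch under stated assumptions: the duration values, the tuple layout, and the function name `mean_duration_by_status` are all hypothetical and are not taken from the study. A real analysis would involve many more tokens and an appropriate statistical model.

```python
from statistics import mean

# Hypothetical placeholder measurements: (word, discourse status, duration in seconds).
# These values are invented for illustration only; they are NOT data from the study.
tokens = [
    ("red", "new", 0.42),
    ("red", "given", 0.31),
    ("green", "new", 0.45),
    ("green", "given", 0.36),
    ("brown", "new", 0.40),
    ("brown", "given", 0.33),
]


def mean_duration_by_status(rows):
    """Group durations by discourse status and return the mean of each group."""
    grouped = {}
    for _word, status, duration in rows:
        grouped.setdefault(status, []).append(duration)
    return {status: mean(values) for status, values in grouped.items()}


means = mean_duration_by_status(tokens)

# A positive difference means new words were longer than given words
# in this made-up sample.
difference = means["new"] - means["given"]
print(means, difference)
```

With measurements exported from a forced aligner or manual annotation, the same grouping could be applied per talker before any statistical testing.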
We are currently running experiments using this approach to study other aspects of conversational speech as well. In particular, we are interested in phonetic convergence, the degree to which conversational partners adjust their speech patterns to sound more like each other. We hope that game-based paradigms like this one will help us better understand the processes that cause talkers to converge.