Emergent users are rapidly gaining access to Information and Communication Technologies (ICTs) owing to the massive expansion of mobile telephony in developing regions. They still need appropriate interfaces to make effective use of interactive products. Many consider audio interfaces such as Interactive Voice Response systems (IVRs) the best fit for emergent users, on account of easier deployment and the strong presence of a vocal culture in developing regions (Barnard, Plauché, & Davel, 2008; Plauché & Nallasamy, 2007). IVRs, however, pose serious challenges to their users because of the inherent transience and temporality of audio as an interaction modality. Tatchell (1996) finds IVR-based services difficult to learn, easy to forget and confusing. Audio prompts, though explicit in directing users, are ephemeral. Users must listen attentively to the audio prompts that present menu choices and system control features. This places heavy demands on the user’s working memory while navigating a sequential and hierarchical menu structure. Consequently, user interactions with directed-dialog IVRs suffer from ‘poor referability’ and ‘absence of memory aid’. Recent studies focusing on emergent users (Grover, Stewart, & Lubensky, 2009; Joshi, Emmadi, et al., 2012) have reconfirmed these usability difficulties. A less explored approach to addressing usability barriers in IVRs is the use of coordinated visuals along with audio prompts (Yin & Zhai, 2005, 2006).
This thesis is concerned with audio-visual interfaces for emergent users. By audio-visual interfaces we mean traditional IVRs in which interactions based on audio prompts are supported by visuals. In this complementary mix of audio and visuals, audio prompts carry out a directed dialog with emergent users by (sequentially) presenting them with a finite number of choices as menus. We call this directedness. Visuals, on the other hand, depict and possibly highlight the same menu choices on the screen. Additionally, the visual menu choices are visible before they are called out in an audio prompt, and stay visible after they have been called out. This makes audio-visual interactions persistent and free from temporality.
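The interplay of directedness and persistence described above can be illustrated with a minimal sketch. It is not the prototype used in this work: the `Frame` type, the `present_menu` function and the menu item names are hypothetical and introduced only for illustration. Each frame records what is spoken at one moment and what is on the screen at that moment.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Frame:
    """One moment of the interaction (hypothetical model, for illustration only)."""
    spoken: Optional[str]       # menu item currently called out in audio, None if silent
    visible: List[str]          # menu items currently shown on the screen
    highlighted: Optional[int]  # index of the item highlighted on the screen, if any


def present_menu(items: List[str], with_visuals: bool = True) -> Iterator[Frame]:
    """Yield the frames of one menu presentation.

    Directedness: items are called out one at a time, in order.
    Persistence: with visuals, all items are on screen before, during
    and after the audio prompts.
    """
    shown = list(items) if with_visuals else []
    yield Frame(spoken=None, visible=shown, highlighted=None)  # before any prompt
    for index, item in enumerate(items):
        yield Frame(spoken=item, visible=shown,
                    highlighted=index if with_visuals else None)
    yield Frame(spoken=None, visible=shown, highlighted=None)  # after the last prompt


if __name__ == "__main__":
    # Hypothetical menu items, not taken from the study prototypes.
    for frame in present_menu(["Check balance", "Recharge"]):
        print(frame)
```

Calling `present_menu(items, with_visuals=False)` models the audio-only case: the spoken sequence is identical, but nothing remains on screen once a prompt has passed.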
We test audio-visual interfaces with emergent users in two major studies. In the first study, we respond to an inconsistency in prior studies regarding the depth of menu hierarchies in IVRs. Subsequently, we test the relevance of existing guidelines for designing audio-visual interfaces. We organize test prototypes with variations in the use of visuals (audio-visual vs. audio-only), menu depth (shallow vs. deep) and menu position (early vs. late). With “use of visuals” as the between-group variable, we organize four combinations of menu depth and menu position as test tasks, namely shallow-early (SE), shallow-late (SL), deep-early (DE) and deep-late (DL). Our findings demonstrate that emergent users achieve significantly greater task success with audio-visual interfaces than with audio-only interfaces for all task types. In audio-visual interfaces, emergent users also achieve significantly greater task success with deep menus than with shallow menus. For audio-only interfaces, the difference between deep and shallow menus is only marginal. This is contrary to prior studies, which show that shallow menus do better than deeper menus in IVRs with traditional users (Cohen, 2004; Commarford, Lewis, Smither, & Gentzler, 2008; Suhm, Freeman, & Getty, 2001) and in graphical user interfaces for emergent users (Medhi, Toyama, Joshi, Athavankar, & Cutrell, 2013). The directedness of audio seems to help emergent users navigate hierarchies better than they do with graphical user interfaces. Users also make menu selections independently of the position of the menu items.
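As an illustration of the design of the first study, the following sketch enumerates the four task types and the cells obtained by crossing them with the between-group variable. The function and constant names are hypothetical; only the factor levels and the task codes (SE, SL, DE, DL) come from the description above.

```python
from itertools import product

INTERFACES = ["audio-visual", "audio-only"]  # between-group variable
MENU_DEPTHS = ["shallow", "deep"]
MENU_POSITIONS = ["early", "late"]


def task_types():
    """Map each task code (SE, SL, DE, DL) to its (depth, position) pair."""
    return {
        (depth[0] + position[0]).upper(): (depth, position)
        for depth, position in product(MENU_DEPTHS, MENU_POSITIONS)
    }


def design_cells():
    """Cross the interface groups with the four task types."""
    return [(interface, code) for interface in INTERFACES for code in task_types()]


if __name__ == "__main__":
    for interface, code in design_cells():
        depth, position = task_types()[code]
        print(f"{interface:12s} {code}: {depth} menu, {position} position")
```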
In the second study, we compare an audio-visual interface with an audio-only interface and a graphical user interface for both transactional and informational tasks. We make careful use of directedness and persistence in our test prototypes. The audio-only and audio-visual interfaces have the same audio prompts, while the audio-visual and graphical user interfaces have similar visuals. Our results demonstrate that audio-visual interfaces strike a good balance between audio-only interfaces and graphical user interfaces for emergent users. Emergent users exhibit significantly greater task success with audio-visual interfaces than with either graphical user interfaces or audio-only interfaces, for both transactional and informational tasks. As expected, audio-visual interfaces are not as fast as graphical user interfaces. However, our research demonstrates that audio-visual interfaces are significantly faster than audio-only interfaces. In addition, emergent users prefer audio-visual interfaces over audio-only and graphical user interfaces, giving them significantly higher System Usability Scale (SUS) scores.
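For readers unfamiliar with SUS, the sketch below shows the standard scoring of the ten-item questionnaire, in which each item is rated from 1 to 5. It assumes the standard questionnaire was used without modification, which this summary does not state. The function name is hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Return the SUS score (0 to 100) for one participant.

    Standard scoring: odd-numbered items contribute (rating - 1),
    even-numbered items contribute (5 - rating), and the sum of the
    ten contributions is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:   # positively worded item
            total += rating - 1
        else:                      # negatively worded item
            total += 5 - rating
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([3] * 10))  # a neutral rating on every item
```

Per-participant scores computed this way can then be compared across interface conditions with whichever statistical test the study design calls for.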
This thesis also discusses the implications of our findings. With deep menus doing better than shallow menus, and menu selection being independent of menu position, interface designers can use a good number of menu items in a single menu of an audio-visual interface. Deeper menu hierarchies with longer menus could enable emergent users to use more complex interactive products than was previously possible. Further, interface designers may, at times, want to reduce task times for emergent users of audio-visual interfaces. This could be done by letting users switch the directed dialog on or off, or by providing audio prompts only when the user is unable to proceed. With such functionality, audio-visual interfaces may exhibit improved task times for frequent (returning) users while remaining valuable for first-time users.