Towards the design of user-centric strategy recommendation systems for collaborative Human-AI tasks
Artificial Intelligence is being employed by humans to collaboratively solve complicated tasks for search and rescue, manufacturing, etc. Efficient teamwork can be achieved by understanding user preferences and recommending different strategies for solving the particular task to humans. Prior work has focused on personalization of recommendation systems for relatively well-understood tasks in the context of e-commerce or social networks. In this paper, we seek to understand the important factors to consider while designing user-centric strategy recommendation systems for decision-making. We conducted a human-subjects experiment (n=60) for measuring the preferences of users with different personality types towards different strategy recommendation systems. We conducted our experiment across four types of strategy recommendation modalities that have been established in prior work: (1) Single strategy recommendation, (2) Multiple similar recommendations, (3) Multiple diverse recommendations, (4) All possible strategies recommendations. While these strategy recommendation schemes have been explored independently in prior work, our study is novel in that we employ all of them simultaneously and in the context of strategy recommendations, to provide us an in-depth overview of the perception of different strategy recommendation systems. We found that certain personality traits, such as conscientiousness, notably impact the preference towards a particular type of system ( 0.01). Finally, we report an interesting relationship between usability, alignment, and perceived intelligence wherein greater perceived alignment of recommendations with one's own preferences leads to higher perceived intelligence ( 0.01) and higher usability ( 0.01).
Competitive gamification in crowdsourcing-based contextual-aware recommender systems
During the COVID-19 outbreak, crowdsourcing-based context-aware recommender systems (CARS) which capture the real-time context in a contactless manner played an important role in the "new normal". This study investigates whether this approach effectively supports users' decisions during epidemics and how different game designs affect users performing crowdsourcing tasks. This study developed a crowdsourcing-based CARS focusing on restaurant recommendations. We used four conditions (control, self-competitive, social-competitive, and mixed gamification) and conducted a two-week field study involving 68 users. The system provided recommendations based on real-time contexts including restaurants' epidemic status, allowing users to identify suitable restaurants to visit during COVID-19. The result demonstrates the feasibility of crowdsourcing to collect real-time information for recommendations during COVID-19 and reveals that a mixed competitive game design encourages both high- and low-performance users to engage more and that a game design with self-competitive elements motivates users to take on a wider variety of tasks. These findings inform the design of restaurant recommender systems in an epidemic context and serve as a comparison of incentive mechanisms for gamification of self-competition and competition with others.
Automated Strategy Feedback Can Improve the Readability of Physicians' Electronic Communications to Simulated Patients
Modern communication between health care professionals and patients increasingly relies upon secure messages (SMs) exchanged through an electronic patient portal. Despite the convenience of secure messaging, challenges include gaps between physician and patient expertise along with the asynchronous nature of such communication. Importantly, less readable SMs from physicians (e.g., too complicated) may result in patient confusion, non-adherence, and ultimately poorer health outcomes. The current simulation trial synthesizes work on patient-physician electronic communication, message readability assessments, and feedback to explore the potential for automated strategy feedback to improve the readability of physicians' SMs to patients. Within a simulated secure messaging portal featuring multiple simulated patient scenarios, computational algorithms assessed the complexity of SMs written by 67 participating physicians to patients. The messaging portal provided strategy feedback for how physician responses might be improved (e.g., adding details and information to reduce complexity). Analyses of changes in SM complexity revealed that automated strategy feedback indeed helped physicians compose and refine more readable messages. Although the effects for any individual SM were slight, the effects within and across patient scenarios showed trends of decreasing complexity. Physicians appeared to learn how to craft more readable SMs via interactions with the feedback system. Implications for secure messaging systems and physician training are discussed, along with considerations for further investigation of broader physician populations and effects on patient experience.
Virtual nature experiences and mindfulness practices while working from home during COVID-19: Effects on stress, focus, and creativity
In this study, we focus on the impact of daily virtual nature experiences combined with mindfulness practices on remote workers' creativity, stress, and focus over an extended period (9 weeks) during the COVID-19 pandemic. Our results show a positive effect of virtual reality (VR) nature experience on increasing focus and reducing stress. When VR nature and mindfulness practices were combined, we also found an increase in convergent thinking task performance. Our findings demonstrate that 10-minute daily exposure to VR nature and mindfulness practices could compensate for some of the adverse effects of working remotely by improving some aspects of workers' well-being and creativity.
: A Real-World Driven Smartphone Game to Stimulate COVID-19 Awareness
Despite the many platforms promoting coronavirus awareness, part of the population is still not well informed about basic facts related to the pandemic. This inspired us to design and implement a free-to-play game to help people learn about coronavirus easily yet effectively. A user-centric approach to designing the game helped us understand the challenges people face and eventually deliver an interactive game. We conducted an evaluation study across multiple age groups to understand the game's impact on players' COVID-19 learning and to evaluate the quality of the game. The results were obtained by studying player behavior and performing a comparative analysis with the Model for the Evaluation of Educational Games (MEEGA+), a standard game evaluation model. Our evaluation shows that players' awareness increased by 53% compared to pre-game awareness. 52.40% of the players found the game to be usable with a good player experience and learning.
Automated affect classification and task difficulty adaptation in a competitive scenario based on physiological linkage: An exploratory study
In competitive and cooperative scenarios, task difficulty should be dynamically adapted to suit people with different abilities. State-of-the-art difficulty adaptation methods for such scenarios are based on task performance, which conveys little information about user-specific factors such as workload. Thus, we present an exploratory study of automated affect recognition and task difficulty adaptation in a competitive scenario based on physiological linkage (covariation of participants' physiological responses). Classification algorithms were developed in an open-loop study where 16 pairs played a competitive game while 5 physiological responses were measured: respiration, skin conductance, electrocardiogram, and 2 facial electromyograms. Physiological and performance data were used to classify four self-reported variables (enjoyment, valence, arousal, perceived difficulty) into two or three classes. The highest classification accuracies were obtained for perceived difficulty: 84.3% for two-class and 60.5% for three-class classification. As a proof of concept, the developed classifiers were used in a small closed-loop study to dynamically adapt game difficulty. While this closed-loop study found no clear advantages of physiology-based adaptation, it demonstrated the technical feasibility of such real-time adaptation. In the long term, physiology-based task adaptation could enhance competition and cooperation in many multi-user settings (e.g., education, manufacturing, exercise).
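As a rough illustration of what "physiological linkage" can mean in practice, the sketch below computes a windowed Pearson correlation between two participants' signals. This is only one possible covariation measure; the abstract does not say which linkage metrics the study used, and the function names, window sizes, and sample values here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signal carries no covariation information
    return cov / sqrt(var_x * var_y)


def windowed_linkage(signal_a, signal_b, window, step):
    """Correlation of two participants' signals over sliding windows."""
    last_start = min(len(signal_a), len(signal_b)) - window
    return [
        pearson(signal_a[s:s + window], signal_b[s:s + window])
        for s in range(0, last_start + 1, step)
    ]


# Hypothetical skin-conductance samples for two players (not data from the study).
player_1 = [0.1, 0.2, 0.4, 0.5, 0.7, 0.6, 0.4, 0.3]
player_2 = [0.2, 0.3, 0.5, 0.6, 0.8, 0.5, 0.6, 0.7]
linkage = windowed_linkage(player_1, player_2, window=4, step=2)
```

Each value in `linkage` could then serve as one feature for a classifier, alongside per-participant features and performance data.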
The effect of challenge-based gamification on learning: An experiment in the context of statistics education
Gamification is increasingly employed in learning environments as a way to increase student motivation and consequent learning outcomes. However, while the research on the effectiveness of gamification in the context of education has been growing, there are blind spots regarding which types of gamification may be suitable for different educational contexts. This study investigates the effects of challenge-based gamification on learning in the area of statistics education. We developed a gamification approach composed of the main game design patterns related to challenge-based gamification: points, levels, challenges and a leaderboard. Having conducted a 2 (read: yes vs. no) x 2 (gamification: yes vs. no) between-subject experiment, we present a quantitative analysis of the performance of 365 students from two different academic majors: Electrical and Computer Engineering (n=279), and Business Administration (n=86). The results of our experiments show that challenge-based gamification had a positive impact on student learning compared to traditional teaching methods (compared to having no treatment and treatment involving reading exercises). The effect was larger for females or for students at the School of Electrical and Computer Engineering.
Personal information and public health: Design tensions in sharing and monitoring wellbeing in pregnancy
Mobile technologies are valuable tools for the self-report of mental health and wellbeing. These systems pose many unique design challenges which have received considerable attention within HCI, including the engagement of users. However, less attention has been paid to the use of personal devices in public health. Integrating self-reported data within the context of clinical care suggests the need to design interfaces to support data management, sense-making, risk-assessment, feedback and patient-provider relationships. This paper reports on a qualitative design study for the clinical interface of a mobile application for the self-report of psychological wellbeing and depression during pregnancy. We examine the design tensions which arise in managing the expectations and informational needs of pregnant women, midwives, clinical psychologists, GPs and other health professionals with respect to a broad spectrum of wellbeing. We discuss strategies for managing these tensions in the design of technologies required to balance personal information with public health.
Efficacy of personalized models in discriminating high cognitive demand conditions using text-based interactions
Although high cognitive demand conditions can impair psychological, physical, and behavioral processes without appropriate management, current measurement methods are too cumbersome for continuous monitoring of cognitive demand, and do not account for individual differences. This research uses keystroke and linguistic markers of typed text to construct individualized models of cognitive demand response to discriminate high and low cognitive demand conditions, the results of which can have implications for design of cognitive demand monitoring systems for personalized health management. We constructed within-subject models of cognitive demand response for nine participants and one between-subjects model based on 20 participants. The AUCs for personalized models ranged from 0.679 to 0.953 (Mean=0.826, SD=0.085), significantly higher than chance (p < 0.0001) and the 0.714 AUC for the generic model (p=0.002). Although the features in each model were different, the most common features across models are rate of negative emotion, lexical diversity, rate of words over six letters, and word count. These results confirm significant individual differences in cognitive demand response and suggest that those developing measurement methods used in a monitoring system should consider adaptation to individual characteristics. Our research operationalizes the effects of cognitive demand on HCI and contributes a unique combination of text and keystroke features used to detect high cognitive demand situations.
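For readers unfamiliar with the AUC figures reported above, the following minimal sketch shows how an AUC can be computed from model scores: it is the probability that a randomly chosen high-demand sample receives a higher score than a randomly chosen low-demand sample, with ties counted as half. The scores are invented for illustration and are not data from the study; the function name is hypothetical.

```python
def auc(scores_high, scores_low):
    """Probability that a high-demand sample outscores a low-demand one.

    Ties count as half a win, which matches the Mann-Whitney view of the AUC.
    """
    if not scores_high or not scores_low:
        raise ValueError("both classes need at least one sample")
    wins = 0.0
    for h in scores_high:
        for low in scores_low:
            if h > low:
                wins += 1.0
            elif h == low:
                wins += 0.5
    return wins / (len(scores_high) * len(scores_low))


# Hypothetical model scores for typed-text samples (not data from the study).
high_demand = [0.9, 0.8, 0.6, 0.55]
low_demand = [0.7, 0.4, 0.3, 0.2]
value = auc(high_demand, low_demand)  # 14 of 16 pairs are ordered correctly: 0.875
```

An AUC of 0.5 corresponds to chance-level discrimination, which is the baseline the personalized models are compared against.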
Understanding the Potential of PARO for Healthy Older Adults
As the population ages, there is an increasing need for socio-emotional support for older adults. A potential way to meet this need is through interacting with pet-type robots such as the seal robot, PARO. There was a need to extend research on PARO's potential benefits beyond cognitively impaired and dependently living older adults. Because independently living, cognitively intact older adults may also have socio-emotional needs, the primary goal of this study was to investigate their attitudes, emotions, and engagement with PARO to identify its potential applicability to this demographic. Thirty older adults participated in an interaction period with PARO, and their attitudes and emotions toward PARO were assessed before and after using a multi-method approach. Video of the interaction was coded to determine the types and frequency of engagements participants initiated with PARO. Overall, there were no pre-post interaction differences on these measures. However, semi-structured interviews suggested that these older adults had positive attitudes towards PARO's attributes, thought it would be easy to use, and perceived potential uses for both themselves and others. Participants varied in their frequency of engagement with PARO. A novel finding is that this active engagement frequency uniquely predicted post-interaction period positive affect. This study advances understanding of healthy older adults' attitudes, emotions, and engagement with PARO and of possible ways in which PARO could provide social and emotional support to healthy older adults. The results are informative for future research and design of pet-type robots.
Achieving Interface and Environment Fidelity in the Virtual Basic Laparoscopic Surgical Trainer
Virtual reality trainers are educational tools with great potential for laparoscopic surgery. They can provide basic skills training in a controlled environment and free of risks for patients. They can also offer objective performance assessment without the need for proctors. However, designing effective user interfaces that allow the acquisition of the appropriate technical skills on these systems remains a challenge. This paper aims to examine a process for achieving interface and environment fidelity during the development of the Virtual Basic Laparoscopic Surgical Trainer (VBLaST). Two iterations of the design process were conducted and evaluated. For that purpose, a total of 42 subjects participated in two experimental studies in which two versions of the VBLaST were compared to the accepted standard in the surgical community for training and assessing basic laparoscopic skills in North America, the FLS box-trainer. Participants performed 10 trials of the peg transfer task on each trainer. The assessment of task performance was based on the validated FLS scoring method. Moreover, a subjective evaluation questionnaire was used to assess the fidelity aspects of the VBLaST relative to the FLS trainer. Finally, a focus group session with expert surgeons was conducted as a comparative situated evaluation after the first design iteration. This session aimed to assess the fidelity aspects of the early VBLaST prototype as compared to the FLS trainer. The results indicate that user performance on the earlier version of the VBLaST resulting from the first design iteration was significantly lower than the performance on the standard FLS box-trainer. The comparative situated evaluation with domain experts permitted us to identify some issues related to the visual, haptic and interface fidelity on this early prototype. 
Results of the second experiment indicate that the performance on the second generation VBLaST was significantly improved as compared to the first generation and not significantly different from that of the standard FLS box-trainer. Furthermore, the subjects rated the fidelity features of the modified VBLaST version higher than the early version. These findings demonstrate the value of the comparative situated evaluation sessions entailing hands on reflection by domain experts to achieve the environment and interface fidelity and training objectives when designing a virtual reality laparoscopic trainer. This suggests that this method could be used successfully in the future to enhance the value of VR systems as an alternative to physical trainers for laparoscopic surgery skills. Some recommendations on how to use this method to achieve the environment and interface fidelity of a VR laparoscopic surgical trainer are identified.
The Design of Hand Gestures for Human-Computer Interaction: Lessons from Sign Language Interpreters
The design and selection of 3D modeled hand gestures for human-computer interaction should follow principles of natural language combined with the need to optimize gesture contrast and recognition. The selection should also consider the discomfort and fatigue associated with distinct hand postures and motions, especially for common commands. Sign language interpreters have extensive and unique experience forming hand gestures and many suffer from hand pain while gesturing. Professional sign language interpreters (N=24) rated discomfort for hand gestures associated with 47 characters and words and 33 hand postures. Clear associations of discomfort with hand postures were identified. In a nominal logistic regression model, high discomfort was associated with gestures requiring a flexed wrist, discordant adjacent fingers, or extended fingers. These and other findings should be considered in the design of hand gestures to optimize the relationship between human cognitive and physical processes and computer gesture recognition systems for human-computer input.
Younger and Older Users' Recognition of Virtual Agent Facial Expressions
As technology advances, robots and virtual agents will be introduced into the home and healthcare settings to assist individuals, both young and old, with everyday living tasks. Understanding how users recognize an agent's social cues is therefore imperative, especially in social interactions. Facial expression, in particular, is one of the most common non-verbal cues used to display and communicate emotion in on-screen agents (Cassell, Sullivan, Prevost, & Churchill, 2000). Age is important to consider because age-related differences in emotion recognition of facial expression have been supported (Ruffman et al., 2008), with older adults showing a deficit for recognition of negative facial expressions. Previous work has shown that younger adults can effectively recognize facial emotions displayed by agents (Bartneck & Reichenbach, 2005; Courgeon et al. 2009; 2011; Breazeal, 2003); however, little research has compared in-depth younger and older adults' ability to label a virtual agent's facial emotions, an important consideration because social agents will be required to interact with users of varying ages. If such age-related differences exist for recognition of facial expressions, we aim to understand if those age-related differences are influenced by the intensity of the emotion, dynamic formation of emotion (i.e., a neutral expression developing into an expression of emotion through motion), or the type of virtual character differing by human-likeness. Study 1 investigated the relationship between age-related differences, the implication of dynamic formation of emotion, and the role of emotion intensity in emotion recognition of the facial expressions of a virtual agent (iCat). Study 2 examined age-related differences in recognition expressed by three types of virtual characters differing by human-likeness (non-humanoid iCat, synthetic human, and human). 
Study 2 also investigated the role of configural and featural processing as a possible explanation for age-related differences in emotion recognition. First, our findings show age-related differences in the recognition of emotions expressed by a virtual agent, with older adults showing lower recognition for the emotions of anger, disgust, fear, happiness, sadness, and neutral. These age-related differences might be explained by older adults having difficulty discriminating similarity in configural arrangement of facial features for certain emotions; for example, older adults often mislabeled the similar emotions of fear as surprise. Second, our results did not provide evidence for the dynamic formation improving emotion recognition; but, in general, the intensity of the emotion improved recognition. Lastly, we learned that emotion recognition, for older and younger adults, differed by character type, from best to worst: human, synthetic human, and then iCat. Our findings provide guidance for design, as well as the development of a framework of age-related differences in emotion recognition.
In pursuit of rigour and accountability in participatory design
The field of Participatory Design (PD) has greatly diversified and we see a broad spectrum of approaches and methodologies emerging. However, to foster its role in designing future interactive technologies, a discussion about accountability and rigour across this spectrum is needed. Rejecting the traditional, positivistic framework, we take inspiration from related fields such as Design Research and Action Research to develop interpretations of these concepts that are rooted in PD's own belief system. We argue that unlike in other fields, accountability and rigour are nuanced concepts that are delivered through debate, critique and reflection. A key prerequisite for having such debates is the availability of a language that allows designers, researchers and practitioners to construct solid arguments about the appropriateness of their stances, choices and judgements. To this end, we propose a "tool-to-think-with" that provides such a language by guiding designers, researchers and practitioners through a process of systematic reflection and critical analysis. The tool proposes four lenses to critically reflect on the nature of a PD effort: , , and . In a subsequent step, the between the revealed features is analysed and shows whether they pull the project in the same direction or work against each other. Regardless of the flavour of PD, we argue that this of features indicates the level of internal rigour of PD work and that the process of reflection and analysis provides the language to argue for it. We envision our tool to be useful at all stages of PD work: in the planning phase, as part of a reflective practice during the work, and as a means to construct knowledge and advance the field after the fact. We ground our theoretical discussions in a specific PD experience, the ECHOES project, to motivate the tool and to illustrate its workings.
Generating Phenotypical Erroneous Human Behavior to Evaluate Human-automation Interaction Using Model Checking
Breakdowns in complex systems often occur as a result of system elements interacting in unanticipated ways. In systems with human operators, human-automation interaction associated with both normative and erroneous human behavior can contribute to such failures. Model-driven design and analysis techniques provide engineers with formal methods tools and techniques capable of evaluating how human behavior can contribute to system failures. This paper presents a novel method for automatically generating task analytic models encompassing both normative and erroneous human behavior from normative task models. The generated erroneous behavior is capable of replicating Hollnagel's zero-order phenotypes of erroneous action for omissions, jumps, repetitions, and intrusions. Multiple phenotypical acts can occur in sequence, thus allowing for the generation of higher order phenotypes. The task behavior model pattern capable of generating erroneous behavior can be integrated into a formal system model so that system safety properties can be formally verified with a model checker. This allows analysts to prove that a human-automation interactive system (as represented by the model) will or will not satisfy safety properties with both normative and generated erroneous human behavior. We present benchmarks related to the size of the statespace and verification time of models to show how the erroneous human behavior generation process scales. We demonstrate the method with a case study: the operation of a radiation therapy machine. A potential problem resulting from a generated erroneous human action is discovered. A design intervention is presented which prevents this problem from occurring. We discuss how our method could be used to evaluate larger applications and recommend future paths of development.
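To make the four zero-order phenotypes concrete, the sketch below enumerates single deviations over a flat list of actions. This is a deliberately simplified stand-in: the paper generates erroneous behavior inside task analytic models that are then verified with a model checker, whereas this sketch only manipulates a Python list. The action names, the function names, and the exact reading of "jump" used here are assumptions made for illustration.

```python
def omissions(task):
    """Every variant with exactly one action left out."""
    return [task[:i] + task[i + 1:] for i in range(len(task))]


def repetitions(task):
    """Every variant with exactly one action performed twice in a row."""
    return [task[:i + 1] + task[i:] for i in range(len(task))]


def intrusions(task, foreign_actions):
    """Every variant with one action from outside the task inserted."""
    return [
        task[:i] + [extra] + task[i:]
        for i in range(len(task) + 1)
        for extra in foreign_actions
    ]


def jumps(task):
    """Every variant where, after action i, execution resumes at action j.

    j == i + 1 is the normative continuation and j == i is a repetition,
    so both are excluded. A forward jump over a single action produces the
    same sequence as an omission; this sketch does not deduplicate them.
    """
    variants = []
    for i in range(len(task)):
        for j in range(len(task)):
            if j not in (i, i + 1):
                variants.append(task[:i + 1] + task[j:])
    return variants


# Hypothetical normative action sequence (not taken from the paper's case study).
normative = ["select_mode", "enter_dose", "confirm", "fire_beam"]
erroneous = (
    omissions(normative)
    + repetitions(normative)
    + intrusions(normative, ["open_door"])
    + jumps(normative)
)
```

Applying these functions again to each generated variant would chain several deviations in sequence, which mirrors the idea of higher order phenotypes described above.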
A Taxonomy of Representation Strategies in Iconic Communication
Predicting whether the intended audience will be able to recognize the meaning of an icon or pictograph is not an easy task. Many icon recognition studies have been conducted in the past. However, their findings cannot be generalized to other icons that were not included in the study, which, we argue, is their main limitation. In this paper, we propose a comprehensive taxonomy of icons that is intended to enable the generalization of the findings of recognition studies. To accomplish this, we analyzed a sample of more than eight hundred icons according to three axes: lexical category, semantic category, and representation strategy. Three basic representation strategies were identified: visual similarity; semantic association; and arbitrary convention. These representation strategies are in agreement with the strategies identified in previous taxonomies. However, a greater number of subcategories of these strategies were identified. Our results also indicate that the lexical and semantic attributes of a concept influence the choice of representation strategy.
Easing semantically enriched information retrieval-An interactive semi-automatic annotation system for medical documents
Mapping medical concepts from a terminology system to the concepts in the narrative text of a medical document is necessary to provide semantically accurate information for further processing steps. The MetaMap Transfer (MMTx) program is a semantic annotation system that generates a rough mapping of concepts from the Unified Medical Language System (UMLS) Metathesaurus to free medical text, but this mapping still contains erroneous and ambiguous bits of information. Since manually correcting the mapping is an extremely cumbersome and time-consuming task, we have developed the MapFace editor. The editor provides a convenient way of navigating the annotated information gained from the MMTx output, and enables users to correct this information on both a conceptual and a syntactical level, and thus it greatly facilitates the handling of the MMTx program. Additionally, the editor provides enhanced visualization features to support the correct interpretation of medical concepts within the text. We paid special attention to ensure that the MapFace editor is an intuitive and convenient tool to work with. Therefore, we recently conducted a usability study in order to create a well-founded background serving as a starting point for further improvement of the editor's usability.
Cognitive systems engineering: new wine in new bottles
This paper presents an approach to the description and analysis of complex Man-Machine Systems (MMSs) called Cognitive Systems Engineering (CSE). In contrast to traditional approaches to the study of man-machine systems which mainly operate on the physical and physiological level, CSE operates on the level of cognitive functions. Instead of viewing an MMS as decomposable by mechanistic principles, CSE introduces the concept of a cognitive system: an adaptive system which functions using knowledge about itself and the environment in the planning and modification of actions. Operators are generally acknowledged to use a model of the system (machine) with which they work. Similarly, the machine has an image of the operator. The designer of an MMS must recognize this, and strive to obtain a match between the machine's image and the user characteristics on a cognitive level, rather than just on the level of physical functions. This article gives a presentation of what cognitive systems are, and of how CSE can contribute to the design of an MMS, from cognitive task analysis to final evaluation.
The effects of motion and stereopsis on three-dimensional visualization
Previous studies have demonstrated that motion cues combined with stereoscopic viewing can enhance the perception of three-dimensional objects displayed on a two-dimensional computer screen. Using a variant of the mental rotation paradigm, subjects view pairs of object images presented on a computer terminal and judge whether the objects are the same or different. The effects of four variables on the accuracy and speed of decision performances are assessed: stereo vs. mono viewing, controlled vs. uncontrolled object motion, cube vs. sphere construction and wire frame vs. solid surface characteristic. Viewing the objects as three-dimensional images results in more accurate and faster decision performances. Furthermore, accuracy improves although response time increases when subjects control the object motion. Subjects are equally accurate comparing wire frame and solid images, although they take longer comparing wire frame images. The cube-based or sphere-based object construction has no impact on either decision accuracy or response time.
Automation-induced monitoring inefficiency: role of display location
Operators can be poor monitors of automation if they are engaged concurrently in other tasks. However, in previous studies of this phenomenon the automated task was always presented in the periphery, away from the primary manual tasks that were centrally displayed. In this study we examined whether centrally locating an automated task would boost monitoring performance during a flight-simulation task consisting of system monitoring, tracking and fuel resource management sub-tasks. Twelve nonpilot subjects were required to perform the tracking and fuel management tasks manually while watching the automated system monitoring task for occasional failures. The automation reliability was constant at 87.5% for six subjects and variable (alternating between 87.5% and 56.25%) for the other six subjects. Each subject completed four 30 min sessions over a period of 2 days. In each automation reliability condition the automation routine was disabled for the last 20 min of the fourth session in order to simulate catastrophic automation failure (0% reliability). Monitoring for automation failure was inefficient when automation reliability was constant but not when it varied over time, replicating previous results. Furthermore, there was no evidence of resource or speed-accuracy trade-off between tasks. Thus, automation-induced failures of monitoring cannot be prevented by centrally locating the automated task.