Sugilite

Creating Multimodal Smartphone Automation by Demonstration


Smartphone Users Generating Intelligent Likeable Interfaces Through Examples.

SUGILITE is a new programming-by-demonstration (PBD) system that enables users to create automation on smartphones. SUGILITE uses Android's accessibility API to support automating arbitrary tasks in any Android app (or even across multiple apps). When the user gives verbal commands that SUGILITE does not know how to execute, the user can demonstrate by directly manipulating the regular apps' user interface. By leveraging the verbal instructions, the demonstrated procedures, and the apps' UI hierarchy structures, SUGILITE can automatically generalize the script from the recorded actions, so SUGILITE learns how to perform tasks with different variations and parameters from a single demonstration. Extensive error handling and context checking support forking the script when new situations are encountered, and provide robustness if the apps change their user interface. Our lab study suggests that users with little or no programming knowledge can successfully automate smartphone tasks using SUGILITE.
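The generalization step described above can be illustrated with a minimal sketch. It assumes a simplified model in which a recorded click becomes a parameter whenever its label also appears in the verbal command, and in which the sibling items in the UI hierarchy supply the alternative values. The names `RecordedClick`, `generalize`, and `replay`, and the coffee-ordering data, are hypothetical and are not taken from SUGILITE's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedClick:
    """One demonstrated click: the tapped item's label and its sibling labels."""
    label: str
    siblings: list = field(default_factory=list)


def generalize(command, clicks):
    """Build script steps, turning a click into a parameter when the
    clicked label also appears in the verbal command (an assumed heuristic)."""
    spoken = command.lower()
    steps = []
    for click in clicks:
        if click.label.lower() in spoken:
            # The label was spoken, so its siblings are alternative values.
            steps.append({"param": True, "choices": [click.label, *click.siblings]})
        else:
            steps.append({"param": False, "label": click.label})
    return steps


def replay(steps, values):
    """Return the labels to click, filling parameters from `values` in order."""
    remaining = iter(values)
    labels = []
    for step in steps:
        if not step["param"]:
            labels.append(step["label"])
            continue
        value = next(remaining)
        if value not in step["choices"]:
            raise ValueError(f"{value!r} was not among the demonstrated options")
        labels.append(value)
    return labels


# Hypothetical demonstration for the command "Order a cup of cappuccino".
demo = [
    RecordedClick("Starbucks"),
    RecordedClick("Cappuccino", siblings=["Latte", "Mocha"]),
    RecordedClick("Add to order"),
]
script = generalize("Order a cup of cappuccino", demo)
print(replay(script, ["Latte"]))
```

In this sketch, a single demonstration of ordering a cappuccino yields a script that can also order a latte, because "Cappuccino" was both spoken and clicked and its siblings are treated as other valid values.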

The follow-up system EPIDOSITE (Enabling Programming of IoT Devices On Smartphone Interfaces for The End-users) extends SUGILITE so that users can use smartphones as hubs for smart home and IoT automation. Users create automation for smart home and IoT devices by demonstrating the desired behaviors through direct manipulation of the corresponding smartphone apps. EPIDOSITE also supports using the smartphone app usage context and external web services as triggers and data for automation, enabling the creation of highly context-aware smart home and IoT applications.

APPINITE (Automation Programming on Phone Interfaces using Natural-language Instructions with Task Examples) introduces a multimodal interface for SUGILITE, with which users can specify data descriptions verbally using natural language instructions. APPINITE guides users to describe their intentions for the demonstrated actions through mixed-initiative conversations. APPINITE can then construct data descriptions for these actions from the natural language instructions.

PUMICE (Programming in a User-friendly Multimodal Interface through Conversations and Examples) features a new multimodal domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. PUMICE enables users to teach the SUGILITE agent new concepts (e.g., hot) in conditionals for task automation.
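As a rough illustration of a user-taught concept in a conditional, the sketch below stores each concept as a boolean predicate and signals when a concept is still unknown, which is the point where a conversational agent would ask the user to explain or demonstrate it. The class `ConceptStore`, the function `run_rule`, and the 85-degree threshold for "hot" are assumptions made for this example and do not come from PUMICE itself.

```python
class ConceptStore:
    """Maps concept names (e.g. "hot") to boolean predicates taught by the user."""

    def __init__(self):
        self._concepts = {}

    def teach(self, name, predicate):
        """Record what a concept means as a function from a value to True/False."""
        self._concepts[name] = predicate

    def evaluate(self, name, value):
        """Evaluate a known concept, or signal that it still has to be taught."""
        if name not in self._concepts:
            # A conversational agent would ask the user for a definition here.
            raise LookupError(f"I don't know what {name!r} means yet")
        return self._concepts[name](value)


def run_rule(store, concept, value, then_action, else_action):
    """Pick the action for a rule of the form 'if it is <concept>, ... else ...'."""
    return then_action if store.evaluate(concept, value) else else_action


store = ConceptStore()
# Hypothetical definition: "hot" means the temperature is above 85 degrees.
store.teach("hot", lambda temperature: temperature > 85)
print(run_rule(store, "hot", 90, "order iced coffee", "order hot coffee"))
```

Keeping the concept separate from the rule means the definition of "hot" can be reused by other rules once it has been taught.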

PINALITE (Personal Information Nicely Anonymized Leveraging Interface Trace Examples) is a privacy-preserving sharing mechanism in SUGILITE for GUI-based PBD scripts that can identify and obfuscate personal private information embedded in scripts, while maintaining the transparency, readability, robustness, extensibility, and generalizability of the original scripts.
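The general idea of obfuscating personal values while keeping a script readable can be sketched as follows. This is a generic illustration and not PINALITE's algorithm: it assumes the set of personal strings is already known, and it replaces each one with a salted hash token. The function `obfuscate`, the script layout, and the salt are all hypothetical.

```python
import hashlib


def obfuscate(script, personal_values, salt="demo-salt"):
    """Return a copy of the script with known personal values replaced by
    salted hash tokens; every other field is left untouched."""
    cleaned = []
    for step in script:
        value = step["value"]
        if value in personal_values:
            digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]
            value = f"<private:{digest}>"
        cleaned.append({**step, "value": value})
    return cleaned


# Hypothetical recorded script with one personal value (a home address).
recorded = [
    {"action": "click", "value": "Delivery"},
    {"action": "type", "value": "123 Example Street"},
]
shared = obfuscate(recorded, {"123 Example Street"})
for step in shared:
    print(step["action"], step["value"])
```

The actions and the non-personal labels remain visible, so the shared script can still be read, while the address itself no longer appears in it.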

SOVITE (System for Optimizing Voice Interfaces to Tackle Errors) presents a multi-modal error handling and repairing approach for SUGILITE and other task-oriented conversational agents. Using existing mobile app GUIs for grounding, it helps users discover, identify the causes of, and recover from conversational breakdowns caused by natural language understanding errors.

Screen2Vec is a new self-supervised technique for generating semantic embeddings of GUI screens and components. The embeddings encode textual content, visual design, layout patterns, and app metadata without requiring manual data annotation. Screen2Vec is potentially useful for generalizing user-taught SUGILITE task procedures across different apps in similar task domains.
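One common way to use embeddings of this kind is nearest-neighbor lookup: given a screen from one app, find the most similar screen in another app. The sketch below shows that lookup with cosine similarity. The three-dimensional vectors and the screen names are made up for illustration; they are not real Screen2Vec output, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(query, candidates):
    """Return the name of the candidate whose embedding is closest to `query`."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up embeddings: an order screen in app A and three screens in app B.
app_a_order_screen = [0.9, 0.1, 0.2]
app_b_screens = {
    "app_b_order_screen": [0.8, 0.2, 0.1],
    "app_b_settings_screen": [0.1, 0.9, 0.3],
    "app_b_login_screen": [0.2, 0.3, 0.9],
}
print(most_similar(app_a_order_screen, app_b_screens))
```

With these vectors the lookup picks the order screen in the second app, which is the kind of match that could help carry a taught procedure over to a different app in the same task domain.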

Available!

Publications

pdf
github

Toby Jia-Jun Li, Lindsay Popowski, Tom M. Mitchell, and Brad A. Myers. "Screen2Vec: Semantic Embedding of GUI Screens and GUI Components", Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI 2021). Virtual Event, May 8 - 13, 2021. Best Paper Honorable Mention Award.

pdf
acm
video
talk

Toby Jia-Jun Li, Jingya Chen, Haijun Xia, Tom M. Mitchell, and Brad A. Myers. "Multi-Modal Repairs of Conversational Breakdowns in Task-Oriented Dialogs", ACM Symposium on User Interface Software and Technology (UIST'20). Virtual Event, October 20 - 23, 2020. pp. 1094-1107. Best Paper Award.

pdf
acm

Toby Jia-Jun Li, Brandon Canfield, Jingya Chen, and Brad A. Myers. "Privacy-Preserving Script Sharing in GUI-based Programming-by-Demonstration Systems", CSCW'2020, Proc. ACM Hum.-Comput. Interact., Vol. 4, No. CSCW1, Article 60. May 2020. pp. 60:1-60:23.

pdf
acm

Toby Jia-Jun Li, Marissa Radensky, Justin Jia, Kirielle Singarajah, Tom M. Mitchell, and Brad A. Myers. "PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations", ACM Symposium on User Interface Software and Technology (UIST'19). New Orleans, LA, October 20 - 23, 2019. pp. 577-589.

pdf
video

Toby Jia-Jun Li, Igor Labutov, Xiaohan Nancy Li, Xiaoyi Zhang, Wenze Shi, Wanling Ding, Tom M. Mitchell, and Brad A. Myers. "APPINITE: A Multi-Modal Interface for Specifying Data Descriptions in Programming by Demonstration Using Natural Language Instructions", Proceedings of the 2018 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2018). Lisbon, Portugal, October 1 - 4, 2018.

Springer

Toby Jia-Jun Li, Igor Labutov, Brad A. Myers, Amos Azaria, Alexander I. Rudnicky, and Tom M. Mitchell. "Teaching Agents When They Fail: End User Development in Goal-oriented Conversational Agents", Chapter of Studies in Conversational UX Design, Robert J. Moore, Margaret H. Szymanski, Raphael Arar, Guang-Jie Ren, eds. Springer, 2018.

acm
pdf
video

Toby Jia-Jun Li, Amos Azaria, and Brad A. Myers. "SUGILITE: Creating Multimodal Smartphone Automation by Demonstration", Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI 2017). Denver, CO, May 6 - 11, 2017. Best Paper Honorable Mention Award.

Springer
pdf

Toby Jia-Jun Li, Yuanchun Li, Fanglin Chen, and Brad A. Myers. "Programming IoT Devices by Demonstration Using Mobile Apps", End-User Development. IS-EUD 2017. Lecture Notes in Computer Science, vol. 10303. Eindhoven, The Netherlands, June 13 - 15, 2017. Best Paper Award.


Copyright © 1996-2020 - Carnegie Mellon University - All Rights Reserved.