Edge-based Assembly Assistant
Arjun Lakshmipathy, Umaymah Imran
You will create an app that runs on a smartphone, leverages cloudlet resources over a wireless link, and walks users through a physical assembly task. The app will use computer vision to determine when steps of the task have been completed. The phone offloads images to a cloudlet for processing. You can see an example of such an app here. To keep the workload of the project manageable, your app will likely only be able to detect a modest number of steps. However, this project will give you hands-on experience in creating an edge-native application and a good sense for the challenges of getting computer vision models to work well in real systems.
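The core loop of such an app is: offload a frame, receive detector output, and decide whether the current assembly step is done. The completion check could be sketched as below; the step definitions, label names, and confidence threshold are illustrative assumptions, not part of any existing app.

```python
# Hypothetical sketch: decide whether an assembly step is complete based on
# object-detector output returned by the cloudlet. Labels and the confidence
# threshold are illustrative assumptions.

def step_complete(detections, required_labels, min_confidence=0.6):
    """detections: list of (label, confidence) pairs from the detector."""
    seen = {label for label, conf in detections if conf >= min_confidence}
    return set(required_labels) <= seen

# Example: a step of a toy assembly requires both the base and the bracket.
detections = [("base", 0.92), ("bracket", 0.75), ("screwdriver", 0.40)]
print(step_complete(detections, ["base", "bracket"]))  # True
```

In practice the detector runs on the cloudlet and only the small result list travels back over the wireless link, which is what keeps the phone-side loop cheap.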
Real-time edge-based vision assistant for blind users
Aditya Shetty, William Borom
Microsoft Seeing AI offers a feature that captures a picture with a phone camera and describes the scene in the photo to a blind person: https://www.microsoft.com/en-us/ai/seeing-ai. You will implement this feature yourself using publicly available machine learning models. You can get creative about which models you choose to use and how you interpret their output. For example, you can use an object detector that will tell you labels and bounding box coordinates for objects in the image. But then how do you summarize the spatial relations of these objects as text? Are there models that can provide more general information (such as "this is a bedroom" rather than just "there is a bed, a dresser, and a lamp")?
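One simple answer to the spatial-relations question is to bucket each bounding box's center into coarse regions of the image. The sketch below is an illustrative baseline, not Seeing AI's method; the thresholds and phrasing are assumptions.

```python
# Illustrative sketch: turn detector output (label + bounding box) into a
# spoken-style description by bucketing each box's horizontal center into
# left / center / right thirds of the image.

def describe_scene(detections, image_width):
    """detections: list of (label, (x_min, y_min, x_max, y_max))."""
    phrases = []
    for label, (x0, _, x1, _) in detections:
        cx = (x0 + x1) / 2
        if cx < image_width / 3:
            region = "on the left"
        elif cx < 2 * image_width / 3:
            region = "in the center"
        else:
            region = "on the right"
        phrases.append(f"a {label} {region}")
    return "I see " + ", ".join(phrases) + "." if phrases else "I see nothing."

print(describe_scene([("bed", (0, 50, 200, 300)),
                      ("lamp", (500, 40, 560, 180))], image_width=640))
```

A richer version could also use vertical position and relative box sizes ("a lamp on the dresser") or fall back to an image-captioning model for scene-level summaries.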
Bringing NeRFs into Augmented Reality
Tianshu Huang, Alberto Garcia
Recent advances in neural rendering, such as neural radiance fields (NeRFs), open a promising new avenue for modeling arbitrary objects and scenes in 3D from a set of calibrated images. NeRFs can faithfully render detailed scenes from any view while simultaneously offering a high degree of compression in terms of storage. However, until recently, runtime performance was a critical limitation of NeRFs, which could take tens of seconds to minutes to render a single image. A recent follow-up work proposed a framework that enables real-time rendering of NeRFs, expanding the frontier of NeRF applications into use cases such as VR/AR.
Your project will be to leverage this work to build an end-to-end pipeline that allows a user to capture a video of an object of interest, train a NeRF at the edge, and overlay the rendered 3D representation onto an AR headset. This project requires proficiency in both systems building and computer vision. Although you are not required to invent novel computer vision algorithms, you should be comfortable working with deep learning code and frameworks such as PyTorch.
CMU History Through Augmented Reality
Qifei Dong, Utkarsh Murarka
This project will explore mobile augmented reality using edge computing. The project team will create an edge-native application for a mobile pedestrian heads-up-display (HUD, e.g., Hololens or Google Glass) user. The application will overlay historical images on a real-world outdoor scene during a walk across the CMU campus (see http://digitalcollections.powerlibrary.org/cdm/landingpage/collection/acamu-cmuhi).
The project goal is to maximize user experience by effectively using off-board low-latency, high-bandwidth edge computing resources. The team will evaluate the impact of different application partitioning approaches between the device, the edge, and the cloud in a real-world edge computing environment (The Living Edge Lab). The project team will need to develop the application in a portable, partitioned way and instrument it to measure performance in different partitioning configurations. Proficiency in distributed mobile application development is required. Expertise in interactive augmented reality or gaming applications is desirable.
Network Latency Segmentation
Ishan Darwhekar, Sophie Smith
This project will explore the sources of latency in 4G and 5G mobile networks. The project team will design and implement a framework to measure where latency arises as user traffic passes from the user device to the edge, to the cloud, and back to the device. The main focus will be on segmenting and measuring the latency introduced by each of the wireless link, the radio access network, and the wireless core network. The framework will be built as much as possible from off-the-shelf components (e.g., Wireshark) and tested using a real application (e.g., OpenRTIST) in a real environment (The Living Edge Lab). The project will produce the framework, Living Edge Lab measurements made with the framework, and an analysis of the sources of latency in the Living Edge Lab. The project will require knowledge of mobile network architecture, networking, system measurement, and benchmarking.
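Once the framework can timestamp a packet as it crosses each boundary, the segmentation itself is simple arithmetic. The boundary names, timestamps, and units below are illustrative assumptions about what the measurement points might be called.

```python
# Sketch of the segmentation arithmetic, assuming the framework records a
# one-way timestamp (in milliseconds, on synchronized clocks) as a packet
# crosses each boundary on the uplink path: device -> RAN -> core -> edge.

def segment_latencies(ts):
    """ts: dict of one-way timestamps in milliseconds along the uplink path."""
    return {
        "wireless link":    ts["ran_ingress"] - ts["device_tx"],
        "radio access net": ts["core_ingress"] - ts["ran_ingress"],
        "wireless core":    ts["edge_rx"] - ts["core_ingress"],
    }

stamps = {"device_tx": 0.0, "ran_ingress": 9.5, "core_ingress": 14.0, "edge_rx": 21.5}
print(segment_latencies(stamps))
```

The hard part of the project is obtaining those timestamps in the first place (clock synchronization across vantage points, correlating the same packet at each capture point), not the subtraction.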
Low-bandwidth Detection of Human Traffickers from Ads
Bailey Flanigan, Jatin Arora
The explosion of the Internet has presented human traffickers with new opportunities, but it has also created a new, more publicly available data source for detecting their activity. Research in the past few years on detecting human trafficking has focused on using data from ads to identify suspicious clusters of advertisements that might indicate trafficking. These ads are hosted on a wide variety of websites, including social media sites, dating sites, and other social networking sites.
This project will design a tool that detects human trafficking-related ads by monitoring advertisements on websites visited by users. It will use visual and textual information to train specialized models on each user's mobile device and classify suspicious ads as trafficking-related with only a modest amount of labeling. It will also demonstrate the effectiveness of specialized models over a single mass prediction model.
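A "specialized model trained from a modest amount of labeling" can be as light as a nearest-centroid classifier over ad feature vectors. The sketch below is a toy illustration of that idea; the two-dimensional features, labels, and data are entirely synthetic assumptions (real features would be text/image embeddings).

```python
import numpy as np

# Toy sketch of a per-user "specialized model": a nearest-centroid classifier
# over ad feature vectors, fit from a handful of labels. Features and labels
# here are synthetic placeholders for real text/image embeddings.

def fit_centroids(features, labels):
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(centroids, x):
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array(["suspicious", "suspicious", "benign", "benign"])
model = fit_centroids(X, y)
print(predict(model, np.array([0.85, 0.15])))  # suspicious
```

Because the model is tiny and fit on-device, it can adapt to the ads a particular user actually sees, which is the claimed advantage over a single mass prediction model.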
New OpenScout Cognitive Engines
Siyuan Ji, Yuhang Yao
OpenScout is a pipeline for automated object detection and facial recognition. Android clients send image frames and GPS coordinates from the device to a GPU-enabled edge node, where two cognitive engines perform object detection (using objects from the COCO dataset) and face recognition (using either OpenFace or Microsoft's Face Cognitive Service). Results are pushed into an ELK (Elasticsearch-Logstash-Kibana) stack for visualization and analysis. This project would consist of adding new cognitive engines to perform functions like OCR on any text in the image stream or pose estimation on the actors in the scene. Microsoft has an OCR cognitive service that could be leveraged; however, another OCR framework could be proposed. Pose estimation could be done with OpenPose.
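Conceptually, each cognitive engine consumes a frame and returns structured results, and the edge node fans every frame out to all registered engines. The sketch below is a simplified stand-in for that contract; the class names and the trivial "OCR" are illustrative placeholders, not the actual Gabriel/OpenScout API, and a real engine would wrap Tesseract or Microsoft's OCR service.

```python
# Simplified stand-in for an OpenScout-style cognitive engine interface.
# A new engine (OCR, pose estimation, ...) plugs in by implementing handle().

class CognitiveEngine:
    name = "base"

    def handle(self, frame):
        raise NotImplementedError

class EchoOCREngine(CognitiveEngine):
    name = "ocr"

    def handle(self, frame):
        # Placeholder: a real engine would run OCR on the image bytes.
        return {"engine": self.name, "text": frame.get("embedded_text", "")}

def dispatch(frame, engines):
    """Fan a frame out to every registered engine."""
    return [engine.handle(frame) for engine in engines]

frame = {"pixels": b"...", "embedded_text": "EXIT"}
results = dispatch(frame, [EchoOCREngine()])
print(results)  # [{'engine': 'ocr', 'text': 'EXIT'}]
```

The per-engine results are what would then be indexed into the ELK stack for visualization.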
Edge-based Synthetic Scene Transformations
OpenRTiST is a real-time style transfer application that takes image frames from the device and returns images stylized to resemble famous works of art, which are then displayed on the user's screen. Generative adversarial networks have been used to perform unpaired image-to-image translation. This project would extend OpenRTiST with a new cognitive engine that uses CycleGAN or CUT to add new styles via unpaired image-to-image translation, such as winter to summer or day to night.
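Whatever generator is used, the engine's contract is frames in, stylized frames out. The sketch below shows that contract only; the `negate()` "style" is a stand-in for a trained CycleGAN/CUT generator, and all names are illustrative.

```python
import numpy as np

# Minimal sketch of the frame-in/frame-out contract a style-transfer cognitive
# engine satisfies. negate() is a placeholder transform standing in for a
# learned unpaired-translation generator (e.g., summer -> winter).

def negate(frame):
    return 255 - frame  # placeholder, not a real learned style

def stylize_stream(frames, style_fn):
    for frame in frames:
        yield style_fn(frame)

frame = np.zeros((2, 2, 3), dtype=np.uint8)  # tiny all-black RGB frame
out = list(stylize_stream([frame], negate))
print(out[0][0, 0])  # [255 255 255]
```

Swapping in a CycleGAN or CUT generator means replacing `style_fn` with a forward pass through the trained network, typically in PyTorch on the GPU-equipped edge node.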
Vision-based localization on smartphones
Arjun Ramesh, Tiane Zhu
Google recently released AR-based walking directions. This feature is trained on Google's Street View image dataset to obtain more accurate location information than the phone's GPS alone can provide. The goal of this project is to create a similar system based on deep neural network object detectors and classic SIFT feature extractors, and to compare their effectiveness as a potential basis for an AR application for exploring indoor points of interest (POI), where GPS is not available.
The initial goal would be to capture video while walking across campus, collect seasonal image data of places like the Fence and other recognizable landmarks, and then label the various POI with CVAT, the Computer Vision Annotation Tool. The labeled datasets can then be used to train object recognizers with the OpenTPOD pipeline (Tool for Painless Object Detection) as well as the classic SIFT/SURF feature detectors, allowing the relative effectiveness of both approaches to be evaluated. How robust are these methods when faced with seasonal or weather changes, or when students repaint the Fence? A stretch goal would be to turn this into an Android application by building on the Gabriel framework, which captures image data from Android devices, offloads heavy computation to the cloud or a nearby cloudlet, and returns annotated frames with markers for any POI found within the image. An example Gabriel application is OpenRTiST, which performs real-time style transfer on the input images.
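For the SIFT/SURF side, the standard way to decide whether a query frame matches a landmark's stored descriptors is Lowe's ratio test: accept a match only when the nearest descriptor is clearly closer than the second-nearest. The sketch below implements that test on synthetic descriptors; in practice the descriptors would come from OpenCV's SIFT detector.

```python
import numpy as np

# Lowe's ratio test for filtering SIFT/SURF descriptor matches. Descriptors
# here are tiny synthetic vectors; real SIFT descriptors are 128-dimensional.

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every candidate
        order = np.argsort(dists)
        # Keep the match only if the best is much closer than the runner-up.
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.95, 0.05], [0.5, 0.5], [0.05, 0.95]])
print(ratio_test_matches(a, b))  # [(0, 0), (1, 2)]
```

Counting surviving matches against each landmark's descriptor set gives a simple localization score, which can then be compared head-to-head with the DNN object-detector approach.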
Towards a More Interactive Virtual Coach for Rehabilitation Exercises
(with Dan Siewiorek & Min Lee)
Andong Jing, Tianhong Yu
A virtual coach that monitors and provides feedback to rehabilitation exercises has the great potential to improve patient compliance with therapy. Development of these systems requires a setup of a motion capture sensor. The goal of this project is to use interaction techniques and edge computing to create a more capable virtual coach to monitor and provide intelligent feedback to patient exercise performance. The coach would use a web camera and/or IMU wearable sensor. The project would include two main parts: (1) Interaction Techniques: Explore more advanced AI techniques to provide transparent feedback to the user; Develop audio or gesture recognition models to control a system; (2) Edge Computing: Create a system to transmit a video/image of an exercise to a server and receive tracked body joints using the OpenPose library; evaluate the effectiveness of edge computing techniques in providing real-time feedback. For the development of this system, we have collected a dataset that includes video recordings of three upper-limb exercises from 15 post-stroke survivors. This project requires proficiency in implementing machine learning models and interactive systems (e.g. using audio or gesture modality).