AR AI Assistant for Task Guidance

This first prototype, built for the PTG project, focuses on three things: recognizing user actions through the HoloLens camera, providing real-time guidance via a trained model, and rendering useful information to assist users as they complete tasks. The AR Task Guidance System helps users complete complex tasks by delivering real-time augmented guidance through the Microsoft HoloLens. It combines instructional video data, advanced neural networks, and speech recognition to deliver an interactive, immersive experience.

AR AI Assistant Overview

TL;DR

Our AR Task Guidance System consists of three integrated components:

1. Instructional Video Data Collection

Data collection process showing comprehensive gathering of instructional video data for training our models.

The foundation of our AR AI Assistant is comprehensive data collection. We gather and annotate a wide range of instructional videos to build robust training datasets for various task scenarios.

We record diverse task demonstrations across various environments, then annotate each video with detailed labels for steps, actions, and context. The curated dataset combines visual, audio, and contextual information so that the models train on high-quality, well-rounded examples.
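One way to picture the result of this annotation pass is as a list of labeled time segments per video. The sketch below is illustrative only: the class names, the field names (`step`, `action`, `context`), and the example labels are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One labeled time span in an instructional video (hypothetical schema)."""
    start_s: float     # segment start, in seconds
    end_s: float       # segment end, in seconds
    step: str          # task step this span belongs to
    action: str        # fine-grained action performed during the span
    context: str = ""  # free-form notes such as tools or surroundings


@dataclass
class AnnotatedVideo:
    """A recorded demonstration plus its segment-level labels."""
    video_id: str
    environment: str
    segments: List[Segment] = field(default_factory=list)

    def steps_in_order(self) -> List[str]:
        """Return distinct step labels in order of first appearance."""
        seen: List[str] = []
        for seg in sorted(self.segments, key=lambda s: s.start_s):
            if seg.step not in seen:
                seen.append(seg.step)
        return seen


# Made-up labels, used only to show the shape of the data.
video = AnnotatedVideo(
    video_id="demo_001",
    environment="workbench",
    segments=[
        Segment(0.0, 4.5, "gather tools", "pick up screwdriver"),
        Segment(4.5, 12.0, "open casing", "unscrew panel", context="four screws"),
        Segment(12.0, 18.0, "open casing", "lift panel"),
    ],
)
print(video.steps_in_order())
```

Keeping steps and actions as separate fields lets the same recording serve both coarse step segmentation and finer action classification.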

2. Real-time Action Recognition

Real-time action recognition demo showing the system recognizing user actions and providing immediate feedback through the HoloLens interface.

Our system employs neural networks, specifically CLIP models, to recognize user actions in real time. The action recognition component processes live video streams from the HoloLens camera and provides immediate feedback:

• CLIP Model Training: The collected video data is used to train models for accurate task-step understanding and segmentation
• Real-time Processing: Live video streaming and immediate inference on the HoloLens platform
• Action Classification: Precise identification of user actions and task progression
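The classification step can be sketched in the CLIP style: compare a frame embedding with one text embedding per task step, then smooth the per-frame predictions over time. This is a minimal sketch under stated assumptions, not the project's pipeline. The toy 3-dimensional embeddings stand in for real CLIP outputs, and `scale`, `StepSmoother`, and the window size are hypothetical choices made for illustration.

```python
import math
from collections import Counter, deque
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def step_probabilities(frame_emb: Sequence[float],
                       step_embs: Sequence[Sequence[float]],
                       scale: float = 100.0) -> List[float]:
    """Softmax over scaled cosine similarities (scale is an assumed value)."""
    logits = [scale * cosine(frame_emb, e) for e in step_embs]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def predict_step(frame_emb: Sequence[float],
                 step_embs: Sequence[Sequence[float]]) -> int:
    """Index of the most likely step for a single frame."""
    probs = step_probabilities(frame_emb, step_embs)
    return max(range(len(probs)), key=probs.__getitem__)


class StepSmoother:
    """Majority vote over the last few frames to suppress flicker."""

    def __init__(self, window: int = 5) -> None:
        self.recent = deque(maxlen=window)

    def update(self, step_index: int) -> int:
        self.recent.append(step_index)
        return Counter(self.recent).most_common(1)[0][0]


# Toy embeddings: one per task step, plus one incoming frame.
step_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame_emb = [0.1, 0.9, 0.2]

smoother = StepSmoother(window=3)
current = smoother.update(predict_step(frame_emb, step_embs))
print(current)
```

The majority vote trades a few frames of latency for stability, which matters when a single misclassified frame would otherwise make the displayed step jump.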

3. AR Overlay and Guidance

AR overlay demonstration showing task instructions displayed directly in the user's field of view. The system enables hands-free navigation through different task steps using speech recognition, providing seamless interaction with complex procedures.
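Hands-free navigation of this kind can be modeled as a small state machine that maps recognized phrases to a step index. The sketch below assumes a speech recognizer has already produced a text utterance. The command words ("next", "previous", "back", "restart") and the `StepNavigator` class are hypothetical, chosen for illustration rather than taken from the system.

```python
from typing import List


class StepNavigator:
    """Tracks the current task step and reacts to spoken commands."""

    def __init__(self, steps: List[str]) -> None:
        if not steps:
            raise ValueError("at least one step is required")
        self.steps = steps
        self.index = 0

    def handle(self, utterance: str) -> str:
        """Apply a recognized utterance and return the text to display."""
        command = utterance.strip().lower()
        if command == "next":
            # Clamp at the last step instead of wrapping around.
            self.index = min(self.index + 1, len(self.steps) - 1)
        elif command in ("previous", "back"):
            self.index = max(self.index - 1, 0)
        elif command == "restart":
            self.index = 0
        # Unrecognized phrases leave the current step unchanged.
        return self.current()

    def current(self) -> str:
        """Overlay text for the current step."""
        return f"Step {self.index + 1}/{len(self.steps)}: {self.steps[self.index]}"


navigator = StepNavigator(["gather tools", "open casing", "replace part"])
print(navigator.handle("Next"))
```

Ignoring unrecognized phrases keeps background conversation from moving the user to a different step by accident.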
