Information sources and technical decisions

Research

Design & Technology Review

A. Gesture Designs: Finding Ideas
During our research in term 1, we focused on laying the foundation of our project by creating the initial UI design. Our main objective was to study the UI design of existing kiosk machines and propose ways to replace corresponding touch actions with gestures.

We conducted a literature review to study the physical and UI structures of common kiosks used for checkout (such as those found at TESCO) and ordering (such as those at McDonald's and UCL FOLD Pizza). Additionally, we analyzed touchless kiosk projects related to our research topic. After extensive research, we shortlisted three gesture designs:

1. Face/Head Gestures: use the head to navigate
2. Virtual Mouse: use hands to control a virtual mouse
3. Swipe-based Gestures: use hands to swipe through various buttons
B. Gesture Designs: Filtering Ideas
We initially considered the head gesture design, but we rejected it because it was unconventional and awkward, making it less user-friendly.

The second option was similar to the example in our related project, where a virtual mouse was bound to the hand position. However, we found that this design was not user-friendly, as keeping a hand in the air requires more effort than directly interacting with a touchscreen. Moreover, in a real deployment, where factors such as camera specifications and user distance are beyond our control, this approach would have made our project more error-prone. Thus, it was rejected.

Ultimately, we went ahead with the third option, where the user navigates through options by performing hand swipe gestures. This approach provided a good balance between usability and reliability.
A. Swipe Gesture: Problems with the Old Version
There was already a Mr_Swipe gesture in the MotionInput database, but our tests found it to be glitchy.

Sometimes, when a user performed a single swipe, it triggered multiple events in a row, resulting in counterintuitive feedback. It also used Python lists for its more complex calculations, which made it extremely slow and caused tracking issues that compounded the problem.

We debated whether to stick with the old gesture trigger or develop a new one. While creating a new gesture trigger would resolve most of our issues, it would also be time-consuming.

However, because the swipe was integral to our project, we finally decided to develop a more accurate and robust version.
B. Swipe Gesture: Creating a New Version
Our team discussed various methods to develop a swipe gesture. While a swipe on a phone is straightforward, touchless swipes present unique challenges. Specifically, after a user swipes right, their hand has to travel back before they can swipe again, and we had to work out how to tell whether that return movement was an intentional left swipe or simply the hand resetting for another right swipe.

To address this issue, we surveyed members of other teams and asked them to imagine standing in front of a touchless kiosk and perform a swipe gesture. Through careful observation and feedback, we were able to develop a design that would work effectively.

We ultimately decided to set the centre of the frame as a reference point and defined our swipe gesture as follows: a user triggers a swipe by starting at the centre and moving their hand towards the edge of the frame. Conversely, nothing happens when the user's hand moves from the edge of the frame back towards the centre.
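The centre-referenced rule can be sketched as a small state machine. This is only an illustration under stated assumptions, not the MotionInput implementation: the class name `CentreSwipeDetector`, the normalised x coordinate (0.0 at the left edge, 1.0 at the right edge, not mirrored), and the two threshold values are all hypothetical.

```python
from typing import List, Optional


class CentreSwipeDetector:
    """Hypothetical sketch: fire a swipe only for centre-to-edge movement."""

    def __init__(self, centre_half_width: float = 0.15,
                 trigger_offset: float = 0.3) -> None:
        # Both thresholds are illustrative guesses, in normalised frame units.
        self.centre_half_width = centre_half_width
        self.trigger_offset = trigger_offset
        self.armed = False  # True once the hand has been seen in the centre zone

    def update(self, x: float) -> Optional[str]:
        """Feed one hand x position (0.0..1.0); return 'left', 'right' or None."""
        offset = x - 0.5  # signed distance from the centre of the frame
        if abs(offset) <= self.centre_half_width:
            # The hand is in the centre zone: arm the detector for the next swipe.
            self.armed = True
            return None
        if self.armed and abs(offset) >= self.trigger_offset:
            # The hand left the centre and reached the edge zone: fire once,
            # then disarm so the same movement cannot fire repeatedly.
            self.armed = False
            return "right" if offset > 0 else "left"
        # Either still travelling, or returning from the edge: no event.
        return None


def detect_swipes(xs: List[float]) -> List[str]:
    """Run the detector over a sequence of x positions and collect events."""
    detector = CentreSwipeDetector()
    events = []
    for x in xs:
        event = detector.update(x)
        if event is not None:
            events.append(event)
    return events
```

Because the detector disarms after firing and re-arms only in the centre zone, the return movement from the edge produces no event, and a single swipe cannot trigger several events in a row.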
Backend Frameworks:
MediaPipe Hands
MediaPipe is an open-source framework developed by Google that provides a set of customizable building blocks for building machine learning (ML) pipelines to process audio and video data [2].

MotionInput uses MediaPipe Hands to identify hand movements. MediaPipe Hands is a machine-learning model that can detect 21 3D landmarks on one or both hands from a single image. The decision to include MediaPipe in MotionInput was made by the developers of earlier versions, who found it suitable because of its speed, accuracy, and stability.
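As an illustration of how the 21 landmarks might be reduced to a single hand position, the sketch below averages the wrist and the four finger-base (MCP) landmarks. The landmark indices follow the MediaPipe Hands numbering; the `palm_centre` helper and the plain-tuple input format are assumptions made for this example and are not taken from MotionInput.

```python
from typing import Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z), with x and y normalised to 0..1

# Indices from the MediaPipe Hands landmark numbering.
WRIST = 0
INDEX_FINGER_MCP = 5
MIDDLE_FINGER_MCP = 9
RING_FINGER_MCP = 13
PINKY_MCP = 17
PALM_INDICES = (WRIST, INDEX_FINGER_MCP, MIDDLE_FINGER_MCP,
                RING_FINGER_MCP, PINKY_MCP)


def palm_centre(landmarks: Sequence[Landmark]) -> Tuple[float, float]:
    """Return the mean (x, y) of the palm landmarks (hypothetical helper)."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    xs = [landmarks[i][0] for i in PALM_INDICES]
    ys = [landmarks[i][1] for i in PALM_INDICES]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Averaging the palm landmarks rather than tracking a single fingertip is one possible design choice: the palm moves less than the fingers when the hand opens or closes, so the resulting position is likely to be steadier.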
Frontend Frameworks:
ReactJS (Web Extension)
Our team was tasked with creating a web extension that shows an animated, live video preview of the user's hand movements. After researching different frontend frameworks, we decided to build the extension with ReactJS due to its ability to efficiently and seamlessly update UI elements in real time.

Unlike other options such as AngularJS and Bootstrap, ReactJS uses a virtual DOM and a reconciliation algorithm, which minimizes the number of DOM manipulations needed to update the UI [3]. This results in faster and smoother rendering of UI elements, making ReactJS a good fit for our project.
Frontend Frameworks:
MFC (App Launcher)
Our project required us to build a launcher that starts both MotionInput and the intermediary server. Our client asked us to build it with MFC, as it offered a quick and simple solution for building the launcher.

MFC, or Microsoft Foundation Class Library, is a collection of pre-built classes and functions that simplify the process of developing Windows applications [4]. This made MFC an attractive option for our project, as we needed to create a launcher quickly and without spending excessive time on low-level implementation details.
Libraries:
ZeroMQ (Messaging Library)
In order to ensure the resilience of our kiosk system, we needed a mechanism to restart the MotionInput backend in case of a crash. To accomplish this, we set up an intermediary server that maintains contact between the backend and frontend. For messaging between the backend and the intermediary server, we opted for the ZeroMQ messaging library.

ZeroMQ was a natural choice for our needs due to its lightweight and efficient design, support for multiple messaging patterns, cross-platform compatibility, and robustness [5]. While there are alternatives to ZeroMQ, such as RabbitMQ and Apache Kafka, we chose ZeroMQ for its low latency and high throughput, which are essential for real-time applications like our kiosk system. Additionally, ZeroMQ's simple API and ease of use made it a better fit for our project than messaging systems that have more complex APIs or require significant setup and configuration.
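The restart mechanism described above can be sketched as a heartbeat check on the intermediary server. This is a minimal illustration using only the standard library, not our actual server code: the `HeartbeatMonitor` class, the three-second timeout, and the `restart` callback are assumptions, and the ZeroMQ transport that would deliver the heartbeats is deliberately left out.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical sketch: restart the backend if its heartbeats stop."""

    def __init__(self, timeout_s: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s    # illustrative value, not a measured one
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen = clock()      # time of the most recent heartbeat
        self.restarts = 0             # how many restarts have been requested

    def beat(self) -> None:
        """Call whenever a heartbeat message arrives from the backend."""
        self.last_seen = self.clock()

    def check(self, restart: Callable[[], None]) -> bool:
        """Call periodically; invokes restart() if the backend has gone quiet."""
        if self.clock() - self.last_seen > self.timeout_s:
            restart()
            self.restarts += 1
            # Give the freshly restarted backend a full timeout window.
            self.last_seen = self.clock()
            return True
        return False
```

In a real deployment, `beat()` would presumably be called from the loop that receives backend messages, and `restart` would relaunch the MotionInput process.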