System Design

A. System Architecture
A visualised high-level architecture of our solution:
System Architecture
B. Users & Frontend
Based on our personas and requirements gathering, we have identified two distinct types of users: staff/admin and the general public.

The general public interacts with kiosk interfaces directly, so we provide them with a WebExtension front-end built with React and JavaScript. This front-end can be hooked into any pre-existing kiosk interface as a sidebar that displays essential information from the backend: a live preview of the user's hand (which makes gestures easier to perform), the live swipe detection boundaries, gesture sensitivity settings, and general status information.

For staff/admin, we provide an admin control panel front-end built in C++ with the Microsoft Foundation Class (MFC) library. This control panel sets the initial configuration of the program, such as gesture sensitivity, camera selection, hand detection, crowd rejection confidence, and more. Staff/admin can launch or exit the backend processes with the click of a button.
C. Backend & Data Store
The backend consists of an intermediary server, written in Python and running on the machine's localhost, and the UCL MotionInput Python backend. MotionInput detects gestures and simulates controls: it reads the camera input, identifies and tracks activated gestures on each frame, and performs the keyboard control mapped to each gesture.
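As a rough sketch of this per-frame loop (detect_gestures, press_key, and KEY_MAP below are illustrative placeholders, not MotionInput's real interfaces):

    import cv2

    KEY_MAP = {"swipe_left": "left", "swipe_right": "right"}  # gesture -> key (illustrative)

    def detect_gestures(frame):
        """Placeholder for MotionInput's gesture modules; returns activated gestures."""
        return []

    def press_key(key):
        """Placeholder for MotionInput's keyboard simulation."""
        print(f"pressing {key}")

    def run(camera_index=0):
        capture = cv2.VideoCapture(camera_index)
        try:
            while capture.isOpened():
                ok, frame = capture.read()
                if not ok:
                    break
                # One pass per frame: detect activated gestures, then simulate keys.
                for gesture in detect_gestures(frame):
                    press_key(KEY_MAP.get(gesture, gesture))
        finally:
            capture.release()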

The intermediary server sits between MotionInput and the WebExtension frontend and facilitates bidirectional communication between the two via JSON-encoded messages. It communicates with MotionInput using the ZeroMQ messaging library and with the WebExtension frontend using WebSockets.
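A minimal sketch of this bridge, assuming the asyncio APIs of the pyzmq and websockets (>= 10.1) libraries; the PAIR socket and port numbers are illustrative choices, not the project's actual configuration:

    import asyncio
    import websockets
    import zmq
    import zmq.asyncio

    ZMQ_ADDR = "tcp://127.0.0.1:5556"   # MotionInput binds the matching end
    WS_HOST, WS_PORT = "127.0.0.1", 8765

    ctx = zmq.asyncio.Context()

    async def handle_client(ws):
        """Relay JSON-encoded messages between one frontend client and MotionInput."""
        sock = ctx.socket(zmq.PAIR)
        sock.connect(ZMQ_ADDR)

        async def ws_to_zmq():
            async for message in ws:                    # frontend -> MotionInput
                await sock.send_string(message)

        async def zmq_to_ws():
            while True:                                 # MotionInput -> frontend
                await ws.send(await sock.recv_string())

        tasks = [asyncio.ensure_future(ws_to_zmq()),
                 asyncio.ensure_future(zmq_to_ws())]
        try:
            # Stop relaying as soon as either direction closes.
            await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        finally:
            for task in tasks:
                task.cancel()
            sock.close()

    async def main():
        async with websockets.serve(handle_client, WS_HOST, WS_PORT):
            await asyncio.Future()                      # serve until interrupted

    if __name__ == "__main__":
        asyncio.run(main())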

The data store consists of JSON config files and pre-trained machine learning models used by the MotionInput backend.

D. MotionInput Backend Architecture
Although the UCL MotionInput backend is a pre-existing system, we have made several changes to it for our solution. The general architecture of MotionInput is shown in the following diagram, with the parts circled in red marking areas where we extensively modified or added new functionality.
MotionInput System Architecture
E. Changes to MotionInput
We made several significant changes to the UCL MotionInput backend to enhance its functionality. These changes can be broadly categorised as follows:

MotionInputAPI: We created a ZeroMQ client that facilitates communication with the intermediary server. The client runs in a dedicated thread and exposes a high-level API to MotionInput modules. This abstraction hides the messaging details and enables any module to send data to the WebExtension frontend through the intermediary server.
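A simplified sketch of this pattern (the class and method names are illustrative, not MotionInput's exact interfaces). Since ZeroMQ sockets are not thread-safe, modules enqueue messages and a single dedicated thread owns the socket:

    import json
    import queue
    import threading
    import zmq

    class MotionInputAPI:
        """Threaded ZeroMQ client; modules may call send() from any thread."""

        def __init__(self, addr="tcp://127.0.0.1:5556"):
            self._outbox = queue.Queue()
            self._thread = threading.Thread(target=self._run, args=(addr,), daemon=True)
            self._thread.start()

        def send(self, topic, payload):
            """High-level entry point for MotionInput modules."""
            self._outbox.put(json.dumps({"topic": topic, "data": payload}))

        def _run(self, addr):
            # The socket lives entirely on this thread.
            sock = zmq.Context.instance().socket(zmq.PAIR)
            sock.connect(addr)
            while True:
                sock.send_string(self._outbox.get())  # forward to the intermediary server

A gesture module could then report state with a single call, e.g. api.send("kiosk", {"hand": "right"}).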

HandLandmarkDetector: It uses MediaPipe to analyse a frame and detect hand landmarks. We created a hand detection heuristic function that selects the best hand in the camera frame based on factors such as distance from the camera and proximity to the centre of the frame. This feature is essential for rejecting extraneous hands in a public setting and tracking only the controlling user's hand.
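The shape of such a heuristic is sketched below, using normalised landmark coordinates and bounding-box size as a proxy for distance from the camera; the exact factors and weights MotionInput uses differ, and the 0.2 weight here is an assumption:

    def hand_score(landmarks):
        """Score one detected hand; landmarks are (x, y) pairs normalised to [0, 1]."""
        xs = [x for x, _ in landmarks]
        ys = [y for _, y in landmarks]
        # A larger bounding box usually means the hand is closer to the camera.
        size = (max(xs) - min(xs)) * (max(ys) - min(ys))
        # Distance of the hand's centre from the frame centre (0.5, 0.5).
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        centrality = 1.0 - ((cx - 0.5) ** 2 + (cy - 0.5) ** 2) ** 0.5
        return size + 0.2 * centrality

    def best_hand(hands):
        """Keep only the controlling user's hand; reject all others."""
        return max(hands, key=hand_score) if hands else None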

ModeController: It is used to switch between modes of interaction, each of which activates a different set of gesture events in the model. We created two touchless kiosk navigation modes: one for the right hand and one for the left hand.
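Conceptually, the controller maps each mode to its set of active events, along these lines (the mode and event names below are hypothetical):

    class ModeController:
        """Sketch of switching interaction modes by swapping event sets."""

        MODES = {
            "kiosk_right_hand": {"swipe_right_hand", "idle_right_hand"},
            "kiosk_left_hand": {"swipe_left_hand", "idle_left_hand"},
        }

        def __init__(self, mode="kiosk_right_hand"):
            self.switch(mode)

        def switch(self, mode):
            # Changing mode replaces the whole set of active gesture events.
            self.mode = mode
            self.active_events = self.MODES[mode]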

Config: We added configuration settings to control aspects such as spawning the WebSocket thread, hiding the camera preview window, and adjusting swipe sensitivity for touchless kiosk navigation.
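For illustration, these settings might appear in one of the JSON config files along these lines (the key names are assumptions, not the exact schema):

    {
      "general": {
        "spawn_websocket_thread": true,
        "hide_camera_preview": true
      },
      "kiosk": {
        "swipe_sensitivity": 0.6
      }
    }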

GestureEvent: It defines the action that occurs when a gesture is performed. We added a custom gesture event that allows users to control kiosk interfaces with touchless swipe gestures.
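A sketch of such an event, mapping the direction of a completed swipe to a simulated arrow-key press; pyautogui stands in for MotionInput's keyboard simulation layer, and the sensitivity threshold logic is an assumption:

    import pyautogui  # stand-in for MotionInput's keyboard simulation layer

    class KioskSwipeEvent:
        """Fires an arrow key when the hand travels far enough in one direction."""

        def __init__(self, sensitivity=0.15):
            self.sensitivity = sensitivity  # minimum travel, in normalised coordinates

        def trigger(self, start, end):
            dx, dy = end[0] - start[0], end[1] - start[1]
            if max(abs(dx), abs(dy)) < self.sensitivity:
                return  # movement too small: below the configured sensitivity
            if abs(dx) >= abs(dy):
                direction = "right" if dx > 0 else "left"
            else:
                direction = "down" if dy > 0 else "up"
            pyautogui.press(direction)  # "left"/"right"/"up"/"down" are valid pyautogui keys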