Hand gesture recognition

Published on 28 August 2024 at 21:05

The difference between AI, machine learning and deep learning

The term AI refers to the simulation of human intelligence by machines. Machine learning and deep learning are both covered under the umbrella term AI. Machine learning software is able to predict outcomes by applying algorithms to older data [2]. Deep learning is a subset of machine learning and uses neural network algorithms. It requires more data for an accurate response. ML requires more human intervention to correct and learn, while DL learns from its environment and past mistakes, which makes it more accurate. In short, ML learns from user input and DL from environmental input. [3]
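To make the distinction a bit more concrete, here is a rough sketch (not from the article) that trains a classical machine learning model and a small deep learning network on the same toy data; the dataset, layer sizes and library choices (scikit-learn and Keras) are illustrative assumptions only.

```python
# Rough sketch: the same toy classification task solved with a classical
# ML model and with a small neural network (DL). Everything here is illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from tensorflow import keras

X = np.random.rand(200, 4)                 # 200 samples, 4 features
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # simple synthetic labels

# Machine learning: a fixed algorithm fitted to older data.
tree = DecisionTreeClassifier().fit(X, y)

# Deep learning: a neural network that learns its own representation,
# but typically needs far more data for an accurate response.
net = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
net.compile(optimizer="adam", loss="binary_crossentropy")
net.fit(X, y, epochs=10, verbose=0)
```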

Types of AI

The different types of AI are divided into task categories. These categories describe what is expected of the software and how it should interact with the user.

  1. Artificial narrow intelligence (ANI): Designed to learn one very specific action and unable to learn independently. Also called weak AI; examples are facial recognition, language translation, or playing chess.
  2. Artificial general intelligence (AGI): Can perform any intellectual task that a human is able to do, such as customer service chatbots and voice assistants.
  3. Artificial superintelligence (ASI): A (yet) non-existing form of intelligence that surpasses human intelligence.
  4. Reactive Machine (RM): An AI that is unable to build memory but is able to respond to external stimulation. For example, spam filters and recommendations.
  5. Limited Memory (LM): This AI is able to store knowledge for training purposes.
  6. Theory of Mind: Has cognitive understanding, can sense human emotions, and performs tasks accordingly.
  7. Self-Aware: Recognizes its own emotions and those of others and thinks on a human level.

 

Benefits

There are a couple of main reasons to implement AI in everyday tasks. Once the AI is developed, it reduces human error in certain tasks. Scanning and processing big datasets can take humans days or weeks; a programmed AI can do it in a fraction of a second, which is very time efficient and takes away repetitive tasks. Unless you create a Self-Aware AI or one with a Theory of Mind, AIs are unbiased and make smart decisions instead of ethical decisions, which in some cases is very practical. [7]

 

Applications

Nowadays AI is already used in a lot of applications, such as home devices, cars and streaming services. The main categories of applications are, for example, e-commerce: personalized shopping, AI chatbots, fraud prevention. Education: besides using AI chats to create a report, 'Smart Content' is a trending subject in the classroom, along with automation of administrative tasks, voice assistants and facial recognition. [5] And the list goes on. Hand gesture recognition is known but not used a lot in everyday life. It is mainly used in the videogame industry, but nearly always with a controller that registers certain motions. [6] Let's continue with hand gesture applications.

 

Hand gesture data sets

One such dataset contains thousands of gesture examples and is able to recognize eighty-three different gestures. They are sampled in six different indoor and outdoor scenes, which is needed for the AI to recognize different angles and lighting conditions. Other datasets that are created similarly are NVGestures, IPN Hand and HaGRID. The last two listed have even more samples, with participants of different ages and ethnicities. [7]
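As a hypothetical illustration (the folder layout and names below are assumptions, not how NVGestures, IPN Hand or HaGRID actually ship), a vision-based gesture dataset could simply be organized as one folder of images per gesture and enumerated like this:

```python
# Hypothetical sketch: enumerate a gesture dataset stored as one folder per
# gesture class, each containing images recorded in different scenes.
from pathlib import Path

DATASET_ROOT = Path("hand_gesture_dataset")   # assumed local location

samples = []
for gesture_dir in sorted(p for p in DATASET_ROOT.iterdir() if p.is_dir()):
    for image_path in gesture_dir.glob("*.jpg"):
        samples.append((image_path, gesture_dir.name))   # (file, label)

labels = {label for _, label in samples}
print(f"{len(samples)} samples across {len(labels)} gesture classes")
```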

 

Programming

Various topics are important for programming an AI, such as a goal; with that goal in mind, picking one of the various types of AI is important, as is deciding which data is needed to perform the chosen task. In this case we need to be able to recognize all types of hands and different hand gestures.

Languages

Nearly every programming language is capable of building an AI system, but a couple are easier to pursue than others.

  • Java is a well-known platform with a large community; it is, however, a very tough language to learn, with few AI features readily available, so a lot of work is needed to succeed.
  • Julia is made for high mathematical performance, which is good for ML, but since it is fairly new there isn't a lot of support, which can be challenging.
  • R is a statistical programming language commonly used by data scientists. There are many packages available, but it can be slow and hard to learn.
  • JavaScript is a popular web development language with a lot of prewritten libraries available for AI and gesture recognition. It is a complex language, but it has a big community for support.
  • C++ is a very challenging language for AI programming. It is mostly used for game development because of its swiftness and power.
  • Python is mostly used for AI because of its easy syntax; it has loads of open-source libraries available and a broad community. The language, however, is quite slow.

Algorithm

The algorithm needed for this project is a set of hand gestures with multiple samples. According to a quick Google search, a sample size of 50 to 1000 per gesture is enough for an approximately accurate algorithm. Since it is a vision-based algorithm, different types of light fall and skin colors need to be accounted for. According to 'ancestry' there are approximately 110 different skin tones, 0-100% of light fall and a rotation of 360 degrees of the hand. To get a nearly foolproof dataset you would need about 3.9 million samples (110 × 100 × 360 ≈ 3,960,000). When AI is applied, the rule of thumb is: 'around ten times more samples than the number of features in your data set' or 'ten times the number of degrees of freedom a model has'. A starting value of a hundred is recommended. In short, every hand gesture needs around a hundred samples in different lighting for it to succeed.
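As a quick sanity check, the numbers above can be reproduced in a few lines of Python; the feature count used for the rule of thumb is a placeholder, not a value from the article.

```python
# Back-of-the-envelope sample estimates for a vision-based gesture dataset.
skin_tones = 110      # approximate number of skin tones ('ancestry')
light_levels = 100    # 0-100% light fall
rotations = 360       # hand rotation in degrees

exhaustive = skin_tones * light_levels * rotations
print(f"Exhaustive samples per gesture: {exhaustive:,}")   # 3,960,000 (~3.9M)

# Rule of thumb: about ten times the number of features in the data set,
# with roughly a hundred samples per gesture as a practical starting value.
n_features = 10                                            # placeholder value
print(f"Rule-of-thumb samples per gesture: {10 * n_features}")
```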

 

MediaPipe

Google has a full library for different kinds of recognition; the description on the site reads:

‘MediaPipe Solutions provides a suite of libraries and tools for you to quickly apply artificial intelligence and machine learning techniques in your applications’.

MediaPipe offers a lot of functions, like landmark detection, which recognizes the palm of your hand. It already possesses a database with samples to do so. The gesture recognizer knows four gestures from scratch, with around 500 samples each. It has a built-in option to ‘teach’ the AI new hand gestures by simply pressing a button as many times as needed.
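A minimal sketch of how the gesture recognizer can be used from Python follows; the model file 'gesture_recognizer.task' is assumed to be downloaded from Google, and the input image 'hand.jpg' is a hypothetical example.

```python
# Minimal sketch: recognize a hand gesture in a single image with MediaPipe.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# The .task model bundle is downloaded separately from Google.
base_options = python.BaseOptions(model_asset_path="gesture_recognizer.task")
options = vision.GestureRecognizerOptions(base_options=base_options, num_hands=2)
recognizer = vision.GestureRecognizer.create_from_options(options)

image = mp.Image.create_from_file("hand.jpg")   # hypothetical input image
result = recognizer.recognize(image)

if result.gestures:
    top = result.gestures[0][0]                 # best guess for the first hand
    print(top.category_name, top.score)
```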

 

Programming with MediaPipe

The hand landmark detector detects dot [0], from which it establishes the whole hand, as shown in figure … . It has multiple options, such as a minimum and maximum number of hands to detect and tracking of the hand. It is also able to differentiate between the right and left hand.
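A short sketch of the hand landmark detection on a webcam feed, assuming OpenCV is installed; the option names follow MediaPipe's Python solution API, and the Esc key handling is just an example.

```python
# Sketch: track hand landmarks from a webcam and print handedness plus dot [0].
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB images; OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for landmarks, handedness in zip(results.multi_hand_landmarks,
                                         results.multi_handedness):
            wrist = landmarks.landmark[0]        # dot [0], the base of the hand
            print(handedness.classification[0].label, wrist.x, wrist.y)
    if cv2.waitKey(1) & 0xFF == 27:              # press Esc to stop
        break

cap.release()
hands.close()
```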
