Keywords
American Sign Language, Binary image, Feed-forward back propagation network, Lab color space, Thresholding technique.
INTRODUCTION
COMMUNICATION is a fundamental requirement for survival, and interaction provides a means to communicate. Naturally, different modes of communication are used for interaction, such as language, eyes, body movement, facial expression, and hand gestures and postures. A gesture is a form of non-verbal communication made with a part of the body and used instead of verbal communication (or in combination with it). A sign language is a language that uses gestures instead of sound to convey meaning, combining hand shapes, orientation and movement of the hands, arms or body, facial expressions and lip patterns. Sign language is a visual language and consists of three major components [2]: finger-spelling, used to spell words letter by letter; word-level sign vocabulary, used for the majority of communication; and non-manual features, i.e. facial expressions and tongue, mouth and body position. Sign language is one form of communication for the hearing and speech impaired.
Sign language recognition (SLR) is a multidisciplinary research area involving pattern recognition, computer vision, natural language processing and linguistics [1]. Moreover, for Human Computer Interaction (HCI), vision based hand interaction is more natural and efficient than traditional interaction approaches such as the keyboard, mouse or pen.
Hand gestures fall into two categories, namely static and dynamic [3]. Some hand gestures also have both static and dynamic elements, as in sign languages [8]. Static hand gestures are characterized by the hand posture, which is determined by a particular finger/thumb/palm configuration and represented by a single image. Dynamic hand gestures, on the other hand, are characterized by the initial and final stroke motion of a moving gesture.
RELATED WORK
Research shows that various types of systems and methods have been developed for sign language recognition. The data acquired by a recognition system can be obtained through either "Data-Glove based" or "Vision based" approaches. The Data-Glove based methods use sensor devices to digitize hand and finger motions into multi-parametric data [14]. These approaches can easily provide the exact coordinates of palm and finger location and orientation, and the hand configuration. However, the devices are quite expensive and cumbersome for the users. In contrast, the Vision based methods require only a camera and are thus considered easy, natural and less costly compared to the glove based approach [7].
A vision based system was presented by Hienz et al. [5] which extracted feature vectors from video frames and recognized 262 different signs with an accuracy of 94%. Rule based classification was applied to the images captured by a single video camera using a modular frame grabber system. Yin et al. [17] employed a Restricted Coulomb Energy (RCE) neural network, taking the L*a*b color space as input and training the output layer as the skin class. Ranganath et al. [13] presented a hand gesture recognition system that used image Fourier descriptors as the prime feature and classified them with the help of an RBF network. Ong et al. [12] detected hands with 99.8 percent accuracy in grey scale images with shape information alone, using a boosted cascade of classifiers (Viola et al. [15]); signers were constrained to wear long-sleeved dark clothing in front of mostly dark backgrounds. Hemayed et al. [4] developed an edge based recognizer for Arabic sign language which uses the Prewitt edge detector to extract the edges of the segmented hand gesture; an accuracy of 97% was achieved using Euclidean distance for classification. Murthy et al. [11] trained a supervised feed-forward neural network to count fingers and find the direction in which the user points. Their vision based recognition system classified hand gestures into ten categories employing the back propagation algorithm with an accuracy of 89% on a typical test set.
This paper presents a simple yet efficient recognition system which converts the static sign gestures of American Sign Language into text. Geometrical properties of the hand are transformed into features, and a neural network is used for the recognition and classification task. The rest of the paper discusses the different phases of the system in detail: image acquisition, image processing, feature extraction and classification.
PROPOSED METHODOLOGY
The system is designed on the principle of pattern recognition. Pattern recognition is a process that takes raw data and performs an action based on the category of the pattern. Pattern recognition can be used for classification, in which each input value is assigned to one of a given set of classes. The flow diagram of the system is shown in Fig. 1.
A. Image Acquisition
In the first phase an image is taken from a webcam or from a database. The system reads input images from the database [6], which contains RGB images of ASL signs. The database contains samples of four signs performed by different users wearing long-sleeved clothing. The images have uniform dark or light backgrounds and were captured under different lighting conditions. The image database consists of a total of 160 images: 120 images (30 per sign) are used for training, while the remaining 40 images (10 per sign) are used for testing.
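As an illustration, the following minimal MATLAB sketch shows one way the training images could be read in; the folder layout, the folder name asl_database and the sign labels other than 'V' are assumptions made for the example, not details given in the paper.

```matlab
% Minimal sketch (assumed folder layout): read the training images for each sign.
signs = {'A', 'B', 'C', 'V'};   % hypothetical sign labels; only 'V' is named in the paper
trainImages = {};
trainLabels = [];
for s = 1:numel(signs)
    files = dir(fullfile('asl_database', signs{s}, '*.jpg'));   % assumed folder name
    for f = 1:numel(files)
        img = imread(fullfile('asl_database', signs{s}, files(f).name));
        trainImages{end+1} = img;   % RGB image, processed in the next phase
        trainLabels(end+1) = s;     % numeric class label used to build the target vector
    end
end
```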
B. Image Processing
Processing is performed in three steps: color space conversion for skin area extraction, morphological operations to remove noise, and image cropping for ease of feature extraction.
1) Skin region detection: The system uses the L*a*b color space for skin region detection with a thresholding technique. L*a*b is a color space defined by the CIE (the International Commission on Illumination), based on one channel for luminance (lightness, L) and two color channels (a and b). The input RGB image is first converted to the L*a*b* color space to separate the intensity information into a single plane of the image, and the local range is then calculated in each layer. Skin color classification works well when the chrominance components are used for segmentation, therefore the luminance component is discarded. Using threshold values, the second and third layers, which represent the chroma components, are converted into binary images. The two binary images are then multiplied to obtain a resultant binary image which contains only the hand region. Morphological operations such as opening and dilation are performed to remove noise from the segmented hand region.
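A minimal MATLAB sketch of this segmentation step is given below; the threshold choice (Otsu's method via graythresh) and the structuring-element size are assumptions, since the paper does not state the actual values used.

```matlab
% Sketch of skin-region detection in L*a*b space (thresholds and structuring element are assumed).
rgb   = imread('sign.jpg');               % hypothetical input RGB image
cform = makecform('srgb2lab');            % RGB -> L*a*b* conversion (Image Processing Toolbox)
lab   = applycform(rgb, cform);
rangeA = rangefilt(lab(:,:,2));           % local range of the a* (chroma) layer
rangeB = rangefilt(lab(:,:,3));           % local range of the b* (chroma) layer; L* is discarded
bwA = im2bw(rangeA, graythresh(rangeA));  % threshold each chroma layer into a binary image
bwB = im2bw(rangeB, graythresh(rangeB));
hand = bwA & bwB;                         % combine the masks (equivalent to multiplying them)
hand = imopen(hand, strel('disk', 3));    % morphological opening removes small noise
hand = imdilate(hand, strel('disk', 3));  % dilation fills small gaps in the hand region
```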
2) Image cropping: Using connected component analysis, the connected regions of the resultant image are labelled. Each connected component has an associated bounding box, which provides the dimensions of the rectangular box used to crop the hand region from the input image.
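A short sketch of the cropping step, assuming the largest connected component is the hand:

```matlab
% Label the connected components and crop the bounding box of the largest one.
labels = bwlabel(hand);                               % label connected regions of the binary mask
stats  = regionprops(labels, 'Area', 'BoundingBox');  % area and bounding box of each component
[~, idx] = max([stats.Area]);                         % assumption: the largest region is the hand
handCrop = imcrop(hand, stats(idx).BoundingBox);      % rectangular crop of the hand region
```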
C. Feature Extraction
The features are extracted from shape based properties of the hand. Tanibata et al. [16] developed a Japanese Sign Language recognition system using hand features such as orientation, area and the flatness of the hand region. Mohammed et al. [9] proposed a method of hand feature extraction using several geometrical dimensions such as height, width and area. Using regionprops, the following image features are extracted and a feature vector is formed which acts as the input for the recognition and classification of the sign gesture, as sketched in the code after the list.
1. Area: Calculated as the total number of white pixels (i.e., binary value ‘1’). |
2. Centroid = [round(Σ x-coordinates of white pixels / area), round(Σ y-coordinates of white pixels / area)].
3. Centroid-distance.
4. Average height = N/M, where N is the total number of black pixels (0's) of the image and M is the total number of columns containing at least one black pixel (0).
Finally, the features collected from the above sections are combined to form a feature vector in the following order: |
Feature vector, V= [area, x-centroid, y-centroid, centroid-distance, Average height] |
Feature vectors of the training images are stored in MATLAB .mat files, while the feature vector of the input hand gesture image, i.e. the test image, is calculated at run time.
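The sketch below forms the 1x5 feature vector from the cropped binary hand image using regionprops. The exact definition of the centroid-distance feature is not spelled out in the text, so the distance from the image origin to the centroid is used here purely as an illustrative assumption.

```matlab
% Sketch of the feature-extraction step on the cropped binary image handCrop.
props = regionprops(handCrop, 'Area', 'Centroid');  % shape properties of the hand region
area  = props(1).Area;                              % number of white (1) pixels
cx    = round(props(1).Centroid(1));                % rounded x-coordinate of the centroid
cy    = round(props(1).Centroid(2));                % rounded y-coordinate of the centroid
cdist = sqrt(cx^2 + cy^2);                          % assumed centroid-distance definition (illustrative only)
N = sum(handCrop(:) == 0);                          % total number of black (0) pixels
M = sum(any(handCrop == 0, 1));                     % columns containing at least one black pixel
avgHeight = N / M;                                  % average height as defined in the text
V = [area, cx, cy, cdist, avgHeight];               % feature vector in the stated order
save('trainFeatures.mat', 'V');                     % training feature vectors are stored in .mat files
```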
D. Classification of Sign Gestures using Neural Network
The system uses a feed-forward back propagation network for the classification of sign gestures. The back propagation training algorithm is a supervised learning algorithm for multilayer feed-forward neural networks; both input and target output vectors are therefore provided when training the network. If there is an error, the network re-adjusts its weight values until the error is eliminated or minimized, and training then stops. Each pass through the input vectors is called an epoch.
The input vector is the 1x5 feature vector, so only five input neurons are used. A target vector is also defined corresponding to each hand gesture. The performance of the training is evaluated with the MSE, the correlation coefficient, i.e. the regression (R) between the network outputs and the corresponding target outputs, and the characteristics of the training, validation and testing errors. For successful training, several conditions are set: the MSE goal is set to 0.001, the maximum number of validation failures to 6, the learning rate to 0.05 and the maximum number of epochs to 1000. The sim function is used to simulate the model. Finally, the output from the neural network is converted into the text corresponding to each classified hand gesture. The training stopped after 21 epochs since the validation error increased more than six times, as shown in Fig. 2. The training, validation and testing errors were in fairly good agreement with the characteristics set during training.
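A sketch of the network setup and training with the stated parameters is shown below; the hidden-layer size (10 neurons), the traingdx training function and the one-hot target encoding are assumptions, since the paper only specifies a feed-forward back propagation network.

```matlab
% Sketch of classifier training. X is a 5xN matrix of training feature vectors and
% T a 4xN matrix of one-hot target vectors (one column per training image).
net = feedforwardnet(10, 'traingdx');   % feed-forward net; hidden size and trainFcn are assumptions
net.trainParam.goal     = 0.001;        % MSE goal
net.trainParam.max_fail = 6;            % maximum number of validation failures
net.trainParam.lr       = 0.05;         % learning rate
net.trainParam.epochs   = 1000;         % maximum number of epochs
[net, tr] = train(net, X, T);           % supervised training with input and target vectors
```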
EXPERIMENTAL RESULTS
For the implementation of the proposed system, an image database is created for the training and testing images. The image database consists of four static sign gestures of ASL in .jpg format.
The method is implemented using MATLAB R2012a. Skin region detection, image cropping, resizing and feature extraction are performed using the Image Processing Toolbox, and the Neural Network Toolbox is employed for the classification of hand gestures. The built-in MATLAB function sim simulates the network: it takes the network object and the network input and returns the network output. The trained neural network is tested with the test image database.
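As a usage illustration, classifying a single test image with sim might look like the following sketch, where testV is the 1x5 feature vector of the test image and signs is the list of sign labels (assumed names, as before):

```matlab
% Classify one test feature vector with the trained network.
y = sim(net, testV');            % sim(network, input) returns the network output (one score per class)
[~, class] = max(y);             % index of the class with the highest output
recognizedText = signs{class};   % convert the network output into the corresponding text
```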
A. Confusion Matrix
A confusion matrix has been plotted to show the recognition accuracy for each hand gesture, as shown in Fig. 3. The green boxes show the number of images correctly classified and the blue boxes show the overall recognition rate on the test image dataset for each hand gesture.
Recognition rate = (number of recognized samples of a letter / total number of samples of that letter) * 100%.
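The recognition rate can be computed directly from the predicted and true class labels of the test set, as in this small sketch (predicted and actual are assumed to be vectors of class indices):

```matlab
% Per-sign recognition rate and overall rate for the test set.
rates = zeros(1, numel(signs));
for s = 1:numel(signs)
    idx = (actual == s);                                    % test samples of this sign
    rates(s) = 100 * sum(predicted(idx) == s) / sum(idx);   % recognized / total samples of that sign
end
overallRate = 100 * sum(predicted == actual) / numel(actual);
```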
For testing the images, a GUI (Graphical User Interface) has been created. The GUI consists of nine push buttons and provides an easy way for the user to interact with the system. It shows the different phases of the system: the input image, hand region detection, feature extraction (in which the centroid of the hand is shown) and finally the character recognized by the neural network. An example of the GUI is shown in Fig. 4, which shows the different phases of the system for the sign 'V'.
CONCLUSION
The system developed presents a simple yet efficient method of gesture recognition using geometrical features based on the shape based properties of the hand. Static hand gestures are recognized using a neural network and converted into the corresponding text. A recognition rate of 85% is achieved on the test images. In future the system can be extended to recognize dynamic hand gestures in an unrestricted environment for real-life applications.
Figures at a glance
Figure 1: Flow diagram of the proposed system
Figure 2: Neural network training performance (training stopped after 21 epochs)
Figure 3: Confusion matrix showing the recognition accuracy for each hand gesture
Figure 4: GUI showing the different phases of the system for the sign 'V'
References
1. Aran, O. (2008), 'Vision based sign language recognition: modeling and recognizing isolated signs with manual and non-manual components', Doctoral dissertation, Bogaziçi University.
2. Bowden, R., Zisserman, A., Kadir, T., and Brady, M. (2003), 'Vision based interpretation of natural sign languages'.
3. Cutler, R., and Turk, M. (1998), 'View based interpretation of real time optical flow for gesture recognition', IEEE International Conference on Automatic Face and Gesture Recognition.
4. Hemayed, E. E., and Hassanien, A. S. (2010), 'Edge-based recognizer for Arabic sign language alphabet (ArS2V - Arabic sign to voice)', 2010 International Computer Engineering Conference (ICENCO), pp. 121-127, IEEE.
5. Hienz, H., Grobel, K., and Offner, G. (1996), 'Real-time hand-arm motion analysis using a single video camera', Proc. International Conference on Automatic Face and Gesture Recognition, pp. 323-327.
6. https://sites.google.com/site/autosignlan/source/image-data-set
7. Khan, R. Z., and Ibraheem, N. A. (2012), 'Comparative study of hand gesture recognition system', SIPM, FCST, ITCA, WSE, ACSIT, CS & IT, 6, 203-213.
8. Mitra, S., and Acharya, T. (2007), 'Gesture recognition: A survey', IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 37(3), 311-324.
9. Mohammed, F., Mohammad, W., Kayes, A. S. M., and Poya, A. (2013), 'Geometric feature extraction of human hand', International Journal of Computer and Information Technology (ISSN: 2279-0764), Volume 02, Issue 04.
10. Aktaruzzaman, Md., Khan, Md. Farukuzzaman, and Uddin, M. Ekin (2009), 'Recognition of offline cursive Bengali handwritten numerals using ANN', Journal of the Peoples University of Bangladesh, vol. 4, no. 1, July 2009, pp. 18-28, Bangladesh, ISSN 1812-4747.
11. Murthy, G. R. S., and Jadon, R. S. (2010), 'Hand gesture recognition using neural networks', 2010 IEEE 2nd International Advance Computing Conference (IACC), pp. 134-138, IEEE.
12. Ong, E.-J., and Bowden, R. (2004), 'A boosted classifier tree for hand shape detection', Proc. International Conference on Automatic Face and Gesture Recognition, pp. 889-894.
13. Ranganath, S., and Ng, C. W. (2002), 'Real-time gesture recognition system and application', Image and Vision Computing, 20(13-14), 993-1007.
14. Sturman, D. J., and Zeltzer, D. (1994), 'A survey of glove-based input', IEEE Computer Graphics and Applications, 14(1), 30-39.
15. Viola, P., and Jones, M. (2002), 'Robust real-time object detection', Proc. IEEE Workshop on Statistical and Computational Theories of Vision.
16. Tanibata, N., Shimada, N., and Shirai, Y. (2002), 'Extraction of hand features for recognition of sign language words', Proc. International Conference on Vision Interface.
17. Yin, X., and Xie, M. (2001), 'Hand gesture segmentation, recognition and application', Proc. 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation.