ISSN ONLINE (2320-9801) PRINT (2320-9798)

Efficiency comparison of selected endoscopic video analysis algorithms

Jan Cychnerski
Department of Computer Architecture, Gdańsk University of Technology, Poland

International Journal of Innovative Research in Computer and Communication Engineering

Abstract

In this paper, selected image analysis algorithms were examined and compared in the task of identifying informative frames, blurry frames, colorectal cancer and healthy tissue in endoscopic videos. To standardize the tests, the algorithms were modified by removing the parts responsible for classification and replacing them with Support Vector Machines and Artificial Neural Networks. The tests were performed in a unified manner on a common, large database of real endoscopy videos. The test results often do not confirm the high efficiency declared by the algorithms' authors: a maximum of 80% sensitivity and specificity was achieved, whereas the authors often declared as much as 90%.

Keywords

endoscopy, video analysis, algorithms, comparison, efficiency

INTRODUCTION

Over the last several years, endoscopic video analysis algorithms (for gastroscopy, colonoscopy and wireless capsule endoscopy, WCE) have gained much popularity. These algorithms are designed to recognize informative and non-informative frames, as well as various diseases and healthy tissue. The authors of the algorithms found in the literature claim high performance in terms of accuracy, sensitivity, specificity etc. [1]. However, these publications often lack comparative tests against other algorithms, or any comparison at all. One reason for this situation is the lack of a good public database of gastrointestinal endoscopy images prepared for algorithm testing.
This article focuses on a comparison of selected endoscopic image analysis algorithms. To allow a comparative analysis, it was necessary to establish common operating conditions for the algorithms. The algorithms were modified so as to unify their operation, and the comparative tests were then carried out on identical data sets, measuring the algorithms' performance in detecting informative and non-informative frames, colorectal cancer and normal tissue.

ALGORITHMS

The image analysis algorithms selected for testing and comparison are listed in Table I.

TEST PROCEDURE

The algorithms were compared in two main tasks: their efficiency in distinguishing (a) cancer from normal tissue of the large intestine, and (b) informative from non-informative frames (e.g. frames distorted by the movement of the endoscope, poor lighting, or liquid covering the endoscope camera).
For this purpose, all tests were performed on a common database of real endoscopic videos of the colon [2]. To unify the algorithms' operation, their parts responsible for classification were removed, leaving only the core: feature vector extraction. All feature vectors were also normalized so that every feature had a mean of 0 and a standard deviation of 1 over the whole database. For classification, Artificial Neural Networks (ANN) and Support Vector Machines (SVM) were used to test the algorithms' efficiency; all classifiers were trained and tested in the same way on the same data.
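The normalization step amounts to a per-feature z-score computed over the whole database. A minimal sketch follows; it assumes the feature vectors fit in memory as equal-length lists, uses the population standard deviation (the paper does not say which estimator was used), and maps constant features to 0, which is an assumption of this sketch. The function name `zscore_normalize` is hypothetical.

```python
from statistics import fmean, pstdev

def zscore_normalize(vectors):
    """Scale every feature to mean 0 and standard deviation 1 over all vectors (sketch)."""
    stats = []
    for column in zip(*vectors):  # iterate over features, not samples
        mu = fmean(column)
        sigma = pstdev(column, mu)  # population standard deviation (assumption)
        stats.append((mu, sigma))
    # Constant features (sigma == 0) are mapped to 0.0, an assumed convention.
    return [
        [(v - mu) / sigma if sigma > 0 else 0.0 for v, (mu, sigma) in zip(vec, stats)]
        for vec in vectors
    ]

normalized = zscore_normalize([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
```

Because the statistics are taken over the whole database rather than per fold, every classifier sees identically scaled inputs, which matches the unified test setup described above.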
Classifier training was carried out on the database of [1] endoscopic videos, fully labeled by an expert for the content of each frame. The expert gave every frame of every video one of three labels: [blurry], [sharp, cancer] or [sharp, healthy]. Because the videos differ in length and in the proportions of labels, at most 30 frames per label (as far apart from each other as possible) were selected from each video for further processing. The total number of selected frames was 4750 for blur recognition and 2750 for cancer recognition.
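One way to pick at most 30 frames per label from a video while keeping them far apart is to take evenly spaced positions among that label's frame indices. The sketch below is a hypothetical illustration: the function `select_frames` and the even-spacing rule are assumptions, since the paper does not describe the exact selection procedure.

```python
from collections import defaultdict

def select_frames(frame_labels, max_per_label=30):
    """Pick at most max_per_label frames per label from one video, spread out (sketch)."""
    by_label = defaultdict(list)
    for idx, label in enumerate(frame_labels):
        by_label[label].append(idx)
    selected = {}
    for label, idxs in by_label.items():
        n = len(idxs)
        if n <= max_per_label:
            selected[label] = idxs  # few frames: keep them all
        elif max_per_label == 1:
            selected[label] = [idxs[n // 2]]  # a single frame: take the middle one
        else:
            # Evenly spaced positions; step > 1 here, so no index is picked twice.
            step = (n - 1) / (max_per_label - 1)
            selected[label] = [idxs[round(i * step)] for i in range(max_per_label)]
    return selected
```

For example, a video with 100 frames labeled [blurry] and 10 labeled [sharp, cancer] yields 30 spread-out blurry frames (including the first and last) and all 10 cancer frames.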
Two main types of tests were performed: (a) identifying clear (informative) versus blurry (non-informative) frames, and (b) identifying healthy versus cancerous tissue. For each test type, the input data was divided into eight sets, preserving the ratio of classes and ensuring that all images of one patient were always placed in the same set (set assignment was performed with the algorithm described in [19]). Such set balancing is recommended in medical research [1].
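The set assignment itself follows [19] and is not reproduced here. As a rough illustration of the two constraints (no patient split across sets, class ratios roughly preserved), the sketch below uses a simple greedy heuristic that is an assumption of this sketch: patients with the most frames are placed first, each into the set where it least increases the squared deviation from the ideal per-set class counts. The function `assign_folds` is hypothetical.

```python
from collections import Counter, defaultdict

def assign_folds(patient_ids, labels, n_folds=8):
    """Greedy class-balanced fold assignment keeping each patient in one fold (sketch)."""
    per_patient = defaultdict(Counter)  # patient -> class counts
    for pid, y in zip(patient_ids, labels):
        per_patient[pid][y] += 1
    classes = sorted(set(labels))
    totals = Counter(labels)
    target = {c: totals[c] / n_folds for c in classes}  # ideal per-fold class counts
    fold_counts = [Counter() for _ in range(n_folds)]
    patient_fold = {}

    def cost_increase(f, counts):
        # Change in squared deviation from the target if this patient joins fold f.
        return sum(
            (fold_counts[f][c] + counts[c] - target[c]) ** 2
            - (fold_counts[f][c] - target[c]) ** 2
            for c in classes
        )

    # Place patients with the most frames first; smaller patients then fill the gaps.
    for pid in sorted(per_patient, key=lambda p: -sum(per_patient[p].values())):
        counts = per_patient[pid]
        best = min(range(n_folds), key=lambda f: cost_increase(f, counts))
        fold_counts[best].update(counts)
        patient_fold[pid] = best
    return [patient_fold[pid] for pid in patient_ids]
```

The returned list gives a set index for every frame; in 8-fold cross-validation, each set in turn serves as the test set while the other seven are used for training.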
The prepared sets were used for 8-fold cross-validation. For each classifier, a set of its parameters was selected and then optimized with the CRS (controlled random search) algorithm [17] from the NLopt library [18], with a time limit of 8 hours (usually resulting in 5000–50000 iterations of the algorithm). During the tests, the following efficiency parameters were measured:
1) Sensitivity: the fraction of positive samples recognized correctly
2) Specificity: the fraction of negative samples recognized correctly
3) Accuracy: the fraction of all samples classified correctly
4) Smoothness: the smoothness of the classifier's output [20]
5) Overall score: the weighted harmonic mean of the sensitivity, specificity and smoothness values
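The parameter search can be illustrated with a simplified controlled random search in the spirit of Price's CRS2: a population of random points is kept, and the worst point is repeatedly replaced by the reflection of a random point through the centroid of a simplex that always contains the current best point. This is a hypothetical pure-Python sketch, not NLopt's implementation (NLopt's `GN_CRS2_LM` variant additionally uses local mutation), and the paper does not state which CRS variant or parameter encoding was used. The `max_iters` and `max_time` arguments mimic the iteration count and 8-hour time limit mentioned above.

```python
import random
import time

def crs_minimize(f, lower, upper, max_iters=5000, max_time=None, pop_factor=10, seed=0):
    """Minimize f over a box with a simplified controlled random search (CRS2-style sketch)."""
    rng = random.Random(seed)
    n = len(lower)
    start = time.monotonic()
    # Initial population of pop_factor * (n + 1) uniform random points inside the bounds.
    pop = []
    for _ in range(pop_factor * (n + 1)):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        pop.append((f(x), x))
    for _ in range(max_iters):
        if max_time is not None and time.monotonic() - start > max_time:
            break  # time budget exhausted
        pop.sort(key=lambda p: p[0])  # pop[0] is the best point, pop[-1] the worst
        others = [p[1] for p in rng.sample(pop[1:], n)]
        simplex = [pop[0][1]] + others[:-1]  # best point plus n - 1 random points
        centroid = [sum(c) / n for c in zip(*simplex)]
        # Reflect the remaining random point through the centroid.
        trial = [2 * g - v for g, v in zip(centroid, others[-1])]
        if all(lo <= t <= hi for t, lo, hi in zip(trial, lower, upper)):
            ft = f(trial)
            if ft < pop[-1][0]:
                pop[-1] = (ft, trial)  # replace the worst point
    best_value, best_x = min(pop, key=lambda p: p[0])
    return best_x, best_value

# Example: the minimum of a shifted quadratic lies at (1, -2).
x_best, f_best = crs_minimize(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2, [-5, -5], [5, 5])
```

In the experiments, `f` would correspond to something like the negated cross-validated overall score of a classifier as a function of its parameters (for example, the logarithms of an SVM's C and gamma); that objective is an assumption and is not shown here.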
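The first three parameters in the list above follow directly from the confusion matrix, and the overall score is a weighted harmonic mean. The sketch below assumes binary labels with 1 marking the positive class (e.g. cancer or blurry) and both classes present in the test data. The smoothness measure of [20] is not reproduced, so it enters only as a precomputed value, and the weights are placeholders because the paper does not state them. The function names are hypothetical.

```python
def basic_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for binary labels (both classes assumed present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean; returns 0.0 if any value is not positive."""
    if any(v <= 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for w, v in zip(weights, values))

# Hypothetical overall score: equal weights and a made-up smoothness value of 0.6.
m = basic_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0])
overall = weighted_harmonic_mean([m["sensitivity"], m["specificity"], 0.6], [1, 1, 1])
```

A harmonic mean is dominated by its smallest component, so a classifier cannot achieve a high overall score by excelling at one measure while failing another.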

RESULTS

This section presents the test results of all tested algorithms. The tests were performed in the same manner, on the same hardware, under the same conditions and with the same data, as described in the previous section.
Tables II–III and Figures 1–2 present the results of recognizing blurry/clear (non-informative/informative) frames with the Artificial Neural Networks and Support Vector Machines. In this task, the neural network performed markedly better than the SVM. The results are relatively consistent with expectations and with the descriptions given by the authors of the original publications (where present). The test algorithms performed far worse than the others. In the task of blurry frame recognition, the best algorithms were BaoupuLi, MB-LBP-C, Kodo and LCVP.
Tables IV–V and Figures 3–4 present the results of recognizing colorectal cancer versus normal tissue with the ANN and SVM. In this task, the neural network also performed better than the SVM, though not as clearly as in blur recognition.
The test algorithms again usually performed worse than most of the other algorithms. In the task of identifying cancerous tissue, the best algorithms were BaoupuLi, MB-LBP-C, LCVP, DFT-HT and AHT. However, these results are far below the efficacy declared by the authors of the original publications (often over 90%) [1].

CONCLUSION

In this article, selected endoscopic image analysis algorithms were tested and compared in the tasks of detecting blurry and clear (non-informative/informative) frames, colorectal cancer and healthy colon tissue. Tests were performed on a large endoscopic video database, under the same conditions for all algorithms. The efficiency of recognizing diseases was clearly lower than that declared by the authors. In the task of blur recognition, the algorithms performed similarly to (or slightly better than) their declared results.
These results indicate the need for more extensive comparative testing across the field of endoscopic image analysis. Such tests should be performed in the same way on a single shared database. The approach followed so far by authors in the field, testing only on their own (often small) data sets, seems to be insufficient.

Tables at a glance: Table 1, Table 2, Table 3, Table 4, Table 5

Figures at a glance: Figure 1, Figure 2, Figure 3, Figure 4

References