| Keywords | 
        
            | scene segmentation, image distance, image classification | 
        
            | INTRODUCTION | 
        
            | Medical imaging techniques are continually improving, both through new video acquisition methods (e.g. Wireless Capsule Endoscopy, WCE [1]) and through ever faster processing of the acquired data. Both
 developments raise issues related to gathering and processing large amounts of video data. For example, a
 single WCE video can be up to 8 h long and take a medical specialist 2-3 h to examine [2].
 | 
        
            | Depending on the particular application, it may be necessary to provide either general conclusions about the whole video or a detailed analysis of each separate frame. The former case is related to diagnostics, where the most important
 information is whether the patient has any internal bleeding or lesions, together with any additional information about them.
 The latter case arises when creating a database for image recognition algorithms or for educational purposes,
 where the final user (here: a person or an algorithm) operates on various pictures, comparing cases of diseases with healthy
 tissue.
 | 
        
            | In real-life videos, especially when the frame rate is sufficiently high, consecutive frames are very similar and differ only slightly from their neighbors. In medical applications this means that the recorded stream can
 be perceived as mostly “continuous” [3], with objects entering and leaving the scene gradually [4].
 | 
        
            | The continuous character of videos makes it possible to divide them into scenes [5]: shorter sequences of similar frames sharing a given label. Methods for finding such divisions have been developed mainly for live-action films [6], [7],
 where scene and shot changes are instantaneous and significantly easier to spot thanks to the rapid change of frame characteristics.
 | 
        
            | In medical imaging, a popular approach is to detect characteristic (specific) frames in order to obtain a so-called video summary [8], [9], [10], [11]. These frames represent their direct neighborhoods, with scene changes implicitly
 assumed between them.
 | 
        
            | The pace of change in the video is measured by metrics that quantify the relative difference between consecutive frames. Such functions and their properties are defined in Section II.
 | 
        
            | In this paper we present the results of evaluating a new scene segmentation algorithm that incorporates three selected metrics with different parameter sets. The evaluation is performed in terms of the accuracy of the final classification of
 all scenes as a whole. The procedure is described in Section III, and its results are given
 in Section IV.
 | 
        
            | PRELIMINARIES | 
        
            | A. Image metrics | 
        
            | To determine how much two frames differ, and thus how much the view in the video changes, a function for comparing frames has to be defined. Such functions are called metrics in the remainder of the paper. Our definition (see footnote 1) of a metric d requires it to
 fulfill the following properties:
 | 
        
            | $d(a; b) \geq 0$ (1) | 
        
            | $d(a; a) = 0$ (2) | 
        
            | $d(a; b) = d(b; a)$ (3) | 
        
            | $d(a; c) \leq d(a; b) + d(b; c)$ (4) | 
        
            | In the considered usage the value of the metric is expected to quantify the visual similarity of images in some way. | 
        
            | The definition above allows any non-negative metric values. It is worth noting, though, that the set of all possible images is finite (see footnote 2), which implies the existence of an upper bound M on the image distance. For easier
 comparison and evaluation, the metric values are linearly normalized to the range [0; 1] (by dividing the value of a
 metric by the upper bound M for the considered image size).
 | 
        
            | For our purposes we have chosen to consider the following metrics: | 
        
            | •  Simple distance (SD) - the L1 distance between the images (vectors of pixel values) | 
        
            | •  Simple distance on processed image (SP) - the Simple distance between two images after blurring and downscaling | 
        
            | •  Histogram distance with k bins (HD, HDk) - the L1 distance between the k-bin HSV color histograms of the images | 
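
            | The three metrics listed above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: the image representation (rows of 8-bit RGB tuples), the box-average used for blurring and downscaling, the layout of the histogram (k bins per HSV channel, concatenated) and all function names are assumptions made for illustration only. Each function divides by its upper bound M, so the result lies in [0; 1]. | 

```python
import colorsys

# Assumed representation (not from the paper): an image is a list of rows,
# each row a list of (r, g, b) tuples with channel values in 0..255.


def simple_distance(a, b):
    """SD: L1 distance between pixel vectors, divided by its upper bound M."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            for ca, cb in zip(pixel_a, pixel_b):
                total += abs(ca - cb)
                count += 1
    return total / (255 * count)  # M = 255 * number of channel values


def downscale(img, factor):
    """Average non-overlapping factor x factor blocks (blur and downscale in one step)."""
    height, width = len(img) // factor, len(img[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [img[y * factor + dy][x * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out


def processed_distance(a, b, factor=2):
    """SP: Simple distance computed on the blurred and downscaled images."""
    return simple_distance(downscale(a, factor), downscale(b, factor))


def hsv_histogram(img, k):
    """Relative k-bin histograms of the H, S and V channels, concatenated."""
    hist = [0] * (3 * k)
    pixels = 0
    for row in img:
        for r, g, b in row:
            hsv = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            for channel, value in enumerate(hsv):
                # Clamp so that a value of exactly 1.0 falls into the last bin.
                hist[channel * k + min(int(value * k), k - 1)] += 1
            pixels += 1
    return [v / pixels for v in hist]


def histogram_distance(a, b, k=8):
    """HDk: L1 distance between HSV histograms, divided by its upper bound M."""
    ha, hb = hsv_histogram(a, k), hsv_histogram(b, k)
    # Each channel histogram sums to 1, so the L1 distance is at most 2 per channel.
    return sum(abs(x - y) for x, y in zip(ha, hb)) / 6
```

            | Under these assumptions, swapping a black and a white pixel within a two-pixel image gives the maximal SD value, while the HD value stays at zero, because the histogram ignores pixel positions. | 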
        
            |  | 
        
            | Footnote 1: Strictly speaking, the presented conditions define a pseudometric on the image space, since it is possible that d(a; b) = 0 for some a ≠ b. The considered functions are proper distance functions only for the processed vectors acquired in a way defined for each metric.
 Footnote 2: The number of all possible pixel values for a given resolution and color depth is finite.
 | 
        
            | B. The algorithm | 
        
            | In this section we propose a simple algorithm for scene segmentation of (mostly) continuous videos. For each scene a specific frame which defines it is chosen. Next, the scene is created in two main steps:
 | 
        
            | Expansion - consecutive frames are assigned to the scene until a frame is reached whose difference from the specific frame exceeds a given threshold. That frame will be the next specific frame.
 | 
        
            | Reduction - frames at the end of the current scene that are more similar to the next specific frame than to the current one are reassigned to the next scene.
 | 
        
            | The pseudocode of the algorithm is given in Algorithm 1, which is exact with respect to the treatment of boundary cases.
 | 
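
            | The expansion and reduction steps can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: the function name, the choice of the first frame as the first specific frame, the strict comparison in the reduction step and the handling of the end of the video are all assumptions. The `distance` argument stands for any normalized metric. | 

```python
def segment_scenes(frames, distance, threshold):
    """Split frames into scenes.

    Returns a list of (start, end, specific) index triples, where the scene
    covers frames[start:end] and `specific` is the index of its specific frame.
    """
    n = len(frames)
    scenes = []
    if n == 0:
        return scenes

    start = 0      # first frame of the current scene
    specific = 0   # assumption: the first frame is the first specific frame
    while True:
        # Expansion: advance until a frame differs too much from the specific frame.
        nxt = specific + 1
        while nxt < n and distance(frames[specific], frames[nxt]) <= threshold:
            nxt += 1
        if nxt == n:
            # No further specific frame: the current scene runs to the end.
            scenes.append((start, n, specific))
            return scenes

        # Reduction: hand trailing frames over to the next scene while they are
        # closer to the next specific frame than to the current one.
        end = nxt
        while end - 1 > specific and (
            distance(frames[end - 1], frames[nxt])
            < distance(frames[end - 1], frames[specific])
        ):
            end -= 1

        scenes.append((start, end, specific))
        start, specific = end, nxt
```

            | Note that after the reduction step the specific frame of a scene is not necessarily its first frame, since frames preceding it may have been reassigned to that scene. | 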
        
            | EXPERIMENTS | 
        
            | The experiments evaluated the algorithm on a set of sample recordings from endoscopic examinations. Six representative films were chosen that fulfill the following criteria:
 | 
        
            | at least 1000 frames long | 
        
            | the recognized property is present in between 20% and 80% of the frames | 
        
            | there are at least 5 changes of the recognition status in the video | 
        
            | These requirements were set to avoid overrating algorithms that merely propagate a single result to all frames, and to avoid evaluating algorithms on rare “chaotic” videos.
 | 
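
            | The three selection criteria can be expressed as a simple check on the per-frame ground-truth labels of a recording. The sketch below is illustrative only: the function name, the parameter names and the representation of labels as a list of booleans are assumptions. | 

```python
def is_suitable(labels, min_frames=1000, min_share=0.2, max_share=0.8, min_changes=5):
    """Check the selection criteria on a list of per-frame boolean labels."""
    n = len(labels)
    if n < min_frames:
        return False
    # Share of frames in which the recognized property is present.
    share = sum(1 for label in labels if label) / n
    # Number of changes of the recognition status between consecutive frames.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return min_share <= share <= max_share and changes >= min_changes
```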
        
            | The scene segmentation algorithm was applied to every film with five different metrics, using the thresholds listed in Table I. The threshold sets differ because the metrics have different characteristics, with SD and SP
 showing larger changes in value.
 | 
        
            | RESULTS | 
        
            | The first observation from the evaluation is the relatively high distance reported by the SD and SP metrics for seemingly similar images. This is because both metrics are sensitive to even minor shifts, so they report a low distance only for
 images that share a very large proportion of static areas.
 | 
        
            | For the various tested threshold values the average scene lengths have been computed, and the results are presented in Table II and Table III. As expected, a clear positive correlation between the threshold value and scene length can be seen for all
 metrics.
 | 
        
            | Figure 1 presents the relation between recognition accuracy and metric thresholds. A negative correlation between the two can be seen. The results for the H8 metric, which performed best among the Hk metrics, show that accepting a change of 25% of the possible distance (or: increasing the amount of accepted differences six times) results in a change
 of less than 10% in accuracy.
 | 
        
            | Since different threshold values have been tested for the two groups of metrics, it is important to note that the values are not comparable between the groups in this graph.
 | 
        
            | The graph in Figure 2 presents the relation between the resulting average scene sizes and the classification accuracy. All of the Hk metrics achieved similar results and outperformed the SD and SP metrics.
 This shows that the scene segmentation algorithm with the Hk metrics produces a better division into scenes and
 a better assignment of their specific frames.
 | 
        
            | High accuracy values of over 95% are preserved for scene sizes of up to six frames. With such results, costly recognition algorithms might be improved to operate on whole scenes, with the scene segmentation algorithm tuned with
 respect to given time limitations, depending on the accepted performance/accuracy tradeoff.
 | 
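
            | The tradeoff between scene size and accuracy can be illustrated by propagating the label of each specific frame to its whole scene and comparing the result with per-frame labels. The sketch below is a simplified stand-in for the evaluation, whose exact protocol is not given here: it assumes that the specific frame itself is classified correctly, and the function name and the (start, end, specific) scene format are assumptions. | 

```python
def evaluate_propagation(scenes, labels):
    """Propagate each specific frame's label to its scene.

    scenes: list of (start, end, specific) triples covering all frames.
    labels: per-frame ground-truth labels.
    Returns (accuracy, mean scene length).
    """
    predicted = [None] * len(labels)
    for start, end, specific in scenes:
        for i in range(start, end):
            # Assumption: the classifier is right on the specific frame.
            predicted[i] = labels[specific]
    correct = sum(1 for p, t in zip(predicted, labels) if p == t)
    accuracy = correct / len(labels)
    mean_scene_length = len(labels) / len(scenes)
    return accuracy, mean_scene_length
```

            | With one classified frame per scene, a mean scene length of L means that only 1/L of the frames are passed to the classifier. | 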
        
            |  | 
        
            | SUMMARY | 
        
            | In this paper a new method for accelerating the classification of frames in video sequences has been presented. While preserving high reliability, the number of frames to be processed by a classifying algorithm can be decreased by over
 80%. Three general types of metrics have been evaluated, with shift-insensitive metrics based on HSV histograms
 outperforming simple image distances.
 | 
        
            | A broad range of possible metric definitions and parametrizations leaves an open space for further experiments in this subject.
 | 
        
            | References | 
        
            | 
                [1] Alexandros Karargyris and Nikolaos Bourbakis. Wireless Capsule Endoscopy and Endoscopic Imaging: A Survey on Various Methodologies Presented. IEEE Engineering in Medicine and Biology Magazine, 29(1):72–83, 2010.
 [2] Michael Liedlgruber and Andreas Uhl. Computer-aided decision support systems for endoscopy in the gastrointestinal tract: a review. IEEE Reviews in Biomedical Engineering, 4:73–88, 2011.
 [3] Zdzislaw Pawlak. On Some Issues Connected With Roughly Continuous Functions. 1995.
 [4] Hai Vu, Tomio Echigo, Ryusuke Sagawa, Keiko Yagi, Masatsugu Shiba, Kazuhide Higuchi, Tetsuo Arakawa, and Yasushi Yagi. Contraction detection in small bowel from an image sequence of wireless capsule endoscopy. Medical Image Computing and Computer-Assisted Intervention, 10(Pt 1):775–783, 2007.
 [5] John M Gauch, Susan Gauch, Sylvain Bouix, and Xiaolan Zhu. Real time video scene detection and classification. Wall Street Journal, 35:381–400, 1999.
 [6] Tong Lin and Hong-jiang Zhang. Automatic video scene extraction by shot grouping. Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, 4:39–42, 2000.
 [7] Ufuk Sakarya and Ziya Telatar. Video scene detection using graph-based representations. Signal Processing: Image Communication, 25(10):774–783, November 2010.
 [8] That Mon Htwe, Chee Khun Poh, Liyuan Li, Jiang Liu, Eng Hui Ong, and Khek Yu Ho. Vision-based techniques for efficient Wireless Capsule Endoscopy examination. In 2011 Defense Science Research Conference and Expo (DSR), pages 1–4. Department of Computer Vision and Image Understanding, Institute for Infocomm Research, Singapore 138632, IEEE, 2011.
 [9] Selen Atasoy, Diana Mateus, Joe Lallemand, Alexander Meining, Guang-Zhong Yang, and Nassir Navab. Endoscopic video manifolds. Medical Image Computing and Computer-Assisted Intervention, 13(Pt 2):437–445, 2010.
 [10] Giovanni Gallo, Eliana Granata, and Alessandro Torrisi. Information Theory Based WCE Video Summarization. Pattern Recognition (ICPR), 2010 20th International Conference on, pages 4198–4201, 2010.
 [11] Giovanni Gallo and Alessandro Torrisi. Boosted Wireless Capsule Endoscopy Frames Classification. In PATTERNS 2011, The Third International Conferences on Pervasive Patterns and Applications, pages 25–30, 2011.
 |