Distributed Testing Environment with
Similarity Based Clustering By Using Pds

R.Kanimozhi; Prof.J.RBalakrishnan

Distributed Testing Environment with Similarity Based Clustering By Using Pds

R.Kanimozhi, Prof.J.RBalakrishnan

PG student, Dept of computer science & engineering, Anand Institute of Higher Technology, Kazhipattur, chennai, India
Director & professor, Dept of computer science & engineering Anand Institute of Higher Technology, Kazhipattur, chennai, India

Corresponding Author: SHARMA VIVEK, E-mail: vivek03sharma@rediffmail.com

Related article at Pubmed, Scholar Google

Visit for more related articles at International Journal of Innovative Research in Science, Engineering and Technology

Abstract

prioritizing the test cases helps to increase the rate of fault detection. The difficulty of ensuring the dependability in the growth of the scale of software and software testing in distributed environment a sufficient software testing technique is not possible. It is often difficult to test a parallel and distributed system in the real world deployment. Hence in this paper, after generating test cases using functional requirements, dependency structure prioritization technique is used to prioritize the test cases based on the functional structure of dependency. A new technique namely cosine similarity based clustering approach is used to group the test cases based on the similarity values to form clusters. Each cluster is distributed in the distributed environment for parallel execution in order to reduce the computation time and to improve the rate of fault detection.

Index Terms

Cosine similarity approach, dependency structure prioritization, software testing.

INTRODUCTION

Software engineering is the systematic design and development of software products and maintenance of software. Software testing plays a major role in software engineering. Software testing is the process of validating and verifying the product to meet the requirements stated during the design and development. Software bugs will almost always exist in any software module with moderate size: not because programmers are careless or irresponsible, because the complexity of software is generally intractable and humans have only limited ability to manage complexity. It is also true that for any complex systems, design defects can never be completely ruled out.

Regression testing is process of retesting the modified software and ensures that new error does not introduce into the previously tested source code due to these modifications. regression testing is very expensive testing process in order to decrease the cost of regression testing the software tester may prioritize the test case so that the test case which are more important are run earlier during regression testing process. In this context, prioritization techniques can take advantage of information collected about the previous execution of test cases to obtain test case orderings. For regression testing the clustering technique is used for test case prioritization. in this the test case having common properties and similar fault detection ability are group together within same group. Test case prioritization improves the cost effectiveness of regression testing. The technique is developed in order to run test cases of higher priority in order to minimize time, cost and effort during software testing process.

Software testing involves test suites and test cases for finding fault where test suite is a collection of test cases that are grouped for test execution purposes. In order to generate test cases, find the sequence of interaction between the software testing scenarios for the software modules. Test case prioritization technique involves scheduling the test cases in an order that improves the performance so that the test cases with higher priority executed first which allow increasing the rate of fault detection.

Dependency structure prioritization technique is based on the functional dependencies between the test cases. Functional dependencies are the interactions and relationships between the system functionality which determines the run sequence. A Dependency Structure prioritization technique reduces the cost of testing and fault detection capabilities.

This prioritization technique classified into two types namely open and closed dependency structure. This technique is based on four algorithms namely DSP_volume, DSP_height, Weighted_DFS and Weighted_DFS_Visit.

Clustering is the method of analyzing and organizing data with similar characteristics are grouped together. Clustering is used in various fields including Data Mining, software engineering and Machine Learning. Much research has been done in the past to efficiently cluster data. Various algorithms and methods have been proposed for clustering. Hierarchical clustering, partitional clustering, nearest neighbor clustering, fuzzy clustering and defect clustering are some popular techniques used for clustering.

Defect Clustering in Software testing means that the majority of the defects are caused by a small number of modules, i.e. the distribution of defects are not across the application but rather centralized in limited sections of the application. A new technique namely cosine similarity based clustering is applied to cluster the test cases that are prioritized by using dependency structure. Cosine similarity is a vector based similarity measure between two vectors derived by using the Euclidean dot product formula.

Each clusters executed in distributed environment for parallel execution of test case clusters to measure the average fault rate. The results in the distributed environment are integrated to produce a single detection rate.

LITERATURE SURVEY

1. A clustering approach to improving test case prioritization: an industrial case study – Ryan Carlson (2011) says that they have limited their attention to a financial subsystem of Dynamics Ax product since it is a case study. It affects the choice of the number of clusters and order of clusters because they are based on the number of feature areas in Dynamics; different cluster order will affect the results. The choice of use of lines of code at the class level was based on the availability of information extracted from project repository this will affect their results.

2. Clustering based on cosine similarity measure – satyasree (2012) in this paper, cosine similarity is applied in data mining concept. Selecting different dimensional space and frequency levels leads to different accuracy rate in the clustering results although it is proved more accurate than traditional methods but still accuracy must be measured.

3. Using dependency structure for prioritization of functional test suites- Shifa-e-Zehra Haidry and Tim Miller (2013) in this paper, they proposed a new test case prioritization technique that uses the dependency information from the test suites to prioritize. Dependency structure prioritization technique includes four algorithms for prioritizing. The open dependency proves to have lower execution cost and closed dependency achieved better fault rate detection than the traditional methods. Average rate of fault detection is used to calculate the percentage of fault rate but clustering approach is not considered to improve the fault rate further.

4. Enhanced distributed document clustering algorithm using different similarity measures - Neethi Narayanan, J.E.Judith, Dr.J.Jayakumari (2013) in this paper, a distributed environment is considered in which all peers form a ring structure and the information are stored in DHT. A local model is formed using EDK-means using similarity algorithm. All local models aggregated to form a global model using EPCP2P this improves clustering quality and accuracy. Even though Jaccard and Pearson coefficients show better results than cosine similarity in data mining but this is not possible in case of software testing.

5. Comparison of Jaccard, Dice, Cosine similarity coefficient to find best fitness value for web retrieved document using genetic algorithm - Vikas Thada, Dr Vivek Jaglan (2013) in this paper, they made a comparative analysis for finding out the most relevant document for the given set of keyword by using three similarity coefficients namely cosine, dice and jaccard this was performed by using genetic algorithm approach. They selected first 10 paged out of the google search result for their experiment based on that the best fitness value were obtained using the cosine similarity coefficients than the dice and jaccard.

6. Prioritize code for testing to improve code coverage of complex software - J.Jenny Li (2005) Code prioritization for testing promises to achieve the maximum testing coverage with the least cost. This is an innovative method to provide hint on which part of code should be tested first. It helps to provide better code average. It includes two parts: extending the traditional dominator analysis method to include global impact of function/method calls, and relaxation of the ÃÂ¢Ãâ¬Ãâ¢guaranteedÃÂ¢Ãâ¬Ãâ rule to ÃÂ¢Ãâ¬Ãâ¢at leastÃÂ¢Ãâ¬Ãâ to make the dominator analysis simpler. They had implemented both the original dominator analysis method and this relaxed method with global view, for calculating code priority for testing. The two calculations had been applied to 4 actual industrial products. The experimental results show that the new calculation is consistently better than the original ones in identification of code to improve code coverage.

7. Combinatorial Interaction Regression Testing: A Study of Test Case Generation and Prioritization. - Xiao Qu, Myra B. Cohen, Katherine M. Woolf (2007) They have conducted an empirical study on two software subjects, each with multiple successive versions. The first method uses branch coverage from the prior version. The second method is to use the interaction weighting method, but rather than regenerate they simply use it to order the given tests based on their weights. They first examined the effectiveness of CIT test suites compared with an exhaustive strategy. They applied prioritization and regen/ prio and compared their effectiveness on CIT test suites. They examined several different ways to control the prioritization. They used methods that leverage code coverage from prior releases, as well as one that is specification based. There results shows that the CIT test suites may be an effective way to reduce the test space and that prioritization improves the ability to detect faults early in certain subjects.

8. Search Algorithms for Regression Test Case Prioritization.- Zheng Li, Mark Harman, and Robert M. Hierons(2007) This paper focuses on test case prioritization techniques for code coverage, including block coverage, decision (branch) coverage, and statement coverage. This paper presents results from an empirical study that compared the performance of the five search algorithms applied to six programs, ranging from 374 to 11,148 lines of code. In this paper addresses the problems of choice of fitness metric, characterization of landscape modality, and determination of the most suitable search technique to apply. five search techniques are studied: two Meta heuristic search techniques (Hill Climbing and Genetic Algorithms), together with three greedy algorithms (Greedy, Additional Greedy, and 2- Optimal Greedy). The results of the empirical study show that the Additional Greedy, 2-Optimal, and Genetic Algorithms always outperform the Greedy Algorithm.

9. Test Case Prioritization: A Family of Empirical Studies. - Sebastian Elbaum, Alexey G. Malishevsky, Gregg Rothermel(2002) General prioritization that tries to pick out a action at law order that may be effective on the average over a succession of sequent versions of the computer code. In regression testing, they have a tendency to square measure involved with a specific version of the computer code and that they might need to place take a look at cases in an exceedingly manner that may be handiest for that version. A coarser granularity—for example, at the operate level, wherever instrumentation and analysis square measure a lot of economical. They expect, however, that coarse roughness techniques are less effective than fine roughness techniques and loss of effectiveness might offset potency gains. Revealed a large performance gap between the results achieved by the prioritization techniques but they have a tendency to examine and also the best results accomplishable.

10. Adaptive Random Test Case Prioritization.- Bo Jiang, Zhenyu Zhang, W. K. Chan, T. H. Tse(2009) They proposed a set of ART prioritization techniques guided by white-box coverage information. They also conducted an empirical study to evaluate their effectiveness. Rather than integrating with techniques with the class of greedy algorithms, they choose to study them in a standalone fashion so the observations drawn from the study will be independent of the latter techniques. The main contribution of this paper is twofold: (i) it proposes the first set of coverage-based ART techniques for test case prioritization. (ii) It reports the first empirical study on ART-based prioritization techniques. The generate procedure constructs a set of not-yet-selected test cases iteratively, by randomly adding remaining test cases into the candidate set as long as they can increase program coverage and the candidate set is not yet full. The empirical results show that their techniques are significantly more effective than random ordering. Moreover, the ARTbr- max min prioritization technique is a good candidate for practical use because it can be as efficient and statistically as effective as traditional coverage-based prioritization techniques in revealing failures. Both genetic algorithms and their ART techniques are effective in avoiding local maximums commonly faced by greedy algorithms.

CONCLUSION OF LITERATURE SURVEY

The above studies include various test case prioritization techniques which prioritize the test cases but they are not implemented in distributed environment. The ordering of test cases in the existing system vary in many ways but perfect ordering is not available. The rate of fault detection is less improved in the previous techniques but still higher rate of fault detection is required.

MOTIVATION OF RESEARCH

The motivation of our research is to provide an efficient test case prioritization technique in distributed environment.

ÃÂ¯ÃâÃÂ· How to increase the rate of fault detection?

ÃÂ¯ÃâÃÂ· How to implement test case prioritization technique in distributed environment?

ÃÂ¯ÃâÃÂ· How to cluster the test cases based on the prioritization?

ÃÂ¯ÃâÃÂ· How to avoid executing redundant test cases?

ÃÂ¯ÃâÃÂ· How to reduce the rate of execution time?

PROPOSED WORK

A. Overall System Design

In our proposed system a simple login application is taken for testing. Based on the content in the application we need to write test cases. Test cases are written based on the sequences of interactions between the various fields involved in the application. Each test cases generated will have a unique ID. In order to predict the dependencies between the test cases we use an algorithm namely Dependency Structure Prioritization which consist of four categories namely volume, height, weight and the visit.

Based on this four dependency calculation, the prioritization of test cases we need to calculate by the DSP. By using the prioritized test cases we need to calculate the cosine similarity i.e. the similarity between the prioritized test cases. It is the vector array matrix value of 0 & 1. Now the prioritization is based on the two concepts of DSP and cosine similarity hence the prioritization of test cases will be higher and effective. Cluster the test cases based on the prioritized values, cluster formation is used for grouping the similar test cases with high priority each clusters are executed independently on the distributed environment. Execution in distributed environment will be parallel in order to reduce the execution time. Execute and evaluate the test cases to provide the average rate of fault detection i.e. the fault rate.

COSINE SIMILARITY ALGORITHM

Cosine similarity is similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. The cosine value is always between –1 and 1: the cosine of a small angle is near 1, and the cosine of a large angle near 180 degrees is close to –1. The cosine similarity between two vectors (or two documents on the Vector Space) is a measure that calculates the cosine of the angle between them. This metric is a measurement of orientation and not magnitude. Two vectors with the same orientation have a Cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude. Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1].

INPUT: an n*n Boolean adjacency matrix representing test case list.

OUPTUT: an n*n Boolean adjacency matrix representing the similarity matrix.

Begin.

Step 1: collect the unique test case identity.

Step 2: collect the properties of the test cases.

Step 3: generate vector array.

Step 4: compute similarity matrix.

End.

1) Pseudocode for cosine similarity algorithm:

Input: IP_List, PropertiesList

Output : Similarity matrix

N= IP_List.size();

For I = 1 ; I = 1,2,3…N do

Ip1 = IP_List.get(I);

Properties pros = Ip1.getProperties();

Vector V1 ;

For J = 1 ; j < PropertiesList.size(); J++ do

If(Properties.contains(PropertiesList.get(J)) then

V1 [i][j] = 1

Else

V1[i][j] = 0 ;

End If

End For

For I = 1; I = 1, 2, 3…N-1 do

For J= 1; J = 1, 2, 3…N-1 do

Similarity[I][J]=(v1[I]*v1[J])/((mod(V1[I]))*

(mod(V2[J])))

End For

Return Similarity;

IMPLEMENTATION

This proposed system consists of five modules they are listed:

ÃÂ¯ÃâÃÂ· Test case generation.

ÃÂ¯ÃâÃÂ· Dependency structure prediction.

ÃÂ¯ÃâÃÂ· Prioritization of test cases.

ÃÂ¯ÃâÃÂ· Cluster formation.

ÃÂ¯ÃâÃÂ· Test case execution and evaluation.

A. Test Case Generation

In this module, first generate the test case for the module testing of an application. This module is used to find the sequence of interaction between the software testing scenarios. The sequence of interaction is used to predict the test case levels based on the various fields provided in the application. This helps to find the order of test cases. Generate the test cases for the each software modules. Add the number of test cases being generated to the test case list for further testing process. Some testers may add the new test cases after it has already been generated. Those test cases will be listed below in order to modify the existing test case list; any number of test cases can be included. Add the number of test cases to the test case list and assign ID for each test case. Additional test cases are added using the remote server to generate distinct ID.

B. Dependency structure prediction

Predict the Open and closed dependent structure of the test cases independently. The sequencing in scenarios represents the order in which the interactions take place, enforcing dependencies between interactions. Some interactions cannot occur until and unless some other interactions occur first. Based on the prediction, find the relationship between the test cases in order to prioritize the test cases later. The relationship predictions are based on the fields involved in the application. This relationship must be related to the previous ordering of test cases. Compute the weight and height of the test case structure for prediction of test cases. Predict the functional dependency between the test cases to form a dependency prediction matrix. It is an n*n matrix which contains n number of test cases, the matrix value will be the cross number of dependencies between those two test cases. Compute the height and weight of the test cases which is used to form a DSP volume and DSP height matrix.

C. Prioritization of Test Cases

Dependency structure prioritization technique is used to prioritize the test case by using four algorithm namely 1) DSP volume measures gives higher weight to test cases that have more dependents. 2) DSP height measures gives higher weight to those test cases that have deepest dependents.3) weighted_DFS is used for test case prioritization based on dependency strucutres. 4) weighted_DFS_visit is used for finding the visiting vertices in the dependency structure. Compute the similarity between the test cases by using the cosine similarity measurement. The prioritization is set based on the prediction done earlier; higher priority is set to those which have low height and high weight value. Vector matrix is formed based on the priority values to find similarity between the test cases. Cosine similarity matrix includes the vector matrix values to form the clusters based on similarity.

D. Cluster Formation

The similarity based clustering is used for grouping the test cases in order to make testing effective. A distributed environment is generated to test the test cases. Distribute the test case groups in the distributed environment. Perform the testing process and generate the test results. After collecting the test results apply the integration method to reduce the results from distribute environment. This proposed technique helps to achieve less computing time for testing and also reduce the failure ratio for testing software.

Cosine similarity approach is used to form the clusters. The clusters are used to implement in the distributed system for providing better performance in fault detection. The number of clusters must be less than the number of distributed environment in order to form the group of the clusters. This implementation reduced the execution time.

E. Test case Execution and Evaluation

Send the clustered test cases to the distributed environment for execution. Execute the test cases and evaluate the results to find the percentage of fault detection by average rate of fault detection. Compare the values with the existing system to prove that the proposed system has higher accuracy in fault detection.

EXPERIMENTAL RESULTS

The graph shown below is used to provide the expected experimental results of the fault rate.

It is used to provide the comparison between the existing systems with the proposed system. The expected accuracy of the system will be around 85- 90%.

CONCLUSION AND FUTURE WORK

Thus the proposed technique for prioritization of test suites that contain dependencies between test cases has been defined. Dependency structure prediction and prioritization provides effective prioritization of test cases. This cosine similarity based clustering approach helps to increase the rate of fault detection in distributed environment. Through this we will achieve the Optimized time and less error rate for the test case execution in the distributed environment.

In future, this cosine similarity approach can be implemented in the cloud environment like hadoop to optimize the execution time for software testing in the distributed environment.

References

Ryan Carlson, Hyunsook Do, Anne Denton (2011) =aClustering Approach to Improving Test Case prioritization: AnIndustrial Case Study,‘ 27th IEEE International Conference onSoftware Maintenance.

K.P.N.V.Satyasree ,Dr.J V R Murthy(2012) = clustering basedon cosine similarity measure,‘ international journal of engineeringscience and advanced technology.

Shifa-e-ZehraHaidry and Tim Miller (2013) ‘UsingDependency Structures for Prioritization of Functional Test suites,‘IEEE transaction on software engineering.

Neethi Narayanan, J.E.Judith, Dr.J.Jayakumari (2013)=Enhanced Distributed Document Clustering Algorithm UsingDifferent Similarity Measures,‘ IEEE Conference on Informationand Communication Technologies.

VikasThada, Dr VivekJaglan (2013)=Comparison of Jaccard,Dice, Cosine Similarity Coefficient To Find Best Fitness Value forWeb Retrieved Documents Using Genetic Algorithm,‘International Journal of Innovations in Engineering andTechnology.

J.Jenny Li (2005) ‘Prioritize Code for Testing to Improve CodeCoverage of Complex Software,‘ IEEE transaction on softwarereliability engineering.

Xiao Qu, Myra B. Cohen, Katherine M. Woolf (2007)=Combinatorial Interaction Regression Testing: A Study of TestCase Generation and Prioritization,‘ IEEE international conferenceon software maintenance.

Zheng Li, Mark Harman, and Robert M. Hierons(2007) =SearchAlgorithms for Regression Test Case prioritization,‘ IEEEtransactions on software engineering.

Sebastian Elbaum, Alexey G. Malishevsky, GreggRothermel(2002) =Test Case Prioritization:A Family of EmpiricalStudies,‘ IEEE transactions on software engineering.

Bo Jiang, Zhenyu Zhang, W. K. Chan, T. H. Tse(2009)=Adaptive Random Test Case Prioritization,‘ IEEE internationalconference on automated software engineering.

X. Qu, M.B. Cohen, and K.M. Woolf (2007), ‘CombinatorialInteraction Regression Testing: A Study of Test Case Generationand Prioritization‘, Proceeding. IEEE Int‘l Conf. SoftwareMaintenance, pp. 255-264.

Z. Ma and J. Zhao (2008), =Test Case Prioritization Based onAnalysis of Program Structure‘, Proceeding 15th Asia-PacificSoftware Eng. Conf, pp. 471-478.