Bezuayehu Gutema Asefa*
Department of Food Science and Nutrition Research, Ethiopian Institute of Agricultural Research, Sebeta, Ethiopia
Received: 26-Jul-2022, Manuscript No. JFPDT-22-70301; Editor assigned: 28-Jul-2022, PreQC No. JFPDT-22-70301 (PQ); Reviewed: 11-Aug-2022, QC No. JFPDT-22-70301; Revised: 05-Oct-2022, Manuscript No. JFPDT-22-70301 (R); Published: 12-Oct-2022, DOI: 10.4172/2321-6204.10.6.001
Visit for more related articles at Research & Reviews: Journal of Food and Dairy Technology
A rapid method based on digital image analysis and machine learning is proposed for detecting milk adulteration with water. Several machine learning algorithms were compared, and the Support Vector Machine (SVM) performed best, with a total accuracy of 89.48% and a precision of 95.10%. Classification performance was higher in the extreme classes (pure milk and milk with 40% added water). Quantitative determination of the added water was best achieved using Support Vector Machine Regression (SVMR), with R2 (CV) and R2 (P) of 0.65 and 0.71, respectively. The proposed technique can be used for the nondestructive determination of milk adulteration with water without the need for any additional reagent.
Milk adulteration; Multivariate classification; Support vector machine; Validation; Digital image analysis
With its high nutritive value, providing macronutrients (proteins, fat and minerals) and micronutrients (vitamins and trace elements), cow milk is a recognized contributor to the balanced diet of many populations. Owing to this nutritional composition, milk consumption is high and demand is increasing worldwide. Despite the role of milk in food and nutrition security, the increase in demand has amplified fraudulent activities, making milk the second most vulnerable product to adulteration.
Milk may be adulterated by dilution with water to increase economic gain, or by the addition of substances (e.g., sucrose, sodium chloride, vegetable oil and surfactants) that improve the physicochemical and visual characteristics of milk. In addition, the use of substances that extend the shelf life of milk, such as formaldehyde, hydrogen peroxide and hypochlorite, is becoming a serious adulteration issue in the dairy industry.
Assessments of the prevalence of milk adulteration in several countries found water to be the most frequently added adulterant. Water is added to increase economic gain by raising the volume of milk through dilution. However, adding water dilutes the constituents of milk and could pose a public health risk of acute malnutrition (stunting, wasting and underweight), which leads to nutrition-related child mortality. According to experts, next to educating farmers about the consequences of milk fraud, improved detection is key to addressing the prevailing risk of fraud in milk. Several studies have shown that the presence of water as an adulterant in milk samples can be determined using different techniques. Newly developed techniques that are robust, green, simple and cost effective are gaining increasing importance in food quality monitoring.
Digital image-based procedures that use the power of machine learning algorithms are increasingly used to assess adulteration in agro-food products, including milk. In recent years, several studies have developed digital image-based techniques for the determination of adulterants in milk. However, these techniques lack representative sampling during imaging of milk samples. For instance, indicator chemicals were used to bring about the desired classification result before the imaging process. This limits the use of those techniques, since their users are required to have technical knowledge of the procedure.
Considering the limitations of the existing methods, this paper proposes a clean method based on digital image processing coupled with a machine learning algorithm to test milk for adulteration with water. The proposed technique is fast and robust, and it requires neither sample preparation nor the use of any chemicals.
Raw milk samples were obtained from two different dairy farms found in Sebeta and Debre Zeit agricultural research centers of the Ethiopian Institute of Agricultural Research (EIAR). Known research dairy farms were selected to ensure the purity of the milk before spiking the adulterant. A batch of milk was used to acquire images of pure milk and modified milk with water as an adulterant in a range from 10 to 40%. Since image acquisition was performed in sampling locations, all milk samples used in the study were neither refrigerated nor subjected to transportation longer than one km. The volume of milk sample for each image acquisition was kept constant at 25 ml, which was quantitatively transferred to a petri dish to acquire images from the top surface. Adulterated milk samples were simulated by spiking water in the whole sample used in one day to avoid differences in image intensities due to spiking individual samples.
A conventional image acquisition chamber with dimensions (L x W x H) of 40 x 40 x 60, made from aluminum sheet, was used. Uniform lighting was maintained using twelve fluorescent lamps mounted on the four sides of the imaging chamber at a height of 40 cm above the bottom surface. A digital camera (EOS 6D Mark II, Canon, Japan) fitted with a 24-105 mm image-stabilized lens was set at the top of the image acquisition chamber, pointing down at the petri dish containing the milk sample from a height of around 55 cm. The process of image acquisition was fully monitored using EOS Utility software. Fifty samples were prepared for each sample group from the two sampling sites. Each sample was imaged in duplicate, making a total of two hundred images for each group of samples.
Images acquired from all samples were processed using a batch processor in Figure 1. All the captured images were treated with a global processing stage that extracts the region of interest from the bulk image. The central area of each image was cropped to 250 x 250 pixels. Further processing, such as conversion to different color spaces (L*a*b* and HSI) and filtering, was performed as summarized in Table 1. The mean and modal grey values, minimum and maximum grey values, standard deviation, median and center of mass were calculated for each processed image. After the processed image parameters were calculated, the values marked with the ‘-’ sign in Table 1 were found to be irrelevant and were not included as predictor variables, because similar output values were obtained for all sample groups. In total, 125 variables were included as predictors in the development of the multivariate models.
|Image process description||Mean grey value||Standard deviation||Modal grey value||Median grey value||Minimum grey value||Maximum grey value||Center of mass (X maximum)||Center of mass (Y maximum)||Skewness||Kurtosis|
|Resizing (250 x 250 pixels)||+||+||+||+||+||+||+||+||+||+|
|Splitting RGB (R)||+||+||+||+||+||+||+||+||+||+|
|Splitting RGB (G)||+||+||+||+||+||+||+||+||+||+|
|Splitting RGB (B)||+||+||+||+||+||+||+||+||+||+|
|Convert to HSI (H)||-||-||-||-||-||-||-||-||-||-|
|Convert to HSI (S)||-||-||-||-||-||-||-||-||-||-|
|Convert to HSI (I)||+||+||+||+||+||+||+||+||+||+|
|Convert to L*a*b* (L*)||+||+||+||+||+||+||+||+||+||+|
|Convert to L*a*b* (a*)||+||+||+||+||+||+||+||+||+||+|
|Convert to L*a*b* (b*)||+||+||+||+||+||+||+||+||+||+|
Image processing description: ‘+’ signs indicate parameters used as a variable, whereas ‘-’ signs refer to parameters excluded from the variable list.
Table 1. Description of the image processing steps and the parameters included as variables for the development of the classification models.
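The cropping and measurement steps summarized in Table 1 can be sketched in Python with NumPy as follows. This is a minimal illustration, not the batch processor used in the study: the function names are hypothetical, only the RGB channels are covered, the center of mass is assumed to be the intensity-weighted mean pixel coordinate, and kurtosis is assumed to be reported as excess kurtosis.

```python
import numpy as np

def central_crop(image, size=250):
    """Return the central size x size window of an (H, W, ...) array."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

def grey_statistics(channel):
    """First-order statistics of one 2-D channel holding 0-255 integer values."""
    values = channel.astype(np.float64)
    mean, std = values.mean(), values.std()
    counts = np.bincount(channel.ravel().astype(np.int64), minlength=256)
    rows, cols = np.indices(channel.shape)
    total = values.sum()
    # Assumption: centre of mass = intensity-weighted mean pixel coordinate.
    com_x = (cols * values).sum() / total if total > 0 else np.nan
    com_y = (rows * values).sum() / total if total > 0 else np.nan
    # Standardised values; a flat channel has no defined skewness or kurtosis.
    z = (values - mean) / std if std > 0 else np.full(values.shape, np.nan)
    return {
        "mean": mean, "std": std, "mode": int(counts.argmax()),
        "median": float(np.median(values)),
        "min": float(values.min()), "max": float(values.max()),
        "com_x": com_x, "com_y": com_y,
        "skewness": float((z ** 3).mean()),
        "kurtosis": float((z ** 4).mean() - 3.0),  # excess kurtosis (assumed)
    }

def rgb_features(image):
    """Crop the region of interest and collect statistics for R, G and B."""
    roi = central_crop(image)
    features = {}
    for index, name in enumerate("RGB"):
        for key, value in grey_statistics(roi[..., index]).items():
            features[f"{name}_{key}"] = value
    return features
```

Calling `rgb_features` on an 8-bit RGB array yields ten values per channel, one per measurement column of Table 1. The HSI and L*a*b* rows would need an additional color-space conversion that is not shown here.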
Numerical values generated from the processed images were used to develop classification and regression models based on the level of water added to the pure milk. Multivariate procedures were performed using MATLAB software (R2020b) with the PLS Toolbox (Eigenvector). Characteristics of the different multivariate procedures used in the current study are briefly described in Table 2.
|Algorithm||Description|
|K-nearest neighbor (K-NN)||K-NN based classification works by computing the distances, mostly Euclidean, between an unknown object and each of the objects of the training set. After the k nearest objects to the unknown sample are selected, a decision is made by majority rule.|
|Soft Independent Modeling of Class Analogy (SIMCA)||SIMCA calculates the geometric distance from the principal component model and determines the class distance. In addition, the modeling and discriminatory powers are determined.|
|Support Vector Machine (SVM)||SVM based classification works by obtaining the ‘optimal’ boundary between two classes in a vector space, independently of the probabilistic distributions of the training vectors in the data set.|
|Partial Least Squares Discriminant Analysis (PLS-DA)||PLS based classification works by finding the components in the input matrix (X) that best describe the relevant variation in the input variables and have maximal correlation with the target value in Y.|
Table 2. Summary of machine learning algorithms used for the classification task.
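As an illustration of the K-NN rule described in Table 2, the following Python sketch classifies one unknown sample by Euclidean distance and majority vote. The function name, the toy feature values and the class labels are hypothetical and are not taken from the study.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training sample
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups of two-feature samples (hypothetical)
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array(["pure", "pure", "pure", "diluted", "diluted", "diluted"])
label = knn_predict(train_x, train_y, np.array([0.2, 0.2]), k=3)
```

In practice the predictors would be autoscaled first, since a distance-based rule is otherwise dominated by the variables with the largest numerical range.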
Model performance evaluation
The performance of each model was assessed using total accuracy, which was computed from the True Positive (TP) and True Negative (TN) values obtained from the confusion matrix (Equation 1). In addition, precision (Equation 2) and recall (Equation 3) were calculated from the False Positive (FP) and False Negative (FN) values to support the assessment of classification model effectiveness.
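The three measures can be computed from a confusion matrix as in the Python sketch below. It assumes the standard definitions, accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP) and recall = TP / (TP + FN), evaluated one class at a time. The function name and the example matrix are hypothetical.

```python
import numpy as np

def per_class_metrics(confusion, cls):
    """Accuracy, precision and recall for one class.

    confusion[i, j] is the number of samples of true class i
    that were predicted as class j.
    """
    tp = confusion[cls, cls]
    fn = confusion[cls].sum() - tp        # true class cls, predicted otherwise
    fp = confusion[:, cls].sum() - tp     # other classes predicted as cls
    tn = confusion.sum() - tp - fn - fp
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical two-class confusion matrix (rows: true, columns: predicted)
example = np.array([[8, 2],
                    [1, 9]])
accuracy, precision, recall = per_class_metrics(example, cls=0)
```

For a multi-class problem such as the five water levels considered here, the same function can be called once per class, treating that class as positive and all others as negative.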
A total of 25 predictor variables from 900 image data (i.e., 180 x 5 groups) were inspected visually in the Excel file to identify potential outliers. Based on this inspection, 29 image data were removed and the remaining 871 image data were used to develop the classification models. Before the analysis, the Kennard-Stone technique was employed to assign 80% of the data to the training set and the remaining 20% to the test set. The effect of differences in the scale of the features was corrected by autoscaling the predictor variables [7-10].
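A minimal Python sketch of these two preprocessing steps is given below. It assumes the usual form of the Kennard-Stone algorithm (start from the two most distant samples, then repeatedly add the sample farthest from those already selected) and defines autoscaling as mean-centering followed by division by the training-set standard deviation. The function names are hypothetical.

```python
import numpy as np

def kennard_stone(x, n_select):
    """Return the indices of n_select rows chosen by Kennard-Stone."""
    sq = (x ** 2).sum(axis=1)
    # Pairwise Euclidean distances, clipped at zero against rounding error
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0))
    first, second = np.unravel_index(dist.argmax(), dist.shape)
    selected = [int(first), int(second)]
    remaining = [i for i in range(len(x)) if i not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its closest selected sample
        min_dist = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(min_dist.argmax())]
        selected.append(pick)
        remaining.remove(pick)
    return selected

def autoscale(train, test):
    """Scale both sets with the training mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (train - mean) / std, (test - mean) / std
```

For an 80/20 split, `kennard_stone(x, int(0.8 * len(x)))` gives the training indices and the remaining rows form the test set. Because the selection is driven by distances, it spreads the training samples over the feature space instead of drawing them at random.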
Principal Component Analysis (PCA) was applied to reduce data dimensionality by generating new variables that are linear combinations of the original image feature values. The optimal number of Principal Components (PCs) was selected based on the lowest prediction error in cross-validation (Venetian blinds). The first three PCs explained more than 75% of the data variance, as shown in the 3-dimensional PCA score plot obtained from these PCs (Figure 1). The change in color intensity can be observed from the score plot. Increasing the amount of added water could be related to the diminishing color density of the images, which is reflected in lower scores on PC 1. Since milk color is influenced by composition, the addition of water to pure milk can affect its intensity. Detecting such minor differences in the intensity of milk color with the human eye could be difficult unless digital technologies are used with the support of numerical software.
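The decomposition and the cross-validation scheme can be sketched in Python as follows. This is an illustration only: PCA is computed through a singular value decomposition of the mean-centered data, and the Venetian blinds split is assumed to place every s-th sample in the same fold. The function names are hypothetical and the sketch does not reproduce the toolbox routines used in the study.

```python
import numpy as np

def pca(x, n_components):
    """Return scores, loadings and explained-variance ratios."""
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, vt[:n_components], explained[:n_components]

def venetian_blinds(n_samples, n_splits):
    """Fold i holds samples i, i + n_splits, i + 2 * n_splits, ..."""
    return [list(range(start, n_samples, n_splits))
            for start in range(n_splits)]

# Hypothetical data lying on a straight line: one PC carries all the variance
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
scores, loadings, explained = pca(x, n_components=1)
folds = venetian_blinds(7, 3)
```

Summing the first three entries of the explained-variance ratios gives the kind of cumulative figure quoted above for the first three PCs.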
The performance of each classification algorithm is given in Table 3. Of the four classification algorithms, SIMCA performed worst, with less than 60% total accuracy on the training dataset. After SIMCA, the next poorest classification performance was obtained with the PLS-DA algorithm. In contrast to these two classifiers, KNN and SVM achieved fair classification, with total accuracies of 79.45% and 89.48%, respectively. SVM achieved the best results of all the classifiers, with 89.48% accuracy, 95.10% precision and 83.24% recall.
|Algorithm||Performance measures||Training set||Cross validation set||Testing set|
Table 3. Performance measures of the different classification algorithms over the training, cross-validation and testing datasets.
Further analysis of the models’ prediction performance for each class of samples showed that the SVM algorithm classified the extreme classes most effectively (Table 4). That is, milk samples with no adulteration and milk samples with 40% added water were identified with better classification performance than the other samples. Pure milk samples were correctly identified by the same algorithm with an accuracy of 91.95% in the prediction set. SVM also achieved the highest classification accuracy (92.04%) for milk samples adulterated with 40% water.
|Algorithm||M: W||Training||Cross validation||Testing|
Table 4. Performance measures for class prediction of KNN, PLS-DA and SVM algorithms.
This result outperformed the procedure previously developed by Kobek, who found a total classification accuracy of 81.66% using an Artificial Neural Network (ANN) based classification model. In another study, by Poliana, et al., SIMCA and KNN classification algorithms were applied to distinguish milk adulterated with water from pure milk, and total accuracies of 82% and 92% were found for SIMCA and KNN, respectively. However, indicator chemicals were used to bring about the desired color change in both studies. Given these facts, our finding verifies the possibility of using digital images to determine milk adulteration with water without the need to add indicator chemicals.
Estimation of adulteration level
The dataset was also used to develop a prediction model for the level of adulteration [12-14]. The prediction performance of Partial Least Squares Regression (PLSR), Principal Component Regression (PCR), and SVMR algorithms was evaluated. The summary of quantitative prediction performance measures is presented in Table 5.
|No.||Method||Preprocess||LV/PC||RMSEC||RMSECV||R2 (Cal)||R2 (CV)||R2 (P)|
Table 5. Performance measures of regression models developed for quantitative adulterant prediction.
Except for the SVMR algorithm, prediction of the water adulteration level was inadequate, with prediction R2 values of 0.16, 0.44 and 0.52 for PCR, PLSR and MLR, respectively. SVMR achieved better performance in predicting the amount of water added to the milk samples, with R2 (CV) and R2 (P) of 0.65 and 0.71, respectively.
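The error measures reported for the regression models can be computed as in the Python sketch below. It assumes that the root mean square error is the square root of the mean squared residual and that R2 is the coefficient of determination, 1 - SS_res / SS_tot. Some chemometrics software reports the squared correlation coefficient instead, so the definition used in the study may differ. The function names and the example values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt((residuals ** 2).mean()))

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

# Hypothetical added-water levels (%) and model predictions
reference = [0, 10, 20, 30, 40]
predicted = [2, 8, 22, 28, 40]
error = rmse(reference, predicted)
fit = r_squared(reference, predicted)
```

The same functions apply to the calibration, cross-validation and prediction sets. Only the source of the predicted values changes.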
The change in the color of milk due to dilution with water proved useful for detecting adulteration through the use of processed images coupled with machine learning algorithms. The SVM classification model discriminated milk samples based on the level of added water with an accuracy and precision of 89.48% and 95.10%, respectively. The proposed technique can be used for the nondestructive determination of milk adulteration with water without the need for any additional reagent.
The authors wish to thank the Ethiopian Institute of Agricultural Research for providing the necessary facilities for carrying out the study.