| Keywords | 
        
            | Data Quality, Intrinsic DQ, Representational DQ, Web Portal. | 
        
            | INTRODUCTION | 
        
            | Owing to the advancement and growth of Information and Communications Technology, information about almost every activity, such as news, health, entertainment and education, is available on websites through the internet. The World Wide Web is a repository of varied data, but the quality of the data published on websites is open to question. Data quality is a relatively new research area and represents one of the biggest challenges for data mining. Data quality refers to the accuracy and completeness of the data; it is also measured by structure and consistency, that is, by how the data are represented in the web portal. A web portal, or public portal, is a website that gathers a large amount of information from multiple sources on the web and organizes it in a user-friendly manner. Worldwide, numerous users rely on web portals to obtain information for their work and to support decision making. These users and data consumers need to be sure that the data obtained suit their needs, so the organizations that provide web portals need to offer data that meet user requirements. Data quality is therefore a common interest of data consumers and portal providers, and it plays an important role in the efficiency and effectiveness of organizations and businesses. | 
        
            | CLASSIFICATION OF DATA QUALITY | 
        
            | Data quality is classified into four categories: Intrinsic DQ, Accessibility DQ, Contextual DQ and Representational DQ. Each category has several dimensions, such as accuracy, completeness, consistency and timeliness, drawn from the literature survey [2] and listed in Table 1. Accuracy is the degree to which data correctly reflect the real-world object or event being described; for example, the balance shown in a customer's bank account is accurate if it is the amount the customer is actually entitled to. Completeness is the extent to which the expected attributes of the data are provided; for example, customer data are complete if the addresses, contact details and other information are available for every customer. Consistency means that data across the enterprise are synchronized with each other, i.e., that there are no data conflicts; an example of inconsistency is a credit card that has been cancelled and is inactive while its billing status still shows an amount due. Timeliness is extremely important, and what counts as timely depends on user expectations. The quality of data in web portals can be analyzed using the survey method. The survey here was conducted among web users who regularly read the online “The Hindu” web portal. | 
        
            | The scope of this study is limited to the intrinsic and representational data quality categories of the “Science & Technology” column of “The Hindu” web portal. Table 2 shows the data quality categories, their dimensions and their definitions. | 
        
            | QUALITY ANALYSIS | 
        
            | Data Quality (DQ) is often defined as “fitness for use”, i.e., the ability of a collection of data to meet user       requirements [3, 14]. | 
        
            | This definition, and the current view of assessing DQ, involve understanding DQ from the user's point of view [15]. Newspapers can provide online versions that are not mirror images of their print versions but instead offer something extra, such as interactive features or information that could not fit in the print version [1]. A number of newspapers are available on the internet, some with general information only and some complete with archives. The Hindu is one of the complete newspapers available on the internet, via the web portal http://www.thehindu.com/. The online web portal of this paper consists of many columns, which cover a variety of information every day. The case study in this paper, however, analyzes only the Intrinsic DQ and Representational DQ of the “S&T” (Science & Technology) column. | 
        
            | The “S&T” column of the portal includes several sub-columns such as Agriculture, Energy & Environment, Gadgets, Internet, Science and Technology. The survey was carried out by feedback analysis using a statistical tool. A questionnaire was framed, and feedback was collected from undergraduate and postgraduate students, research scholars, academicians of various disciplines and web users who read this portal on a regular basis. | 
        
            | The questionnaire was framed with 5 to 6 questions for each dimension. Each web user enters a percentage rating in the specified columns. | 
        
            | In this way, more than 80 feedback forms were collected and the average for each dimension was calculated. Table 3 shows part of the attribute questionnaire. | 
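The averaging of feedback forms per dimension can be sketched in Python. This is a minimal illustration, not the statistical tool used in the study: the form layout (a mapping from dimension name to the percentage ratings given for its questions), the function name `average_by_dimension` and the sample ratings are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def average_by_dimension(forms):
    """Return the mean percentage rating per dimension across all forms.

    `forms` is a list of dicts, one per respondent, mapping a dimension
    name to the list of percentage ratings (0-100) given for its questions.
    """
    ratings = defaultdict(list)
    for form in forms:
        for dimension, answers in form.items():
            ratings[dimension].extend(answers)
    return {dimension: mean(values) for dimension, values in ratings.items()}


# Hypothetical feedback forms (NOT the study's data).
sample_forms = [
    {"accuracy": [80, 90], "timeliness": [100, 90]},
    {"accuracy": [70, 80], "timeliness": [80, 90]},
]

averages = average_by_dimension(sample_forms)
print(averages)
```

The sketch pools every answer for a dimension before averaging. This matches averaging per form first only when every form contains the same number of answers for that dimension.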
        
            | INTRINSIC QUALITY | 
        
            | Intrinsic DQ covers the basic qualities of data, such as accuracy and timeliness. Accuracy ensures that data are correct and hold valid values; timeliness refers to whether the information is up to date and the articles are useful to the reader's work or life. Chart 1 represents the Intrinsic DQ, in which accuracy is 80% and timeliness is 90%. On average, the intrinsic quality of the data, that is, accuracy and timeliness together, is measured as 85% from the feedback collected. | 
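The 85% figure is consistent with giving the two intrinsic dimensions equal weight, which the text implies but does not state. A short check under that assumption:

```python
from statistics import mean

# Dimension scores reported in the text for Intrinsic DQ (percent).
intrinsic_scores = {"accuracy": 80, "timeliness": 90}

# Assumption: the category score is the unweighted mean of its dimensions.
intrinsic_dq = mean(list(intrinsic_scores.values()))
print(f"Intrinsic DQ: {intrinsic_dq}%")
```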
        
            | REPRESENTATIONAL DATA QUALITY | 
        
            | The Representational DQ specifies the way in which the data are presented or made available in the web portal. | 
        
            | The representational DQ includes content coverage, writing style, interactivity, layout, multimedia presentation, navigation, organization and archive. These factors help the online web portal present its information in the most effective manner to a wide range of users. Chart 2 represents the Representational DQ, in which the representational quality of the data has been observed through these factors. | 
        
            | From Chart 2 it is observed that navigation rates very high at 86%, and that the layout, organization and archive of the presentation rate high at 85%, 84% and 85% respectively, differing by only 1% among them, according to the feedback collected. | 
        
            | Content coverage and interactivity are found to be 65% and 70% respectively. Multimedia presentation is found to be a medium value of 45%, the lowest among the reported factors. | 
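To show how the reported scores could be used to pick out weaker areas, the following sketch lists the representational dimensions that fall below a cut-off. The function name `weakest_dimensions` and the cut-off of 75% are hypothetical; the paper does not define a threshold.

```python
# Representational DQ scores reported in the text (percent).
# Writing style is listed as a factor, but the text gives no score for it.
representational_scores = {
    "navigation": 86,
    "layout": 85,
    "organization": 84,
    "archive": 85,
    "content coverage": 65,
    "interactivity": 70,
    "multimedia presentation": 45,
}


def weakest_dimensions(scores, threshold):
    """Return (dimension, score) pairs below `threshold`, lowest first."""
    below = [(name, value) for name, value in scores.items() if value < threshold]
    return sorted(below, key=lambda pair: pair[1])


# Hypothetical cut-off; the paper does not define one.
THRESHOLD = 75
for name, value in weakest_dimensions(representational_scores, THRESHOLD):
    print(f"{name}: {value}%")
```

With this assumed cut-off, multimedia presentation, content coverage and interactivity are flagged, in that order.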
        
            | CONCLUSION | 
        
            | The approach taken here to understanding content and consumer preferences is distinctive: rather than asking consumers to describe what kind of news and information they want and how it should be covered, this study measured online newspaper content and measured consumer reaction to it. The study of the “S&T” column of “The Hindu” web portal has shown the extent to which intrinsic and representational data quality are present, quantified through the data quality dimensions described in the data classification section. This paper is a sample study that quantifies data quality through its dimensions, so that attention can be given to the areas in which a poor measure is shown. Future work can extend the study to all columns of the paper, identify where data quality is lacking in the portal, and include suggestions for improving it. | 
        
        
            | References | 
        
            | 
                [1] Chyi, H.I. & Lasorsa, D., Access, Use and Preferences for Online Newspapers. Newspaper Research Journal, 1999, 20(4), 2-13.
                [2] M. Angelica Caro, Coral Calero, Ismael Caballero, Mario Piattini, Data Quality in Web Applications: A State of the Art, IADIS International Conference on WWW/Internet 2005, pp. 364-368.
                [3] C. Cappiello, C. Francalanci, and B. Pernici, Data quality assessment from the user's perspective, in International Workshop on Information Quality in Information Systems (IQIS2004). 2004. Paris, France: ACM. p. 68-73.
                [4] A. Caro, C. Calero, H. Sahraoui, and M. Piattini, A Bayesian Network to Represent a Data Quality Model. International Journal on Information Quality, 2007. Accepted for publication in the inaugural issue 2007.
                [5] InduShobha N. Chengalur-Smith, Donald P. Ballou, Harold L. Pazer, The Impact of Data Quality Information on Decision Making: An Exploratory Analysis. IEEE Transactions on Knowledge and Data Engineering 11(6): 853-864, 1999.
                [6] Monica Bobrowski, Martina Marr, Daniel Yankelevich, A Homogeneous Framework to Measure Data Quality. In MIT Conference on Information Quality (IQ), 115-124, 1999.
                [7] Cappiello, C., et al., 2004. Data quality assessment from the user's perspective. Proc. IQIS2004, pp. 68-73.
                [8] Eppler, M. and Muenzenmayer, P., 2002. Measuring Information Quality in the Web Context: A Survey of State-of-the-Art Instruments and an Application Methodology. Proc. of ICIQ2002, pp. 187-196.
                [9] Pernici, B. and Scannapieco, M., 2002. Data Quality in Web Information Systems. Proceedings of the 21st International Conference on Conceptual Modeling, pp. 397-413.
                [10] Chen, K. & Yen, D.C., 2004, Improving the quality of online presence through interactivity, Information & Management, vol. 42, No. 1, p. 217.
                [11] M. Gertz, T. Ozsu, G. Saake, and K.-U. Sattler, Report on the Dagstuhl Seminar "Data Quality on the Web". SIGMOD Record, 2004. vol. 33, No. 1: p. 127-132.
                [12] P. Katerattanakul and K. Siau, Measuring Information Quality of Web Sites: Development of an Instrument, in Proceedings of the 20th International Conference on Information Systems. 1999. p. 279-285.
                [13] A. Caro, C. Calero, I. Caballero, and M. Piattini, Defining a Data Quality Model for Web Portals, in WISE2006, The 7th International Conference on Web Information Systems Engineering. 2006. Wuhan, China: Springer LNCS 4255. p. 363-374.
                [14] D. Strong, Y. Lee, and R. Wang, Data Quality in Context. Communications of the ACM, 1997. Vol. 40, No. 5: p. 103-110.
                [15] S.A. Knight and J.M. Burn, Developing a Framework for Assessing Information Quality on the World Wide Web. Informing Science Journal, 2005. 8: p. 159-172.
                [16] Mohamed Haneefa K and Shyma Nellikka, Content Analysis of Online English Newspapers in India, DESIDOC Journal of Library & Information Technology, Vol. 30, No. 4, July 2010.
 |