TEXT CONTENT BASED WEB PAGE REFRESH POLICY
The Internet is actively used for the exchange of information. Interaction with the WWW starts when a user enters a URL in the address bar or clicks a link, which navigates to the given location. The data is stored on web servers, which serve as the backbone of the World Wide Web. People upload new web pages and update existing ones very frequently, so the content of web pages changes often. A crawler recursively keeps adding new URLs to the database repository of a search engine. It is possible that, after a particular web page has been downloaded, the local copy residing in the web repository becomes obsolete compared to the copy available on the web. Users are often interested not only in the current contents of a web page but also in the changes to it. Hence it becomes necessary to develop a system that can detect these changes efficiently and with minimum browsing time. There should therefore be some provision to update the database at regular intervals, and once we decide to update a page, it should be ensured that minimum resources are used. Various tools and services are also available that can be used to detect these changes. In this paper, building on previous work, a new approach is discussed that derives certain parameters which help determine whether the data has changed or not.
Vidushi Singhal and Sachin Sharma
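As an illustration of the general idea in the abstract (deciding from text content whether a stored copy of a page is stale), the following minimal Python sketch compares a fingerprint of the visible text of two HTML documents. It is not the parameter set derived in the paper, which the abstract does not specify. The function names `text_fingerprint` and `has_text_changed`, the choice of SHA-256, and the whitespace normalization are all assumptions made for this example.

```python
import hashlib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def text_fingerprint(html):
    """Hypothetical helper: SHA-256 digest of the whitespace-normalized text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse all runs of whitespace so layout-only edits do not count.
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_text_changed(stored_html, fetched_html):
    """Hypothetical helper: True if the visible text of the two copies differs."""
    return text_fingerprint(stored_html) != text_fingerprint(fetched_html)


if __name__ == "__main__":
    stored = "<html><body><p>Hello world</p></body></html>"
    fetched = "<html><body><p>Hello there</p></body></html>"
    print(has_text_changed(stored, fetched))
```

Under this scheme a crawler would only need to keep the fingerprint of each page in its repository, not the full text, in order to decide whether a refetched page needs to be re-indexed. Changes confined to markup, scripts, or styling are deliberately ignored.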