Noisy elimination for web mining based on style tree approach

Deepa.R1, Nirmala Devi.R2

Authors

Deepa.R1, Nirmala Devi.R2, Department of Computer Science, Nandha Arts and Science College, Erode, Tamil Nadu, India

Abstract

The main objective of this work is to detect noise in Web documents and eliminate it using a Style Tree approach. The noise elimination system is designed to remove noise and to identify the informative content sections of Web documents. Within a given Web site, noisy blocks usually share common contents and presentation styles, while the main content blocks of the pages are often diverse in both their actual contents and their presentation styles. Based on this observation, we propose a tree structure, called the Style Tree, to capture the common presentation styles and the actual contents of the pages in a given Web site. By sampling the pages of the site, a Style Tree can be built for the site, which we call the Site Style Tree (SST). We then introduce an information-based measure to determine which parts of the SST represent noise and which parts represent the main content of the site. The SST is then used to detect and eliminate noise in any Web page of the site by mapping that page to the SST. The proposed technique is evaluated with two data mining techniques: clustering and classification. The K-Means algorithm is used to group the tree elements.
Keywords: Noise detection, Noise elimination, Clustering, Classification.
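
The core observation in the abstract (blocks repeated across the pages of a site are likely noise, blocks that vary are likely main content) can be illustrated with a minimal sketch. This is not the Style Tree or the measure used in the paper: it assumes a flattened stand-in in which each block is a (tag path, text) pair, and it scores a block by its normalized self-information across the sampled pages. The names `extract_blocks`, `block_importance`, `main_content` and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag; they are kept out of the tag path.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockExtractor(HTMLParser):
    """Collect (tag path, text) pairs such as ('html/body/p', 'Hello')."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(("/".join(self.stack), text))


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def block_importance(sample_pages):
    """Score each block in [0, 1]: 0 if it occurs on every sampled page,
    1 if it occurs on exactly one (assumed simplification of the paper's
    information-based measure)."""
    n = len(sample_pages)
    page_counts = Counter()
    for html in sample_pages:
        # Count each block at most once per page.
        page_counts.update(set(extract_blocks(html)))
    if n < 2:
        return {block: 1.0 for block in page_counts}
    return {block: math.log(n / count) / math.log(n)
            for block, count in page_counts.items()}


def main_content(html, importance, threshold=0.5):
    """Keep the texts of blocks scored at or above the threshold.
    Blocks never seen in the sample are treated as main content."""
    return [text for path, text in extract_blocks(html)
            if importance.get((path, text), 1.0) >= threshold]
```

In this sketch, a navigation link or footer that appears on every sampled page scores 0 and is dropped, while article text that appears on a single page scores 1 and is kept. A faithful implementation would instead merge the DOM trees of the sampled pages into an SST and evaluate the measure on its nodes.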
