LIMTM: A FRAMEWORK FOR ASSIMILATING LINK BASED IMPORTANCE INTO SEMANTICALLY COHERENT CLUSTERS OF CORRELATED WORDS
Abstract
As more information becomes available, finding what we are looking for becomes increasingly difficult, like finding a needle in a haystack. Because most information is now stored electronically, new tools that help organize, search, and understand it are urgently needed. A topic can be described as “a recurring pattern of co-occurring words”. Topic models discover such hidden topic-based patterns; the discovered topics can be used to annotate a collection of documents, and those annotations in turn allow the documents to be organized, understood, summarized, and searched. Topic modeling has become a well-known text mining method and is widely used in document navigation, clustering, classification, and information retrieval. Given a set of documents, the goal of topic modeling is to discover semantically coherent clusters of correlated words, known as topics, which can then be used to represent and summarize the content of the documents. With topic modeling, documents can be modeled as multinomial distributions over topics rather than over words. Topics can serve as better document features than words because of their low dimensionality and good semantic interpretability. However, although topic modeling has become a widely used tool for document management, few topic models distinguish the importance of documents on different topics. The LIMTM (Link based Importance into Topic Modeling) framework is used to incorporate link-based importance into topic modeling. Specifically, ranking methods are used to compute the topical importance of documents.
Keywords: Text Corpus, Topic Modeling, Link based Importance, Ranking, Log Likelihood
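To make the idea of "topical importance" concrete, the sketch below shows one plausible way a ranking method could assign each document a link-based importance score per topic. It is not taken from the paper: it assumes a topic-sensitive, PageRank-style random walk, and the function name `topical_pagerank`, the parameters `damping` and `iters`, and the toy inputs `links` and `theta` are all hypothetical. The document-topic proportions `theta` are assumed to come from an already fitted topic model.

```python
def topical_pagerank(links, theta, damping=0.85, iters=100):
    """Illustrative topic-sensitive PageRank (an assumption, not the paper's method).

    links[d]    : list of document indices that document d links to (cites)
    theta[d][k] : assumed proportion of topic k in document d
    Returns importance[k][d], a probability distribution over documents per topic.
    Assumes every topic has non-zero total mass across the documents.
    """
    n_docs = len(links)
    n_topics = len(theta[0])
    importance = []
    for k in range(n_topics):
        # Teleport (restart) distribution biased toward documents about topic k.
        total = sum(theta[d][k] for d in range(n_docs))
        teleport = [theta[d][k] / total for d in range(n_docs)]
        rank = teleport[:]
        for _ in range(iters):
            # Restart mass goes to documents in proportion to their topic weight.
            new_rank = [(1.0 - damping) * t for t in teleport]
            for d in range(n_docs):
                if links[d]:
                    # Split this document's rank evenly among its link targets.
                    share = damping * rank[d] / len(links[d])
                    for target in links[d]:
                        new_rank[target] += share
                else:
                    # Dangling document: redistribute its rank via the teleport vector.
                    for t in range(n_docs):
                        new_rank[t] += damping * rank[d] * teleport[t]
            rank = new_rank
        importance.append(rank)
    return importance


# Toy corpus (hypothetical): four documents, two topics.
links = [[1, 2], [2], [0], []]
theta = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.4, 0.6]]
importance = topical_pagerank(links, theta)
for k, scores in enumerate(importance):
    print(f"topic {k}:", [round(s, 3) for s in scores])
```

Each row of the result sums to one, so the scores for a topic can be read as a distribution of importance over documents. A framework in the spirit of LIMTM could then feed such scores back into the topic model, though how the paper actually does so is not specified in this abstract.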
License
International Journal of Engineering Technology and Computer Research (IJETCR) by Articles is licensed under a Creative Commons Attribution 4.0 International License.