Survey On Term Weighting Using Coherent Clustering In Topic Modelling

  • Unique Paper ID: 148352
  • Volume: 6
  • Issue: 1
  • PageNo: 402-405
  • Abstract:
  • Topic models often produce incoherent topics that are filled with noisy words. The reason is that all words carry the same weight in topic modelling. High-frequency words dominate the top topic word lists, but most of them are meaningless, e.g., domain-specific stopwords. To address this issue, this paper investigates how to weight words and develops a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further suggest an integrated form of EW (CEW) that combines it with two existing weighting schemes. Basically, CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modelling inference. We apply CEW to DMM and LDA, and evaluate it by topic quality, document clustering and classification tasks on 8 real-world data sets. Experimental results show that weighting words can effectively improve topic modelling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
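The abstract does not state the EW formula, so the following is only a minimal sketch of one way a weight based on conditional entropy over word co-occurrences could be computed. Everything in it is an assumption made for illustration: the function names `cooccurrence_entropy` and `entropy_weights`, the use of document-level co-occurrence counts, the normalisation by the maximum entropy, and the `floor` parameter. It should not be read as the authors' method.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_entropy(docs):
    """Return H(w) = -sum_v p(v|w) * log p(v|w) for every word w.

    p(v|w) is estimated from document-level co-occurrence counts
    (an assumption; the source does not specify the estimator).
    """
    co = defaultdict(Counter)
    for doc in docs:
        vocab = set(doc)
        for w in vocab:
            co[w]  # make sure every word gets an entry, even with no neighbours
            for v in vocab:
                if v != w:
                    co[w][v] += 1

    entropy = {}
    for w, counts in co.items():
        total = sum(counts.values())
        if total == 0:
            entropy[w] = 0.0  # word never co-occurs with anything
            continue
        entropy[w] = -sum(
            (c / total) * math.log(c / total) for c in counts.values()
        )
    return entropy


def entropy_weights(docs, floor=0.05):
    """Map each word to a weight in [floor, 1].

    Words that co-occur with many different words (high entropy, stopword-like)
    get low weights; words with concentrated co-occurrence get high weights.
    The linear rescaling and the floor are illustrative choices.
    """
    entropy = cooccurrence_entropy(docs)
    h_max = max(entropy.values(), default=0.0)
    if h_max == 0.0:
        return {w: 1.0 for w in entropy}
    return {w: max(floor, 1.0 - h / h_max) for w, h in entropy.items()}


# Toy corpus: "the" appears everywhere, so it should receive the lowest weight.
docs = [
    ["the", "cat", "purrs"],
    ["the", "dog", "barks"],
    ["the", "cat", "meows"],
    ["the", "dog", "growls"],
]
weights = entropy_weights(docs)
```

In this toy corpus the ubiquitous word "the" ends up at the floor weight, while rarer, more specific words such as "purrs" score higher. In a weighted topic model, such weights would typically scale each word's contribution to the topic-word counts during inference; how CEW does this in DMM and LDA is not described in the abstract.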

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{148352,
        author = {Manisha N. Amnerkar and Ashwini Tikle},
        title = {Survey On Term Weighting Using Coherent Clustering In Topic Modelling},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {6},
        number = {1},
        pages = {402--405},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=148352},
        abstract = {Topic models often produce incoherent topics that are filled with noisy words. The reason is that all words carry the same weight in topic modelling. High-frequency words dominate the top topic word lists, but most of them are meaningless, e.g., domain-specific stopwords. To address this issue, this paper investigates how to weight words and develops a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further suggest an integrated form of EW (CEW) that combines it with two existing weighting schemes. Basically, CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modelling inference. We apply CEW to DMM and LDA, and evaluate it by topic quality, document clustering and classification tasks on 8 real-world data sets. Experimental results show that weighting words can effectively improve topic modelling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.},
        keywords = {Topic modeling, Term weighting, Informative word, Conditional entropy},
        month = {},
        }

