European Journal of Computer Science and Information Technology (EJCSIT)

EA Journals

Author Identification Based on NLP

Abstract

The amount of textual content is increasing exponentially, especially through the publication of articles, and the issue is further complicated by the growth of anonymous textual data. Researchers are therefore looking for methods to predict the author of an unknown text, a task known as author identification. This study approaches the task with Bag of Words (BOW) and Latent Semantic Analysis (LSA) features. The “All the news” dataset from Kaggle is used to compare how well BOW and LSA perform on author identification. Support vector machine, random forest, and logistic regression (LR) classifiers, together with Bidirectional Encoder Representations from Transformers (BERT), are used for author prediction. In the first scope, with 20 authors and 100 articles per author, the highest accuracy comes from logistic regression with BOW, followed by random forest, also with BOW; with every algorithm, BOW scores better than LSA. The BERT model was also applied and achieved 70.33% accuracy. In the second scope, which increases the number of articles to 500 per author and decreases the number of authors to 10, BOW again performs best with logistic regression, at 93.86% accuracy. The best accuracy, 94.9%, is obtained with LR when the BOW and LSA features are merged, which outperforms either feature set applied individually; the reported improvement over BOW alone is almost 0.1%. Finally, BERT achieved 86.56% accuracy with a log loss of 0.51.
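The feature comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a tiny invented corpus with two hypothetical author labels instead of the "All the news" dataset, and approximates LSA as TF-IDF followed by truncated SVD. The number of SVD components, the toy texts, and the helper names `bow` and `lsa` are all assumptions made for the example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.pipeline import make_pipeline, make_union

# Hypothetical toy corpus: two invented "authors" with distinct vocabularies.
train_texts = [
    "the market rallied as investors bought shares",
    "stocks fell after the bank raised interest rates",
    "investors sold shares when the market dropped",
    "the bank reported profits and shares rose",
    "the team scored twice and won the match",
    "the striker missed a penalty in the final game",
    "fans cheered as the team lifted the trophy",
    "the coach praised the players after the match",
]
train_authors = ["finance_writer"] * 4 + ["sports_writer"] * 4
test_texts = [
    "shares rose as the market recovered",
    "the players won the final match",
]
test_authors = ["finance_writer", "sports_writer"]


def bow():
    """Bag-of-words features: raw term counts per document."""
    return CountVectorizer()


def lsa(n_components=2):
    """LSA features, approximated here as TF-IDF followed by truncated SVD."""
    return make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=n_components, random_state=0),
    )


# Three feature configurations: BOW alone, LSA alone, and both merged.
feature_sets = {
    "BOW": bow(),
    "LSA": lsa(),
    "BOW+LSA": make_union(bow(), lsa()),
}

results = {}
for name, features in feature_sets.items():
    # Same classifier for every feature set, so only the features differ.
    model = make_pipeline(features, LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_authors)
    proba = model.predict_proba(test_texts)
    results[name] = {
        "accuracy": accuracy_score(test_authors, model.predict(test_texts)),
        "log_loss": log_loss(test_authors, proba, labels=model.classes_),
    }

for name, scores in results.items():
    print(name, scores)
```

Scores on a corpus this small say nothing about the reported results; the sketch only shows how the three feature configurations can share one classifier and be scored with the same two metrics, accuracy and log loss.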

Keywords: Analysis, Identification, NLP, author, data analytics

This work by European American Journals is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 Unported License

Email ID: editor.ejcsit@ea-journals.org
Impact Factor: 7.80
Print ISSN: 2054-0957
Online ISSN: 2054-0965
DOI: https://doi.org/10.37745/ejcsit.2013
