BLOCKCHAIN ARTICLES TEXT MINING
- Sanjit_3282
- Nov 1, 2018
- 2 min read
Blockchain technology has been the talk of the town for the past few years. The aim of this project was to perform text mining on newspaper articles about Blockchain technology and find out what people and different industries are saying about Blockchain, and what attitudes and sentiments they attach to it.

We worked on Blockchain technology articles published in The Wall Street Journal, The Economist and The Chicago Tribune. We scraped more than 350 articles from the websites of these publications using Python. The project was divided into two main parts: first, sentiment analysis using NLTK to compare authors' attitudes, and second, topic modeling using LDA to analyze article contents.
We used VADER sentiment analysis, which produces positive, negative, neutral and compound scores for a piece of text, and from these we calculated an absolute polarity score for each article as a whole to get an idea of its attitude. To dig deeper into the content and identify the topic of each article, we used Latent Dirichlet Allocation (LDA). This unsupervised machine learning method gave us an estimate of how much each topic contributes to each document and how much each word contributes to each topic. Based on these estimates, we divided the articles into several topic clusters.
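To make the two LDA outputs concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is not the code used in the project (which is not shown here and most likely relied on a library); the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the project's code).

    docs: list of tokenized documents (lists of words).
    Returns (theta, phi): per-document topic shares and per-topic word shares.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]        # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(n_topics)]   # times word v is assigned to topic k
    topic_total = [0] * n_topics                      # total words assigned to topic k
    assign = []                                       # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # theta: how much each topic contributes to each document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    # phi: how much each word contributes to each topic.
    phi = [
        {vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + V * beta)
         for v in range(V)}
        for k in range(n_topics)
    ]
    return theta, phi

# Hypothetical toy corpus, already tokenized.
docs = [
    ["bitcoin", "price", "bitcoin", "mining"],
    ["bank", "ledger", "business", "bank"],
    ["bitcoin", "mining", "price"],
    ["business", "ledger", "bank"],
]
theta, phi = lda_gibbs(docs, n_topics=2)
```

Each row of `theta` is one document's mix of topics, and each entry of `phi` is one topic's distribution over words. Assigning every article to its highest-share topic in `theta` is one simple way to form topic clusters; whether the project clustered this way is an assumption.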
To conclude, topics related to bitcoin, technology and business were the most discussed in Blockchain technology articles. Sentiment analysis showed a very weak positive sentiment associated with these articles. Sentiment analysis needs better algorithms or word dictionaries to identify the polarity of these articles accurately and to produce results more consistent with human judgment.
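For a sense of what a "very weak positive" reading looks like in VADER terms, here is a minimal sketch that averages per-article compound scores and labels the result. The compound score ranges from -1 to 1, and a cutoff of ±0.05 is the threshold commonly used with VADER. Averaging across articles and the helper names `vader_scorer`, `label` and `corpus_sentiment` are assumptions for illustration, not the project's actual aggregation.

```python
from statistics import mean

def vader_scorer():
    """Return a function mapping text to VADER's score dict.

    Requires the nltk package and its 'vader_lexicon' resource (downloaded here).
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer().polarity_scores

def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse polarity label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def corpus_sentiment(articles, score):
    """Average the compound score over all articles and label the average.

    score: any function returning a dict with a 'compound' key,
    for example the one returned by vader_scorer().
    """
    compounds = [score(text)["compound"] for text in articles]
    avg = mean(compounds)
    return avg, label(avg)
```

With NLTK installed, `corpus_sentiment(texts, vader_scorer())` returns the average compound score and its label for a list of article texts. An average only slightly above 0.05 would count as positive, but barely, which is one way to read a weak positive result.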
Below is a video of the presentation report for your reference. This was a team project and my team members were Nakul Kaura, Yuyi Zhang and Yuxuan Wang.