NLP: Sentiment Analysis with LSTM
Introduction to Sentiment Analysis
Sentiment analysis, a common application of Natural Language Processing (NLP), aims to extract the emotional content of a text and categorize it. It involves analyzing, processing, summarizing, and drawing inferences from subjective text that carries an emotional tone.
This article focuses on sentiment polarity analysis, which classifies text as positive, negative, or neutral. Most applications use only two classes, positive and negative; for example, “like” and “dislike” express opposite emotional orientations.
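To make the polarity label space concrete, here is a toy lexicon-based classifier. It is a different and much simpler technique than the LSTM used in this article, shown only to illustrate the three polarity classes: the word lists are hypothetical, and whitespace tokenization would not work on unsegmented Chinese text.

```python
# Toy lexicon-based polarity classifier, for illustration only.
# The word lists are hypothetical and far too small for real use.
POSITIVE_WORDS = {"like", "love", "good"}
NEGATIVE_WORDS = {"dislike", "hate", "bad"}


def polarity(text: str) -> str:
    """Return "positive", "negative", or "neutral" from lexicon hit counts."""
    tokens = text.lower().split()  # whitespace tokenization (English only)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(
        t in NEGATIVE_WORDS for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A two-class setup, as used in most applications, would simply drop the neutral branch or fold it into one of the other two classes.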
Here we use an LSTM, a deep learning model, to perform sentiment analysis on Chinese text.
Corpus Introduction and Analysis
We use product comments from an e-commerce website as our corpus (corpus.csv), which can be downloaded from this link. The dataset contains 4310 comments, each labeled “positive” or “negative”.
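A minimal sketch for reading the corpus with Python's standard `csv` module. The column names `evaluation` (comment text) and `label` are assumptions about the file's header, and the function name is hypothetical; adjust both to match the actual corpus.csv.

```python
import csv
from typing import Iterable, List, Tuple


def load_corpus(
    rows: Iterable[str],
    text_col: str = "evaluation",  # assumed name of the comment column
    label_col: str = "label",      # assumed name of the label column
) -> List[Tuple[str, str]]:
    """Parse CSV lines (header first) into (comment, label) pairs.

    Rows with an empty comment are skipped.
    """
    pairs = []
    for row in csv.DictReader(rows):
        text = (row.get(text_col) or "").strip()
        if text:
            pairs.append((text, row[label_col].strip()))
    return pairs
```

In practice you would pass an open file, e.g. `with open("corpus.csv", encoding="utf-8", newline="") as f: pairs = load_corpus(f)`. Opening with `newline=""` follows the `csv` module's recommendation, and the explicit UTF-8 encoding assumes the file is stored as UTF-8, which matters for Chinese comments.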
Next, we perform a simple analysis of the corpus:
- Distribution of sentiment labels in the dataset.
- Distribution of comment lengths in the dataset.
We used a Python script to compute both the sentiment distribution and the distribution of comment lengths.
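Both analyses can be sketched with the Python standard library. This is an illustrative sketch, not the script used for this article: the helper names are hypothetical, the input is assumed to be a list of `(comment, label)` pairs, and length is counted in characters via `len`, a natural unit for Chinese comments that are not space-delimited.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (comment, label)


def label_distribution(pairs: List[Pair]) -> Counter:
    """Count how many comments carry each sentiment label."""
    return Counter(label for _, label in pairs)


def length_histogram(pairs: List[Pair], bin_width: int = 50) -> Dict[int, int]:
    """Bucket comment lengths (in characters) into bins of width bin_width.

    Keys are bin start values, e.g. 0, 50, 100 for bin_width=50.
    """
    hist = Counter(len(text) // bin_width * bin_width for text, _ in pairs)
    return dict(sorted(hist.items()))


def length_percentile(pairs: List[Pair], q: float) -> int:
    """Smallest length L such that at least a fraction q of comments have len <= L."""
    lengths = sorted(len(text) for text, _ in pairs)
    index = max(0, math.ceil(q * len(lengths)) - 1)
    return lengths[index]
```

`length_percentile(pairs, 0.9)` returns a length that covers 90% of the comments, which is one reasonable way to choose a fixed sequence length for padding before training.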
Using the LSTM Model
Next, we apply an LSTM (Long Short-Term Memory) network, a type of recurrent neural network, to classify the sentiment of the comments in this dataset.
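Before an LSTM can read the comments, each one has to become a fixed-length sequence of integer ids. Because Chinese text has no spaces between words, one simple option is to treat each character as a token. The sketch below assumes character-level tokens, reserves id 0 for padding and id 1 for unknown characters, and pads at the end; the names `build_vocab` and `encode` are hypothetical, and a word segmenter could be used instead of characters.

```python
from typing import Dict, Iterable, List

PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for characters never seen when building the vocabulary


def build_vocab(texts: Iterable[str]) -> Dict[str, int]:
    """Map each distinct character to an id, starting after the reserved ids.

    Sorting the characters makes the mapping deterministic across runs.
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert a comment to a fixed-length id list: truncate, then pad at the end."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))
```

Reserving 0 for padding also fits frameworks that can ignore padded positions, such as Keras's `Embedding` layer with `mask_zero=True`.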
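The LSTM layer itself can be sketched without any framework. Below is a plain-Python forward pass of a single LSTM cell using the standard gate equations (input, forget, and output gates plus a tanh candidate), followed by a sigmoid read-out that turns the final hidden state into a score between 0 and 1. It is a teaching sketch with randomly initialized, untrained weights and hypothetical names (`LSTMCell`, `sentiment_score`); a real model would be built and trained with a deep learning framework, typically with an embedding layer producing the input vectors.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, x: Sequence[float], U: Matrix, h: Sequence[float], b: Vector) -> Vector:
    """Return W @ x + U @ h + b, computed one output row at a time."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


class LSTMCell:
    """One LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: W acts on x_t, U on h_{t-1}, b is a bias.
        self.params: Dict[str, Tuple[Matrix, Matrix, Vector]] = {
            gate: (rand(hidden_dim, input_dim), rand(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in "ifog"
        }

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        """Advance one time step and return the new (h, c)."""
        pre = {gate: affine(W, x, U, h, b) for gate, (W, U, b) in self.params.items()}
        i = [sigmoid(z) for z in pre["i"]]    # how much new information to write
        f = [sigmoid(z) for z in pre["f"]]    # how much old cell state to keep
        o = [sigmoid(z) for z in pre["o"]]    # how much cell state to expose
        g = [math.tanh(z) for z in pre["g"]]  # candidate values to write
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: Sequence[Vector]) -> Vector:
        """Feed a sequence of input vectors; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


def sentiment_score(cell: LSTMCell, xs: Sequence[Vector], w_out: Vector, b_out: float) -> float:
    """Sigmoid read-out on the final hidden state, giving a score in (0, 1)."""
    h = cell.run(xs)
    return sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out)
```

The forget gate is what lets the cell carry information across long comments: when `f` stays close to 1, the cell state `c` passes from step to step nearly unchanged.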
The model was trained for 5 epochs, with the data split 9:1 into training and test sets.
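A 9:1 split can be sketched as a seeded shuffle followed by a cut. The function name and the fixed seed are assumptions for illustration; scikit-learn's `train_test_split` with `test_size=0.1` does the same job.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], test_ratio: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items with a fixed seed, then hold out test_ratio of them.

    Returns (train, test). The fixed seed makes the split reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

With the 4310 comments in this corpus, a 9:1 split holds out 431 comments for testing and trains on the remaining 3879.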
The model reached over 95% accuracy on the training set and over 90% on the test set, which is fairly good performance.
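Accuracy here is the fraction of comments whose predicted label matches the true label. For a model with a single sigmoid output, one common way to turn probabilities into labels is a 0.5 threshold; the threshold, the encoding 1 = positive, and the function name below are assumptions.

```python
from typing import Sequence


def accuracy(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of examples where (prob >= threshold) matches the 0/1 label."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(int(p >= threshold) == y for p, y in zip(probs, labels))
    return correct / len(labels)
```

Computing this separately on the training and test sets gives the two figures reported above.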
Model Prediction
Finally, we use the trained model to predict the sentiment polarity of new comments.
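Prediction repeats the training-time preprocessing on each new comment and then maps the model's output back to a label. The sketch below assumes a character vocabulary dict, a fixed padding length, and any callable that returns the probability of the positive class; `predict_sentiment` and `model_fn` are hypothetical names, and the stand-in model in the usage note exists only to show the call.

```python
from typing import Callable, Dict, List

PAD_ID, UNK_ID = 0, 1  # must match the ids used when the model was trained


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Character ids, truncated and then padded at the end to max_len."""
    ids = [vocab.get(ch, UNK_ID) for ch in text[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


def predict_sentiment(
    text: str,
    vocab: Dict[str, int],
    max_len: int,
    model_fn: Callable[[List[int]], float],  # hypothetical: ids -> P(positive)
    threshold: float = 0.5,
) -> str:
    """Return "positive" or "negative" for a single new comment."""
    prob = model_fn(encode(text, vocab, max_len))
    return "positive" if prob >= threshold else "negative"
```

For example, with a stand-in `model_fn = lambda ids: 0.9 if 2 in ids else 0.1` and `vocab = {"好": 2}`, `predict_sentiment("很好", vocab, 5, model_fn)` returns `"positive"`. The main thing to get right is keeping the vocabulary, padding length, and id conventions identical to training; in a real pipeline they would be saved alongside the model.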
Thank you for reading!