Machine Learning Approach for Sentiment Analysis


Reading time: 30 minutes | Coding time: 10 minutes

Lexical methods of Sentiment Analysis are easy to understand and implement, but they are often not very accurate. We therefore discuss the Machine Learning approach to Sentiment Analysis, focusing on Convolutional Neural Networks (CNNs) for classifying text into positive and negative sentiment.

This method is especially useful when contextual information is scarce, for example in social media posts, which tend to be short.
While CNNs are best known for revolutionizing Image Classification, they can also be applied to Text Processing.

Convolutional Neural Networks

Before we dive into Sentiment Analysis, it is useful to know what CNNs are and which layers they contain.
CNNs are composed of the following layers -

  • One or more Convolutional Layers
  • One or more ReLU (Rectified Linear Unit) Layers
  • One or more Pooling Layers
  • Fully Connected Layers

For text classification, the first hidden layer can be an Embedding layer, which takes integer tokens as input and converts each one to a vector, thus giving a vector representation of the words.

Some Important Points to note -

  • The output of each layer is fed as input to the next layer.
  • The Fully Connected Layer outputs the final vector, whose dimension is defined by the number of classes for classification, which in our case is 2 - positive and negative.

Algorithm

Step 1

  • The first step is the same for Lexical methods and Machine Learning methods.
  • It involves Data Preprocessing, after the data has been imported.
  • Some common steps are the removal of punctuation, stopwords, etc.
  • Tokenizing is performed to convert the sentences into simpler data structures, such as Python lists of words.
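
The preprocessing steps above can be sketched in plain Python. This is only a minimal illustration: the stopword set `STOPWORDS` and the helper name `preprocess` are assumptions made for this example, and a real project would more likely use a fuller stopword list such as the one shipped with nltk.

```python
import string

# Tiny illustrative stopword set (an assumption; real lists are much longer).
STOPWORDS = {"this", "is", "a", "an", "the", "of", "and"}

def preprocess(sentence):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    words = sentence.lower().translate(table).split()
    return [w for w in words if w not in STOPWORDS]

reviews = ["This is a good movie!", "Bad plot."]
tokens = [preprocess(r) for r in reviews]
print(tokens)
```

Each review becomes a Python list of cleaned words, which is the form the later steps expect.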

Step 2

  • Padding is done so that every input to the Convolutional Neural Network has the same, fixed maximum length.
  • Padding means adding 0s to make the length of all input sentences or vectors uniform.

For example -
If the maximum length of an input vector is 10, then the vector [1, 136] will be transformed to [0, 0, 0, 0, 0, 0, 0, 0, 1, 136] by the padding operation.

Note: Adding the 0s at the beginning is the default behaviour in Keras (padding='pre' in pad_sequences); padding at the end is also possible.
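
To make the padding rule concrete, here is a small pure-Python sketch. The helper `pad_pre` is hypothetical (it is not part of Keras); it only imitates the default behaviour described above, adding 0s at the front and, for inputs that are too long, dropping items from the front.

```python
def pad_pre(sequence, max_len, value=0):
    """Return a list of exactly max_len items, padded or truncated at the front."""
    if len(sequence) >= max_len:
        # Too long (or exact): keep only the last max_len items.
        return list(sequence[len(sequence) - max_len:])
    # Too short: add the padding value at the front.
    return [value] * (max_len - len(sequence)) + list(sequence)

padded = pad_pre([1, 136], 10)
print(padded)
```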

Step 3

  • Word Embeddings, which represent the words in a multidimensional space according to their similarity and dissimilarity, are created.
  • Compared with sparse encodings such as one-hot vectors, this gives a much lower-dimensional representation.
  • This can be done using Keras or Word2Vec.
  • Keras provides an Embedding layer to perform this task.

Step 4

  • The Word Embeddings can be converted to an Embeddings Matrix before being fed as input to the CNN.
  • The Word Embeddings matrix output by the Embedding layer is the input to the Convolutional Layer, which performs the mathematical operation of Convolution in the same way as for images.
  • The filters that perform Convolution can vary in size and will accordingly output feature maps of varying length. Multiple filters can be used.
  • The fully connected layer finally gives the output vector of size 2 (Binary Classification).
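
The flow in Step 4 can be traced with a small NumPy sketch. Everything here is an assumption chosen for illustration: the random weights, the two filter sizes (2 and 3), the embedding size of 4 and the helper `conv1d_text`. It is an untrained forward pass that only shows how the shapes change, not a usable classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_text(embedded, kernel):
    """Slide an (h, d) kernel over an (n, d) sentence matrix; returns n - h + 1 values."""
    n, h = embedded.shape[0], kernel.shape[0]
    return np.array([np.sum(embedded[i:i + h] * kernel) for i in range(n - h + 1)])

n_words, emb_dim = 5, 4
embedded = rng.normal(size=(n_words, emb_dim))  # stand-in for the Embedding layer output

pooled = []
feature_map_lengths = []
for h in (2, 3):  # two filter sizes give feature maps of different length
    kernel = rng.normal(size=(h, emb_dim))
    fmap = np.maximum(conv1d_text(embedded, kernel), 0.0)  # convolution followed by ReLU
    feature_map_lengths.append(fmap.shape[0])
    pooled.append(fmap.max())  # max pooling over word positions

features = np.array(pooled)  # one pooled value per filter
W = rng.normal(size=(2, features.shape[0]))  # fully connected layer with 2 outputs
logits = W @ features
probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax over the two classes
print(feature_map_lengths, probs.shape)
```

A filter of size h applied to a sentence of n words yields n - h + 1 values, which is why pooling is needed before the fixed-size fully connected layer.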

Example

Let us consider the following 2 reviews for a movie:

s = ["This is a good movie", "Bad plot"]

Word Tokenization and Padding

First, each word in the sentence will be split:

list1 = [['This', 'is', 'a', 'good', 'movie'], ['Bad', 'plot']]

For tokenization, several approaches can be used, including one-hot encoding, a simple hash function, or the Tokenizer API provided in Keras. Let us assume that after tokenization the sentences look like the following:

list2 = [[44, 2, 12, 7, 18], [11, 65]]
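
As one possible way of producing such integer ids, the sketch below builds a word index in order of first appearance. The helper `build_word_index` is hypothetical, and the ids it produces differ from the values in list2, which are only illustrative. (The Keras Tokenizer, for comparison, orders ids by word frequency.)

```python
def build_word_index(tokenized_sentences):
    """Give each distinct lowercase word an id starting at 1 (0 is kept for padding)."""
    index = {}
    for sentence in tokenized_sentences:
        for word in sentence:
            # setdefault leaves an existing id untouched
            index.setdefault(word.lower(), len(index) + 1)
    return index

list1 = [['This', 'is', 'a', 'good', 'movie'], ['Bad', 'plot']]
word_index = build_word_index(list1)
encoded = [[word_index[w.lower()] for w in sentence] for sentence in list1]
print(encoded)
```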

Since the two lists have unequal lengths, it is necessary to pad them.

You can define the maximum length yourself; here it is 5.
In Keras, the output will look as follows:

list3 = [[44, 2, 12, 7, 18], [0, 0, 0, 11, 65]]

Word Embeddings

  • The Embedding Layer in Keras will transform the above lists into a multidimensional space.
  • It can be created by specifying the vocabulary size, the number of words in each input and the output vector size.

Since there are 7 unique words in our example, the output of the Embedding Layer will be like a Hash Table with 7 rows: the index of each row is a token value from the lists, and the value stored there is the corresponding Word Embedding Vector.
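
The lookup idea can be illustrated with NumPy. This is a sketch under assumptions: the vectors are random rather than learned, and the output size of 3 is arbitrary. Note that a dense matrix, unlike the hash table picture, needs one row for every possible id (66 here, from 0 to 65), although only the rows for ids that actually occur are read.

```python
import numpy as np

rng = np.random.default_rng(42)

# The padded token ids from the example above.
padded = np.array([[44, 2, 12, 7, 18],
                   [0, 0, 0, 11, 65]])

vocab_size = int(padded.max()) + 1  # ids run from 0 (padding) to 65
emb_dim = 3                         # output vector size, chosen arbitrarily
embedding_matrix = rng.normal(size=(vocab_size, emb_dim))

embedded = embedding_matrix[padded]  # look up one row per token id
print(embedded.shape)
```

Each sentence is now a matrix with one row per word position, which is the shape the convolutional layer consumes.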

CNN

Finally, the Embedding Matrix is input to the CNN's first layer, where convolution will be performed.
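
For orientation, a very small Keras model along these lines might look as follows. This is a sketch, not the architecture of any particular paper: the layer sizes (8-dimensional embeddings, 16 filters of width 2) are arbitrary assumptions, and the import path assumes a TensorFlow 2.x installation.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MAX_LEN = 5      # padded sentence length from the example
VOCAB_SIZE = 66  # largest token id (65) + 1, with 0 used for padding

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN,)),
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=8),  # token ids -> vectors
    layers.Conv1D(filters=16, kernel_size=2, activation="relu"),
    layers.GlobalMaxPooling1D(),                           # pooling over word positions
    layers.Dense(2, activation="softmax"),                 # one output per class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.array([[44, 2, 12, 7, 18], [0, 0, 0, 11, 65]])
probs = model.predict(x, verbose=0)  # untrained, so the values are not meaningful
print(probs.shape)
```

Training would then call model.fit on padded sequences and integer labels (0 or 1) before the predictions mean anything.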

Code Demonstration

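  1. Tokenization and Padding
  • Tokenization can be done either with the Tokenizer in Keras or with nltk.
  • For Padding, declare a variable MAX_LEN with the maximum number of words in every vector.
# Import paths below are for Keras 2.x; they may differ in other versions.
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from nltk import word_tokenize  # alternative tokenizer, not used below
MAX_LEN = 5  # example value; choose it to suit your data
# data is assumed to be a list of raw sentences (strings)
tokenizer = Tokenizer(num_words=None, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ', char_level=False, oov_token=None, document_count=0)
tokenizer.fit_on_texts(data)  # build the word -> integer index
sequences = tokenizer.texts_to_sequences(data)  # sentences -> lists of integers
padded_data = pad_sequences(sequences, maxlen=MAX_LEN)  # pads with 0s at the front by default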
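  2. Word Embeddings can be created either with Word2Vec or GloVe.
  • One such example, using the glove Python package, is given below.
from glove import Corpus, Glove

# lines is assumed to be a list of tokenized sentences (lists of words)
corpus = Corpus()
corpus.fit(lines, window=10)  # build the word co-occurrence matrix
glove = Glove(no_components=5, learning_rate=0.05)  # 5-dimensional embeddings
glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)  # train the vectors
glove.add_dictionary(corpus.dictionary)  # attach the word -> id mapping
glove.save('glove.model')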
  3. Finally, the embeddings are passed to the Convolutional Neural Network to train the model. Several implementations of CNNs are available, depending on the number and kind of layers chosen for the neural network.
    An implementation can be found here which is based on the work of Dos Santos and Gatti [1].
  • The architecture uses 2 Convolutional Layers: one extracts character-level features, and the other applies a matrix-vector operation to extract local features.
  • The first layer acts as an Embedding Layer.
  • Two additional ordinary Neural Network layers are finally used to compute a Sentiment score for each input.

References

  1. Dos Santos, C., & Gatti, M. (2014, August). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 69-78).
