
Introduction

Techniques

Lexicon-based Techniques

Machine Learning-based Techniques

Neural Network based Sentiment Analysis

SVM based Sentiment Analysis

Sentiment Analysis using Naive Bayes Classifier

Maximum Entropy based Sentiment Analysis

Sentiment Analysis using Bayesian Network

Hybrid Techniques

A Comparison

References


Sentiment Analysis is the task of analysing text data and predicting the emotion associated with it. This is a challenging Natural Language Processing problem, and there are several established approaches, which we will go through.

Sentiment Analysis is applied to customer reviews in many industries, such as E-Commerce, and to survey responses, where it helps improve the delivery of service to customers.

In most cases, sentiments can be classified as positive, negative or neutral. Its scope can be any one of the following three types:

Document level - sentiment analysis on an entire document
Sentence level - sentiment analysis of a sentence
Sub-sentence level - sentiment analysis of a subset of the whole sentence

Techniques

The techniques that can be used for Sentiment Analysis are:

Lexicon based techniques:
- corpus based
- dictionary based
Machine Learning based:
- Neural Network based Sentiment Analysis
- SVM based Sentiment Analysis
- Sentiment Analysis using Naive Bayes Classifier
- Maximum Entropy based Sentiment Analysis
- Sentiment Analysis using Bayesian Network
Hybrid techniques (like pSenti and SAIL)

Let's discuss all the techniques in depth.

Lexicon-based Techniques

Lexicon, in literal terms, means the vocabulary of a person.

Lexicon-based techniques are usually unsupervised: they need no labelled training data.
Under these, documents are searched for positive and negative terms.
A predetermined dictionary is used to classify words as positive or negative.
Each document is assigned a score, s, which is initially zero.
Every positive word in the document increments the score s.
Every negative word, on the other hand, decrements the score s.
Finally, s is compared with a threshold value to deem the document as having positive or negative sentiment, often called the polarity of the document.
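
The scoring procedure above can be sketched in a few lines of Python. This is only a minimal illustration: the word lists, the function names and the symmetric threshold (with a "neutral" outcome in between) are assumptions made for the example, not part of any particular lexicon.

```python
import re

# Toy lexicon for illustration only; real dictionaries are much larger.
POSITIVE_WORDS = {"good", "great", "love", "excellent", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible", "sad"}


def lexicon_score(text):
    """Return the score s: +1 per positive word, -1 per negative word."""
    s = 0  # the score starts at zero
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE_WORDS:
            s += 1
        elif word in NEGATIVE_WORDS:
            s -= 1
    return s


def polarity(text, threshold=0):
    """Compare the score with a threshold to decide the polarity."""
    s = lexicon_score(text)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"  # assumption: scores within the threshold are neutral


print(polarity("The delivery was great and I love the product"))
print(polarity("Terrible packaging and poor support"))
```

Raising the threshold makes the classifier more conservative, since a document then needs a larger surplus of positive or negative words before it is given a polarity.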

These techniques can be broadly classified into the following:

Corpus-based approach
It brings domain specificity to the dataset; thus, each word in the dataset has not only a sentiment associated with it but also a context.
Dictionary-based approach
Under this, a set of words is initially chosen, and its synonyms and antonyms are then looked up to grow the set. This process is repeated until a stable set is obtained.
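
The dictionary-based expansion can be sketched as a loop that runs until the word sets stop changing. The tiny `THESAURUS` below is a hypothetical stand-in for a real lexical resource, and the rule that antonyms of positive words are added to the negative set (and vice versa) is an assumption of this sketch.

```python
# Hypothetical thesaurus: word -> (synonyms, antonyms).
THESAURUS = {
    "good": ({"fine", "great"}, {"bad"}),
    "great": ({"good", "excellent"}, {"terrible"}),
    "bad": ({"poor", "terrible"}, {"good"}),
    "fine": ({"good"}, {"poor"}),
}


def expand_lexicon(positive_seeds, negative_seeds):
    """Grow the seed sets with synonyms and antonyms until they are stable."""
    positive, negative = set(positive_seeds), set(negative_seeds)
    while True:
        new_pos, new_neg = set(positive), set(negative)
        for word in positive:
            synonyms, antonyms = THESAURUS.get(word, (set(), set()))
            new_pos |= synonyms   # synonyms keep the polarity
            new_neg |= antonyms   # antonyms flip the polarity
        for word in negative:
            synonyms, antonyms = THESAURUS.get(word, (set(), set()))
            new_neg |= synonyms
            new_pos |= antonyms
        if new_pos == positive and new_neg == negative:
            return positive, negative  # nothing new was added: stable set
        positive, negative = new_pos, new_neg


pos, neg = expand_lexicon({"good"}, {"bad"})
print(sorted(pos), sorted(neg))
```

The loop always terminates because the sets only ever grow and the thesaurus is finite.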

Machine Learning based Techniques

These are supervised learning techniques that usually treat sentiment analysis as a classification problem.
Basically, this technique makes use of classification to determine whether the document is positive, negative or neutral.
The classification model requires a labelled training set so that it can learn the characteristics that differentiate positively and negatively classified documents.
The steps involved are as follows:

Text Vectorization
The document vectors are calculated based on term frequency and inverse document frequency (TF-IDF).
Only those terms are considered which are present in a predefined dictionary, i.e. words that actually provide a positive or negative description.
Classification
For classification, different algorithms like Logistic Regression or Naive Bayes can be applied.
The classifiers may be Decision Tree based, Linear Classifiers like SVM and Neural Networks, Rule-based Classifiers, or Probabilistic Classifiers like Naive Bayes, Maximum Entropy or Bayesian Network.
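
As a rough illustration of the text vectorization step, the sketch below computes TF-IDF vectors restricted to a predefined dictionary. The vocabulary, the sample documents and the exact weighting (relative term frequency multiplied by the logarithm of the inverse document frequency) are assumptions made for this example; libraries often use slightly different variants of the formula.

```python
import math
from collections import Counter

# Hypothetical predefined dictionary of sentiment-bearing terms.
VOCABULARY = ["good", "bad", "great", "poor"]

documents = [
    "good good service and great food",
    "bad food and poor service",
    "great view but bad food",
]


def tf_idf_vectors(docs, vocabulary):
    """Return one TF-IDF vector per document, one entry per vocabulary term."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = {term: sum(term in tokens for tokens in tokenized) for term in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for term in vocabulary:
            tf = counts[term] / len(tokens)                      # term frequency
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


vectors = tf_idf_vectors(documents, VOCABULARY)
for vector in vectors:
    print([round(value, 3) for value in vector])
```

Each resulting vector can then be passed, together with its label, to any of the classifiers listed above.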

Deep Learning and neural networks can also be employed for the purpose of sentiment analysis.

Some specific techniques include:

Algorithm	Reference
Deep Convolutional Neural Networks	PDF on UNITI
Support Vector Machine	PDF on ResearchGate
Sentiment Analysis using Naive Bayes	PDF on ResearchGate
Maximum Entropy	PDF on Stanford
Bayesian Network	PDF on IOP

Neural Network based Sentiment Analysis

Neural networks, an integral part of Deep Learning, are loosely modelled after the human brain.
A neural network is organised into three kinds of layers (input, hidden and output), with weights associated with the connections between nodes in adjacent layers.
In Sentiment Analysis using Neural Networks, Word Embeddings are commonly used to represent the input text.
Word embeddings map each word to a dense vector, so that words with similar meanings get similar vectors; this brings in some of the human understanding of language that raw word counts lack.
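
To make the input, hidden and output layers concrete, here is a minimal forward pass through a tiny network. Everything numeric in it is an assumption for illustration: the 2-dimensional embeddings and the weights are hand-picked rather than trained, and averaging the word vectors is just one simple way to build the input.

```python
import math

# Hypothetical 2-dimensional word embeddings (real ones are learned
# from data and have many more dimensions).
EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "awful": [-0.8, 0.3],
}

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output unit.
W_HIDDEN = [[1.0, 0.0], [-1.0, 0.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [2.0, -2.0]
B_OUT = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_positive_probability(text):
    """Return the network's estimate that the text is positive."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return 0.5  # no known words: no evidence either way
    # Input layer: average the embeddings of the known words.
    x = [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]
    # Hidden layer: weighted sum followed by a ReLU activation.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    # Output layer: weighted sum squashed into (0, 1) by a sigmoid.
    return sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)


print(predict_positive_probability("good great"))
print(predict_positive_probability("bad awful"))
```

In a real system the embeddings and weights would be learned from labelled data by backpropagation instead of being written by hand.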

SVM based Sentiment Analysis

SVM is a supervised technique, which can be used for both classification and regression.
Classification by SVM involves mapping data points into a space such that the classes can be separated by a line or, in higher dimensions, a hyperplane with the largest possible margin.
Preprocessing of data involves tokenization, i.e. splitting the text into tokens.
Feature vectors are created to represent the text in an n-dimensional space.
It is especially useful when the data is sparse, as is typical for text feature vectors.
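
The idea of a separating hyperplane can be sketched with a small linear SVM trained by sub-gradient descent on the hinge loss. This is one simple way to fit a linear SVM, not necessarily what production libraries do; the two-dimensional features (counts of positive and negative words), the hyperparameters and the function names are all assumptions made for this example.

```python
def train_linear_svm(samples, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: move towards the sample.
                w = [wi + learning_rate * (y * xi - reg * wi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Correct with enough margin: only apply the regulariser.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical features: [number of positive words, number of negative words].
samples = [[2, 0], [1, 0], [3, 1], [0, 2], [0, 1], [1, 3]]
labels = [1, 1, 1, -1, -1, -1]  # +1 = positive review, -1 = negative review

w, b = train_linear_svm(samples, labels)
print(predict(w, b, [3, 0]), predict(w, b, [0, 3]))
```

With real text, the feature vectors would usually be the high-dimensional TF-IDF or bag-of-words vectors rather than two hand-made counts.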

Sentiment Analysis using Naive Bayes Classifier

Naive Bayes classification is a probabilistic algorithm based on Bayes' Theorem.
Classification is performed on the basis of the probability of different attributes being associated with a particular class.
In Sentiment Analysis using a Naive Bayes classifier, a basic word count is calculated for each word, with respect to positive as well as negative reviews in the training dataset.
These counts are converted into the probability of each word given a class, which is eventually used to make predictions.
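
The word-count approach can be sketched as a small multinomial Naive Bayes classifier. The training sentences are invented for illustration, and the use of add-one (Laplace) smoothing and log-probabilities is an assumption of this sketch rather than something the description above prescribes.

```python
import math
from collections import Counter

# Invented training data for illustration.
training_data = [
    ("great product and great service", "positive"),
    ("really good value", "positive"),
    ("bad quality and poor service", "negative"),
    ("terrible product", "negative"),
]


def train_naive_bayes(data):
    """Count words per class and documents per class."""
    word_counts = {}          # class -> Counter of word occurrences
    class_counts = Counter()  # class -> number of training documents
    for text, label in data:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def classify(text, word_counts, class_counts, vocabulary):
    """Return the class with the highest posterior log-probability."""
    total_docs = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in class_counts:
        # log P(class) + sum of log P(word | class), with add-one smoothing.
        log_prob = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


model = train_naive_bayes(training_data)
print(classify("great service", *model))
print(classify("poor quality", *model))
```

Smoothing matters here because a word that never appears in one class would otherwise give that class a probability of zero, regardless of the other words in the document.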

Maximum Entropy based Sentiment Analysis

This algorithm is based on the Principle of Maximum Entropy.
It is a probabilistic model: among all the distributions consistent with the training data, the classifier chooses the one with the maximum entropy, i.e. the one that makes the fewest additional assumptions.
In Sentiment Analysis using a Maximum Entropy Classifier, a bag of words model can be used, which is transformed to document vectors later.
It is similar to the Naive Bayes Classifier, except that the weight of each word is learned jointly with the weights of the other words, so every word's contribution to a class depends on its context.
Thus, words are not treated independently as in the case of the Naive Bayes Classifier.
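
For reference, the maximum entropy classifier is usually written in the following exponential form. The symbols are assumptions introduced here: c is a class, d a document, f_i(c, d) a feature function (for example, an indicator that a given word occurs in d and the class is c), and \lambda_i the weight learned for that feature.

```latex
P(c \mid d) = \frac{\exp\left(\sum_i \lambda_i f_i(c, d)\right)}
                   {\sum_{c'} \exp\left(\sum_i \lambda_i f_i(c', d)\right)}
```

The denominator sums over all classes c', so the probabilities add up to one. Because all the weights \lambda_i are fitted together, no independence assumption between the features is required.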

Sentiment Analysis using Bayesian Network

It is a probabilistic graph-based classification algorithm, primarily used for decision problems.
Each node in the Bayesian Network represents a random variable, and every edge in the directed acyclic graph represents a conditional dependency between the nodes it connects.
In Sentiment Analysis using Bayesian Network, dependencies between words are captured in the form of a graph.
It is useful when the training dataset is large.
It is not very commonly used for Sentiment Analysis, so there is still great scope for research.
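
The graph structure translates into a factorisation of the joint probability. In the notation assumed here, X_1, ..., X_n are the random variables (for instance, the sentiment class and the words), and Pa(X_i) denotes the parents of X_i in the graph.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\left(X_i \mid \mathrm{Pa}(X_i)\right)
```

Naive Bayes can be seen as the special case in which the class is the only parent of every word, whereas a general Bayesian network also allows edges between words.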

Hybrid Techniques

A combination of Lexicon-based and Machine Learning-based techniques has proved to be more effective at sentiment analysis than either of the two used separately.

Examples: pSenti, SAIL

A Comparison

Supervised Machine Learning techniques are usually more effective than Lexicon-based techniques.
In Lexicon-based techniques, polarity is determined by a predefined dictionary; thus, the size of the dictionary can significantly affect the token matching.
When few features are available to represent the text, machine learning based techniques perform worse than lexical techniques.
Lexicon-based techniques are known to be robust and can be enhanced by using multiple sources of information.

References

Aliaksei Severyn, Alessandro Moschitti. Twitter Sentiment Analysis with Deep Convolutional Neural Networks
Nurulhuda Zainuddin, Ali Selamat. (2014). Sentiment Analysis Using Support Vector Machine
Christos Troussas, Maria Virvou, Kurt Junshean Espinosa, Kevin Llaguno, Jaime Caro. Sentiment analysis of Facebook statuses using Naive Bayes classifier for language learning
Nipun Mehra, Shashikant Khandelwal, Priyank Patel. Sentiment Identification Using Maximum Entropy Analysis of Movie Reviews
Muhammad Surya Asriadie, Mohamad Syahrul Mubarok Adiwijaya. (2018). Classifying emotion in Twitter using Bayesian network
Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede. (2010). Lexicon-Based Methods for Sentiment Analysis
Michelle Annett, Grzegorz Kondrak. (2008). A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs
S.M. Vohra, J.B. Teraiya. (2013). A Comparative Study of Sentiment Analysis Techniques
Walaa Medhat, Ahmed Hassan, Hoda Korashy. (2014). Sentiment analysis algorithms and applications: A survey
