Sentiment Analysis is the task of analysing text data and predicting the emotion or opinion expressed in it. It is a challenging Natural Language Processing problem, and there are several established approaches, which we will go through.
Sentiment Analysis is applied to customer reviews in many industries, such as E-Commerce, and to survey responses, where it helps improve the service delivered to customers.
In most cases, sentiments are classified as positive, negative or neutral. The analysis can be carried out at one of the following three levels:
- Document level - sentiment analysis on an entire document
- Sentence level - sentiment analysis of a sentence
- Sub-sentence level - sentiment analysis of a subset of the whole sentence
The techniques that can be used for Sentiment Analysis are:
- Lexicon based techniques:
  - Corpus based
  - Dictionary based
- Machine Learning based (like Neural Network based, SVM and others):
  - Neural Network based Sentiment Analysis
  - SVM based Sentiment Analysis
  - Sentiment Analysis using Naive Bayes Classifier
  - Maximum Entropy based Sentiment Analysis
  - Sentiment Analysis using Bayesian Network
- Hybrid techniques (like pSenti and SAIL)
Let's discuss all the techniques in depth.
Lexicon based Techniques
Lexicon, in literal terms, means the vocabulary of a person or of a language.
- Lexicon based techniques are unsupervised: they do not need a labelled training set.
- Under these, documents are searched for positive and negative terms.
- A predetermined dictionary is used to classify words as positive or negative.
- Each document starts with a score s = 0.
- Every positive word in the document increments s.
- Every negative word in the document decrements s.
- Finally, s is compared with a threshold value to decide whether the document expresses positive or negative sentiment; this label is called the polarity of the document.
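The scoring procedure above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the two word sets are a tiny hypothetical lexicon, stripping punctuation from tokens is a simplification, and returning "neutral" when the score equals the threshold is an assumption added to cover the neutral class.

```python
# Hypothetical toy lexicon; real systems use much larger curated dictionaries.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "sad"}


def lexicon_score(document):
    """Start at s = 0, add 1 per positive word and subtract 1 per negative word."""
    s = 0
    for token in document.lower().split():
        word = token.strip(".,!?;:\"'")  # drop surrounding punctuation
        if word in POSITIVE_WORDS:
            s += 1
        elif word in NEGATIVE_WORDS:
            s -= 1
    return s


def polarity(document, threshold=0):
    """Compare the score with a threshold to decide the document's polarity."""
    s = lexicon_score(document)
    if s > threshold:
        return "positive"
    if s < threshold:
        return "negative"
    return "neutral"  # assumption: a score exactly at the threshold is neutral


print(polarity("The food was great and the staff were excellent!"))  # positive
print(polarity("Terrible service, I hate waiting."))  # negative
```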
These techniques can be broadly classified into the following:
- Corpus based: this brings domain specificity to the dataset, so the words in the dataset have not only a sentiment associated with them but also a context.
- Dictionary based: a small set of seed words is chosen first, and then their synonyms and antonyms are looked up to grow the set. This process is repeated until the set stops changing.
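The dictionary-based expansion loop can be sketched as below. The `SYNONYMS` and `ANTONYMS` tables are hypothetical stand-ins for a real thesaurus such as WordNet; the function keeps applying synonym lookups (same polarity) and antonym lookups (opposite polarity) until an iteration adds no new word.

```python
# Hypothetical toy thesaurus standing in for a real resource such as WordNet.
SYNONYMS = {
    "good": {"great", "fine"},
    "great": {"good", "excellent"},
    "excellent": {"superb"},
    "bad": {"poor", "awful"},
    "awful": {"terrible"},
}
ANTONYMS = {
    "good": {"bad"},
    "great": {"awful"},
    "bad": {"good"},
}


def expand_seeds(positive, negative):
    """Grow the seed sets: synonyms keep a word's polarity, antonyms flip it.
    Repeat until an iteration adds no new word (a stable set)."""
    positive, negative = set(positive), set(negative)
    while True:
        new_positive, new_negative = set(positive), set(negative)
        for word in positive:
            new_positive |= SYNONYMS.get(word, set())
            new_negative |= ANTONYMS.get(word, set())
        for word in negative:
            new_negative |= SYNONYMS.get(word, set())
            new_positive |= ANTONYMS.get(word, set())
        if new_positive == positive and new_negative == negative:
            return positive, negative
        positive, negative = new_positive, new_negative


positive, negative = expand_seeds({"good"}, set())
print(sorted(positive))
print(sorted(negative))
```

Starting from the single seed "good", this toy thesaurus grows the sets to {good, great, fine, excellent, superb} and {bad, awful, poor, terrible}. The loop always terminates because the sets only grow and the thesaurus is finite; with a real, noisier thesaurus the grown sets may need manual checking.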
Machine Learning based Techniques
- These are supervised learning techniques, and sentiment analysis is usually framed as a classification problem.
- The classifier determines whether a document is positive, negative or neutral.
- The classification model needs a labelled training set from which it can learn the characteristics that distinguish positively classified documents from negatively classified ones.
- The steps involved are as follows:
  - Document vectors are computed from term frequency and inverse document frequency (TF-IDF).
  - Only terms present in a predefined dictionary are considered, i.e. words that actually carry a positive or negative meaning.
  - For classification, algorithms such as Logistic Regression or Naive Bayes can be applied to these vectors.
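The first two steps (TF-IDF document vectors restricted to a predefined dictionary) can be sketched in plain Python. `SENTIMENT_TERMS` is a hypothetical four-word dictionary, and the sketch uses the unsmoothed formula idf = log(N / df); libraries differ on this detail, e.g. scikit-learn's `TfidfVectorizer` smooths idf by default and accepts a fixed `vocabulary`.

```python
import math
from collections import Counter

# Hypothetical predefined dictionary: only these terms become vector dimensions.
SENTIMENT_TERMS = ["bad", "boring", "good", "great"]


def tfidf_vectors(documents, vocabulary):
    """Return one TF-IDF vector per document, restricted to `vocabulary`.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = {term: sum(1 for tokens in tokenized if term in tokens) for term in vocabulary}
    # Terms absent from every document get idf 0 instead of a division by zero.
    idf = {term: math.log(n_docs / df[term]) if df[term] else 0.0 for term in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vectors


docs = ["a good movie", "a great great plot", "a bad and boring movie"]
for doc, vector in zip(docs, tfidf_vectors(docs, SENTIMENT_TERMS)):
    print(doc, [round(value, 3) for value in vector])
```

Words outside the dictionary, such as "movie" or "plot", simply do not appear in the vectors. The resulting four-dimensional vectors are what a classifier such as Naive Bayes or an SVM would then be trained on.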
The classifiers may be Decision Tree based, or Linear Classifiers like SVM and Neural Networks, or Rule-based Classifiers or Probabilistic Classifiers like Naive Bayes, Maximum Entropy or Bayesian Network.
Deep Learning and neural networks can also be employed for the purpose of sentiment analysis.
Some specific techniques include:
| Technique | Paper |
|---|---|
| Deep Convolutional Neural Networks | PDF on UNITI |
| Support Vector Machine | PDF on ResearchGate |
| Sentiment Analysis using Naive Bayes | PDF on ResearchGate |
| Maximum Entropy | PDF on Stanford |
| Bayesian Network | PDF on IOP |
Neural Network based Sentiment Analysis
- Neural networks, an integral part of Deep Learning, are loosely inspired by the human brain
- A neural network has three kinds of layers, namely the input, hidden and output layers, with a weight on every connection between nodes of adjacent layers
- In Sentiment Analysis using Neural Networks, words are usually represented as Word Embeddings: dense vectors learned so that words used in similar contexts get similar vectors
- Unlike sparse word counts, word embeddings capture similarity in meaning (for example, "good" and "great" end up close together), which brings the model closer to a human understanding of language
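The forward pass of such a network can be sketched in plain Python. Everything here is hypothetical and untrained: the five-word vocabulary, the 4-dimensional random embeddings and the random weights stand in for values that would normally be learned by backpropagation (or, for the embeddings, loaded from pre-trained vectors such as word2vec or GloVe). Averaging the word embeddings to form the input is one simple choice among many.

```python
import math
import random

random.seed(0)  # reproducible random weights
EMBED_DIM, HIDDEN_DIM = 4, 3

# Hypothetical vocabulary and randomly initialised (untrained) embeddings.
VOCAB = ["good", "great", "bad", "awful", "movie"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}

# Weights on the connections input -> hidden and hidden -> output (untrained).
W_HIDDEN = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
B_HIDDEN = [0.0] * HIDDEN_DIM
W_OUT = [random.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
B_OUT = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(document):
    """Input layer -> hidden layer -> output probability of positive sentiment."""
    # Input layer: average of the embeddings of known words (zeros if none).
    vectors = [EMBEDDINGS[w] for w in document.lower().split() if w in EMBEDDINGS]
    if not vectors:
        vectors = [[0.0] * EMBED_DIM]
    x = [sum(column) / len(vectors) for column in zip(*vectors)]
    # Hidden layer: weighted sum of the inputs plus bias, then tanh activation.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_HIDDEN, B_HIDDEN)]
    # Output layer: a single sigmoid unit giving P(positive).
    return sigmoid(sum(w * hi for w, hi in zip(W_OUT, h)) + B_OUT)


print(forward("great movie"))
```

Because the weights are random, the printed value is not yet a meaningful sentiment score; training would adjust `W_HIDDEN`, `W_OUT`, the biases and possibly the embeddings to fit labelled examples.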
SVM based Sentiment Analysis
- SVM is a supervised technique that can be used for both classification and regression
- Classification by SVM finds the hyperplane (a line in two dimensions, a plane in three) that separates the data points of the two classes with the widest possible margin
- Preprocessing of data involves tokenization, i.e. splitting the text into tokens
- Feature vectors are created to represent each text as a point in an n-dimensional space
- It is especially well suited to text, whose feature vectors are high-dimensional and sparse
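A linear SVM can be trained on sparse bag-of-words vectors with a few lines of plain Python. This sketch uses the Pegasos stochastic subgradient method on the L2-regularised hinge loss, cycles through the examples in a fixed order, and omits the bias term and kernels for brevity; the six training sentences and the `lam` and `epochs` values are hypothetical. In practice a library implementation such as scikit-learn's `LinearSVC` would typically be used.

```python
from collections import Counter


def featurize(document):
    """Tokenize on whitespace and build a sparse bag-of-words vector (word -> count)."""
    return Counter(document.lower().split())


def train_linear_svm(documents, labels, lam=0.01, epochs=100):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.

    labels are +1 (positive) or -1 (negative); no bias term, for brevity.
    """
    examples = [featurize(d) for d in documents]
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:  # regulariser: shrink every weight towards zero
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move the weights towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, document):
    """Sign of the linear score decides the class."""
    score = sum(w.get(f, 0.0) * v for f, v in featurize(document).items())
    return "positive" if score > 0 else "negative"


docs = [
    "a great movie", "a bad movie",
    "good acting and great plot", "awful acting and bad plot",
    "really good fun", "really awful boring",
]
labels = [1, -1, 1, -1, 1, -1]
weights = train_linear_svm(docs, labels)
print(predict(weights, "great fun"), predict(weights, "awful boring"))
```

Storing the weights in a dictionary keyed by word reflects the sparsity point above: the hinge-loss update only touches the words present in the current document, while the shrinking step rescales all stored weights.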
Sentiment Analysis using Naive Bayes Classifier
- Naive Bayes classification is a probabilistic algorithm based on Bayes' theorem, with the "naive" assumption that words are independent of one another given the class
- Classification is performed on the basis of the probability of different attributes being associated with a particular class
- In Sentiment Analysis using the Naive Bayes classifier, a count is calculated for each word with respect to the positive and the negative reviews in the training dataset
- These counts are converted into conditional probabilities of each word given each class, which are then used to make predictions
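The count-then-probability procedure can be sketched as a multinomial Naive Bayes classifier. The four training sentences are hypothetical, and add-one (Laplace) smoothing is an assumption added so that a word never seen with a class does not give that class zero probability; log-probabilities are summed instead of multiplying probabilities, to avoid numerical underflow.

```python
import math
from collections import Counter


def train_naive_bayes(documents, labels):
    """Count words per class and documents per class (multinomial Naive Bayes)."""
    word_counts = {c: Counter() for c in set(labels)}
    class_counts = Counter(labels)
    for doc, label in zip(documents, labels):
        word_counts[label].update(doc.lower().split())
    vocabulary = set().union(*word_counts.values())
    return word_counts, class_counts, vocabulary


def predict_naive_bayes(model, document):
    """Return the class with the highest log P(class) + sum of log P(word | class)."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for word in document.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing keeps unseen (word, class) pairs non-zero.
                p = (word_counts[c][word] + 1) / (total_words + len(vocabulary))
                score += math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)


docs = ["good great fun", "great acting", "bad boring plot", "awful bad acting"]
labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "great movie"))  # positive
print(predict_naive_bayes(model, "bad acting"))  # negative
```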
Maximum Entropy based Sentiment Analysis
- This algorithm is based on the Principle of Maximum Entropy
- It is a probabilistic model: among all the distributions consistent with the constraints observed in the training data, the classifier chooses the one with the maximum entropy, i.e. the one that makes the fewest additional assumptions
- In Sentiment Analysis using the Maximum Entropy Classifier, a bag of words model can be used, which is later transformed into document vectors
- It is similar to the Naive Bayes Classifier, except that each word's contribution to a class is estimated jointly with the other words in its context rather than separately
- Thus, words are not treated independently, as they are in the Naive Bayes Classifier
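For two classes, a Maximum Entropy classifier over word-count features is equivalent to logistic regression, so it can be sketched as below. The training data, learning rate and number of epochs are hypothetical, the prior/bias feature is omitted for brevity, and no regularisation is applied. Unlike Naive Bayes, which estimates each word's probability from counts on its own, the weights here are fitted jointly: each word's gradient depends on the predicted probability of whole documents, and that probability involves every other word in them. Ready-made implementations include NLTK's `MaxentClassifier` and scikit-learn's `LogisticRegression`.

```python
import math
from collections import Counter


def featurize(document):
    """Bag-of-words features: word -> count."""
    return Counter(document.lower().split())


def probability_positive(weights, x):
    """P(positive | x) = sigmoid(w . x) for the two-class MaxEnt model."""
    z = sum(weights[f] * v for f, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(documents, labels, learning_rate=0.1, epochs=200):
    """Batch gradient ascent on the conditional log-likelihood.

    labels are 1 (positive) or 0 (negative); no bias/prior feature and no
    regularisation, for brevity.
    """
    examples = [featurize(d) for d in documents]
    weights = Counter()  # missing features read as weight 0
    for _ in range(epochs):
        gradient = Counter()
        for x, y in zip(examples, labels):
            error = y - probability_positive(weights, x)
            for f, v in x.items():
                gradient[f] += error * v
        # All weights are updated together from the same predictions.
        for f, g in gradient.items():
            weights[f] += learning_rate * g
    return weights


docs = ["good great fun", "great acting", "bad boring plot", "awful bad acting"]
labels = [1, 1, 0, 0]
weights = train_maxent(docs, labels)
print(round(probability_positive(weights, featurize("great fun")), 3))
print(round(probability_positive(weights, featurize("awful boring")), 3))
```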
Sentiment Analysis using Bayesian Network
- It is a probabilistic graphical model that can be used for classification and decision problems
- Each node in the Bayesian Network represents a random variable, and every directed edge in the acyclic graph represents a direct probabilistic dependency between the two nodes it connects
- In Sentiment Analysis using a Bayesian Network, dependencies between words are captured in the form of a graph
- It is useful when the training dataset is large
- It is not very commonly used for Sentiment Analysis, and there is still great scope for research
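Inference in a small Bayesian Network can be sketched by enumeration. The network and all of its probabilities are hypothetical: a sentiment node with edges to two word-presence nodes ("not" and "good"), plus an edge from "not" to "good" so that the meaning of "good" can depend on whether "not" is present. The posterior is the product of the conditional probability table entries along the graph, normalised over the sentiment values; in practice the structure and tables would be learned from data.

```python
# Hypothetical network: Sentiment -> "not", Sentiment -> "good", "not" -> "good".
P_SENTIMENT = {"positive": 0.5, "negative": 0.5}
# P(document contains "not" | sentiment)
P_NOT = {"positive": 0.2, "negative": 0.4}
# P(document contains "good" | sentiment, contains "not")
P_GOOD = {
    ("positive", False): 0.6,
    ("positive", True): 0.1,
    ("negative", False): 0.1,
    ("negative", True): 0.5,
}


def bernoulli(p_true, value):
    """Probability of a boolean variable taking `value`."""
    return p_true if value else 1.0 - p_true


def posterior(has_not, has_good):
    """P(sentiment | evidence): multiply the CPT entries of the joint
    factorisation for each sentiment value, then normalise."""
    joint = {
        s: P_SENTIMENT[s]
        * bernoulli(P_NOT[s], has_not)
        * bernoulli(P_GOOD[(s, has_not)], has_good)
        for s in P_SENTIMENT
    }
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


print(posterior(has_not=False, has_good=True))
print(posterior(has_not=True, has_good=True))
```

With these made-up tables, "good" alone gives a posterior of about 0.89 for positive, while "not" together with "good" gives about 0.91 for negative. A model without the edge from "not" to "good" could not express this interaction between the two words directly.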
Hybrid Techniques
A combination of Lexicon-based and Machine Learning-based techniques has proved to be more effective at sentiment analysis than either technique used on its own.
Examples: pSenti, SAIL
Comparison of Techniques
- Supervised Machine Learning techniques are usually more effective than Lexicon-based techniques.
- In Lexicon-based techniques, polarity is determined by a predefined dictionary, so the size of the dictionary can significantly affect token matching.
- When only a few features are available to represent the data, machine learning-based techniques perform worse than lexicon-based techniques.
- Lexicon-based techniques are known to be robust and can be enhanced by using multiple sources of information.
References
- Aliaksei Severyn, Alessandro Moschitti. Twitter Sentiment Analysis with Deep Convolutional Neural Networks
- Nurulhuda Zainuddin, Ali Selamat. (2014). Sentiment Analysis Using Support Vector Machine
- Christos Troussas, Maria Virvou, Kurt Junshean Espinosa, Kevin Llaguno, Jaime Caro. Sentiment analysis of Facebook statuses using Naive Bayes classifier for language
- Nipun Mehra, Shashikant Khandelwal, Priyank Patel. Sentiment Identification Using Maximum Entropy Analysis of
- Muhammad Surya Asriadie, Mohamad Syahrul Mubarok, Adiwijaya. (2018). Classifying emotion in Twitter using Bayesian
- Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede. (2010). Lexicon-Based Methods for Sentiment Analysis
- Michelle Annett, Grzegorz Kondrak. (2008). A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs
- S.M. Vohra, J.B. Teraiya. (2013). A Comparative Study of Sentiment Analysis Techniques
- Walaa Medhat, Ahmed Hassan, Hoda Korashy. (2014). Sentiment analysis algorithms and applications: A survey