Sentiment Analysis is the task of analysing text data and predicting the emotion associated with it. This is a challenging Natural Language Processing problem, and there are several established approaches, which we will go through.
Sentiment Analysis is applied to customer reviews in many industries, such as e-commerce, and to survey responses, where it helps improve the service delivered to customers.
In most cases, sentiments can be classified as positive, negative or neutral. Its scope can be any one of the following three types:
 Document level: sentiment analysis of an entire document
 Sentence level: sentiment analysis of a single sentence
 Subsentence level: sentiment analysis of a part of a sentence
Techniques
The techniques that can be used for Sentiment Analysis are:
 Lexicon based techniques:
   corpus based
   dictionary based
 Machine Learning based (like Neural Network based, SVM and others):
   Neural Network based Sentiment Analysis
   SVM based Sentiment Analysis
   Sentiment Analysis using Naive Bayes Classifier
   Maximum Entropy based Sentiment Analysis
   Sentiment Analysis using Bayesian Network
 Hybrid techniques (like pSenti and SAIL)
Let's discuss all the techniques in depth.
Lexicon-based Techniques
Lexicon, in literal terms, means the vocabulary of a person.
 Lexicon based techniques are unsupervised learning techniques.
 Under these, documents will be searched for positive and negative terms.
 A predetermined dictionary will be used to classify words as positive or negative.
 The document will be assigned a score, s, initialized to zero.
 Every positive word in the document will result in an increment in the score s.
 Every negative word, on the other hand, will result in a decrement in the overall score s.
 Finally, s will be compared with a threshold value to deem the document as having positive or negative sentiment, often called the polarity of the document.
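The scoring steps above can be sketched in a few lines of Python. This is only a minimal illustration: the two word sets are a hypothetical toy lexicon, the function names are made up for this example, and using the same threshold symmetrically for the negative side is an assumption rather than something the technique prescribes.

```python
import re

# Hypothetical toy lexicon; a real system would load a much larger dictionary.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "sad"}


def lexicon_score(document: str) -> int:
    """Return the score s: +1 for every positive word, -1 for every negative word."""
    s = 0  # the score starts at zero
    for word in re.findall(r"[a-z']+", document.lower()):
        if word in POSITIVE_WORDS:
            s += 1
        elif word in NEGATIVE_WORDS:
            s -= 1
    return s


def polarity(document: str, threshold: int = 0) -> str:
    """Compare s with a threshold (applied symmetrically, by assumption)."""
    s = lexicon_score(document)
    if s > threshold:
        return "positive"
    if s < -threshold:
        return "negative"
    return "neutral"


print(polarity("The food was great and the staff were excellent"))  # positive
print(polarity("Terrible service, I hate waiting"))                 # negative
```

Raising the threshold makes the classifier more conservative: more documents fall into the neutral band.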
These techniques can be broadly classified into the following:

Corpus-based approach
It brings domain specificity to the dataset; thus, each word in the dataset has not only a sentiment associated with it but also a context.
Dictionary-based approach
Under this, a set of seed words is initially chosen, after which their synonyms and antonyms are looked up to grow the set. This process is repeated until a stable set is obtained.
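The dictionary-based growth of a word set can be illustrated as below. The `THESAURUS` mapping is a hypothetical stand-in for a real lexical resource, and the rule that antonyms of a positive word go to the negative set (and vice versa) is an assumption of this sketch.

```python
# Hypothetical miniature thesaurus: word -> (synonyms, antonyms).
THESAURUS = {
    "good": ({"fine", "great"}, {"bad"}),
    "great": ({"good", "superb"}, {"awful"}),
    "bad": ({"poor", "awful"}, {"good"}),
    "awful": ({"bad"}, {"great"}),
}


def expand_lexicon(positive_seeds, negative_seeds):
    """Grow both word sets until an iteration adds nothing new."""
    positive, negative = set(positive_seeds), set(negative_seeds)
    while True:
        new_positive, new_negative = set(positive), set(negative)
        for word in positive:
            synonyms, antonyms = THESAURUS.get(word, (set(), set()))
            new_positive |= synonyms  # synonyms keep the polarity
            new_negative |= antonyms  # antonyms flip the polarity
        for word in negative:
            synonyms, antonyms = THESAURUS.get(word, (set(), set()))
            new_negative |= synonyms
            new_positive |= antonyms
        if new_positive == positive and new_negative == negative:
            return positive, negative  # stable set reached
        positive, negative = new_positive, new_negative


positive, negative = expand_lexicon({"good"}, {"bad"})
print(sorted(positive))
print(sorted(negative))
```

The loop always terminates because the sets only grow and the thesaurus is finite.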
Machine Learning based Techniques
 These are supervised learning techniques, and sentiment analysis is usually framed as a classification problem.
 A classifier determines whether the document is positive, negative or neutral.
 The classification model requires a labelled training set so that it can learn the characteristics that differentiate positively and negatively classified documents.
 The steps involved are as follows:

Text Vectorization
The document vectors are calculated based on term frequency and inverse document frequency (TF-IDF).
Only those terms are considered which are present in a predefined dictionary, i.e. words that actually provide a positive or negative description.
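As a rough sketch of this vectorization step, the snippet below computes TF-IDF weights restricted to a predefined dictionary. The `VOCABULARY` list is hypothetical, and the plain `log(N / df)` form of inverse document frequency is one of several common variants, chosen here only for simplicity.

```python
import math
from collections import Counter

# Hypothetical predefined dictionary of sentiment-bearing terms.
VOCABULARY = ["good", "bad", "great", "boring"]


def tf_idf_vectors(documents):
    """Return one TF-IDF vector per document, one entry per dictionary term."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = {term: sum(1 for tokens in tokenized if term in tokens)
          for term in VOCABULARY}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = []
        for term in VOCABULARY:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / df[term]) if df[term] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


docs = ["good good movie", "bad movie", "great plot but boring ending"]
for doc, vector in zip(docs, tf_idf_vectors(docs)):
    print(doc, [round(v, 3) for v in vector])
```

Words outside the dictionary, such as "movie" or "plot", still count towards the document length but receive no dimension of their own.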
Classification
For classification, different algorithms such as Logistic Regression and Naive Bayes can be applied.
The classifiers may be Decision Tree-based, Linear Classifiers like SVM and Neural Networks, Rule-based Classifiers, or Probabilistic Classifiers like Naive Bayes, Maximum Entropy or Bayesian Network.
Deep Learning and neural networks can also be employed for the purpose of sentiment analysis.
Some specific techniques include:
Algorithm                            | Reference
-------------------------------------|--------------------
Deep Convolutional Neural Networks   | PDF on UNITI
Support Vector Machine               | PDF on ResearchGate
Sentiment Analysis using Naive Bayes | PDF on ResearchGate
Maximum Entropy                      | PDF on Stanford
Bayesian Network                     | PDF on IOP
Neural Network based Sentiment Analysis
 Neural networks, the foundation of Deep Learning, are loosely modelled after the human brain
 Three kinds of layers, namely input, hidden and output, are used in a neural network, with weights associated with the connections between nodes in adjacent layers
 In Sentiment Analysis using Neural Networks, Word Embeddings are used to represent words as dense numeric vectors
 Word embeddings capture similarities in meaning between words, bringing in an aspect of human understanding of language as opposed to only machine-based understanding
SVM based Sentiment Analysis
 SVM is a supervised technique, which can be used for both classification and regression
 Classification by SVM involves mapping data points in space such that the classes can be easily separated by a line or a plane (a hyperplane in higher dimensions)
 Preprocessing of data involves tokenization, i.e. splitting the text into tokens
 Feature vectors are created to enable representation of the text in an n-dimensional space
 It is especially useful when the data is sparse, as classification becomes easier
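To make the idea of a separating hyperplane concrete, here is a minimal linear SVM trained with sub-gradient descent on the hinge loss. It is a sketch under several assumptions: the vocabulary and training sentences are invented, labels are +1 and -1, and the learning rate, regularization strength and epoch count are arbitrary. In practice a library implementation would normally be used instead.

```python
VOCABULARY = ["good", "great", "bad", "awful"]  # hypothetical feature terms


def to_vector(text):
    """Tokenize on whitespace and count each vocabulary term."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCABULARY]


def train_linear_svm(samples, epochs=50, lr=0.1, lam=0.01):
    """Minimize hinge loss plus L2 regularization; labels must be +1 or -1."""
    w = [0.0] * len(VOCABULARY)
    b = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = to_vector(text)
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # The point is inside the margin: move the hyperplane.
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def classify(text, w, b):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, to_vector(text))) + b
    return 1 if score >= 0 else -1


samples = [("good great", 1), ("great", 1), ("bad awful", -1), ("awful", -1)]
w, b = train_linear_svm(samples)
print(classify("a great film", w, b))
print(classify("awful acting", w, b))
```

The learned weights `w` and bias `b` define the hyperplane `w . x + b = 0`; the sign of the score decides the class.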
Sentiment Analysis using Naive Bayes Classifier
 Naive Bayes classification is a probabilistic algorithm based on Bayes' Theorem
 Classification is performed on the basis of the probability of different attributes being associated with a particular class
 In Sentiment Analysis using Naive Bayes classifier, a basic word count is calculated for each word, with respect to positive as well as negative reviews in the training dataset
 These counts are converted into per-class word probabilities, which are eventually used to make predictions
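A compact sketch of this counting approach is shown below. The training sentences are invented, and the add-one (Laplace) smoothing is an assumption added so that a word unseen in one class does not force a zero probability.

```python
import math
from collections import Counter


def train_naive_bayes(samples):
    """Count words per class and documents per class."""
    word_counts = {}        # class label -> Counter of word counts
    doc_counts = Counter()  # class label -> number of training documents
    for text, label in samples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log posterior probability."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in doc_counts:
        log_prob = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing (an assumption of this sketch).
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


training = [
    ("great movie loved it", "positive"),
    ("wonderful acting great plot", "positive"),
    ("terrible movie hated it", "negative"),
    ("boring plot awful acting", "negative"),
]
model = train_naive_bayes(training)
print(predict(model, "great acting"))  # positive
print(predict(model, "awful movie"))   # negative
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on long documents.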
Maximum Entropy based Sentiment Analysis
 This algorithm is based on the Principle of Maximum Entropy
 It is a probabilistic model: among all models consistent with the training data, the classifier selects the one with maximum entropy
 In Sentiment Analysis using Maximum Entropy Classifier, a bag-of-words model can be used, which is later transformed into document vectors
 It is similar to the Naive Bayes Classifier, except that each word's contribution to a class is learned jointly with the other words rather than in isolation
 Thus, words are not treated independently, as they are in the case of the Naive Bayes Classifier
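For two classes, a Maximum Entropy classifier is equivalent to logistic regression, so a small sketch can be written as logistic regression over bag-of-words counts. Everything specific here is an assumption for illustration: the training sentences, the 1/0 labels, the learning rate and the number of epochs.

```python
import math


def featurize(text, vocabulary):
    """Bag-of-words counts over the given vocabulary."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(samples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood.

    Labels are 1 (positive) or 0 (negative).
    """
    vocabulary = sorted({w for text, _ in samples for w in text.lower().split()})
    weights = [0.0] * len(vocabulary)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            x = featurize(text, vocabulary)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = label - p  # gradient of the log-likelihood w.r.t. the score
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return vocabulary, weights, bias


def probability_positive(model, text):
    """Estimated probability that the text is positive."""
    vocabulary, weights, bias = model
    x = featurize(text, vocabulary)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


training = [
    ("great movie", 1),
    ("great acting", 1),
    ("awful movie", 0),
    ("awful acting", 0),
]
model = train_maxent(training)
print(round(probability_positive(model, "great plot"), 3))
print(round(probability_positive(model, "awful plot"), 3))
```

Because all weights are fitted together, words such as "movie" and "acting", which appear in both classes, end up with little influence, while "great" and "awful" carry the decision.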
Sentiment Analysis using Bayesian Network
 It is a probabilistic graph-based classification algorithm, primarily used for decision problems
 Each node in the Bayesian Network represents a random variable, and every edge in the directed acyclic graph represents a dependency between the nodes it connects
 In Sentiment Analysis using Bayesian Network, dependencies between words are captured in the form of a graph
 It is useful when the training dataset is large
 It is not very commonly used for Sentiment Analysis, so there is still great scope for research
Hybrid Techniques
A combination of Lexicon-based and Machine Learning-based techniques has proved to be more effective at sentiment analysis than either of the two used separately.
Examples: pSenti, SAIL
A Comparison
 Supervised Machine Learning techniques are usually more effective than Lexicon-based techniques.
 In Lexicon-based techniques, polarity is determined by a predefined dictionary; thus, the size of the dictionary can significantly affect the token matching.
 When only a few features are available to represent the text, machine learning-based techniques perform worse than lexical techniques.
 Lexicon-based techniques are known to be robust and can be enhanced by using multiple sources of information.
References
 Aliaksei Severyn, Alessandro Moschitti. Twitter Sentiment Analysis with Deep Convolutional Neural Networks
 Nurulhuda Zainuddin, Ali Selamat. (2014). Sentiment Analysis Using Support Vector Machine
 Christos Troussas, Maria Virvou, Kurt Junshean Espinosa, Kevin Llaguno, Jaime Caro. Sentiment analysis of Facebook statuses using Naive Bayes classifier for language learning
 Nipun Mehra, Shashikant Khandelwal, Priyank Patel. Sentiment Identification Using Maximum Entropy Analysis of Movie Reviews
 Muhammad Surya Asriadie, Mohamad Syahrul Mubarok, Adiwijaya. (2018). Classifying emotion in Twitter using Bayesian network
 Maite Taboada, Julian Brooke, Milan Tofiloski, Kimberly Voll, Manfred Stede. (2010). Lexicon-Based Methods for Sentiment Analysis
 Michelle Annett, Grzegorz Kondrak. (2008). A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs
 S.M. Vohra, J.B. Teraiya. (2013). A Comparative Study of Sentiment Analysis Techniques
 Walaa Medhat, Ahmed Hassan, Hoda Korashy. (2014). Sentiment analysis algorithms and applications: A survey