Natural Language Processing (NLP)

Different core topics in NLP (with Python NLTK library code)

In this article, we cover core NLP tasks such as sentence and word tokenization, stemming, lemmatization, POS tagging, named entity recognition and more.

Machine Learning (ML)

XLNet, RoBERTa, ALBERT models for Natural Language Processing (NLP)

We explore advanced NLP models such as XLNet, RoBERTa and ALBERT, and compare how they differ from the model they build on, i.e. BERT.

Machine Learning (ML)

LSTM & BERT models for Natural Language Processing (NLP)

LSTM was the fundamental model used for NLP initially, but because of its drawbacks BERT became the favored model for NLP tasks.

Machine Learning (ML)

The Idea of Indexing in NLP for Information Retrieval

We explore the fundamental idea behind Information Retrieval: indexing data. We cover the term-document incidence matrix, the inverted index, boolean queries, and dynamic and distributed indexing.
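
As a rough illustration of the inverted index and boolean queries mentioned above, here is a minimal sketch in plain Python. The function names (`build_inverted_index`, `boolean_and`) and the sample documents are made up for this example, and whitespace splitting stands in for real tokenization.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: lowercase + whitespace split is enough for this toy corpus.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def boolean_and(index, *terms):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []

# Hypothetical three-document corpus.
docs = [
    "new home sales top forecasts",
    "home sales rise in july",
    "increase in home sales in july",
]
index = build_inverted_index(docs)
july_and_home = boolean_and(index, "home", "july")
```

A term-document incidence matrix stores one cell per (term, document) pair; the inverted index keeps only the document ids where a term actually occurs, which is why it scales better on sparse text.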

Machine Learning (ML)

Heaps' law in NLP for Frequency of Words

Heaps' law in NLP is a relation between the number of unique words and the total number of words in a document. It is also known as Herdan's law.
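
The relation is usually written as V(N) = K * N^beta, where N is the total number of words and V is the number of unique words. The sketch below measures the observed vocabulary growth and evaluates that formula; the function names are made up for this example, and the values of K and beta in the usage lines are placeholders that would have to be fitted to a real corpus.

```python
def vocabulary_growth(tokens):
    """Return (tokens seen so far, unique tokens seen so far) after each token."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve

def heaps_estimate(n, k, beta):
    """Predicted vocabulary size V = k * n**beta for a text of n tokens."""
    return k * n ** beta

# Toy text; a real measurement needs a much larger corpus.
tokens = "the cat sat on the mat the end".split()
curve = vocabulary_growth(tokens)

# k and beta are illustrative placeholders, not fitted values.
predicted = heaps_estimate(10_000, k=10, beta=0.5)
```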

Machine Learning (ML)

Zipf's Law in NLP

According to Zipf's law, the frequency of a given word is inversely proportional to its rank. Zipf's law is one of the many important laws that play a significant part in natural language processing.
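
One way to check this on a text is to sort words by frequency and look at rank times frequency, which should stay roughly constant if the law holds. The sketch below is a minimal version with a made-up function name and a toy input; real corpora only follow the law approximately.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    ranked = Counter(tokens).most_common()
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]

# Toy input chosen so that frequency halves as rank doubles.
rows = rank_frequency("a a a a b b c".split())
```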

Machine Learning (ML)

Byte Pair Encoding for Natural Language Processing (NLP)

Byte Pair Encoding was originally a compression algorithm and was later adapted for NLP. It handles the vocabulary problem through a bottom-up process that repeatedly merges the most frequent pair of adjacent symbols.
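
The bottom-up process can be sketched as follows: start from single characters and repeatedly merge the most frequent adjacent pair. This is a simplified sketch, not a production tokenizer; the function names and the word-frequency table are made up for the example, and end-of-word markers are left out.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(vocab, best)
        merges.append(best)
    return merges, vocab

# Hypothetical word counts.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=2)
```

After two merges on this toy table, the frequent suffix "est" has become a single symbol, which is how BPE ends up with subword units that can cover rare or unseen words.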

Machine Learning (ML)

A Deep Learning Approach for Native Language Identification (NLI)

Native language identification (NLI) is the task of determining an author's native language based only on their writing or speech in a second language. In this article, we will implement a model to identify the native language of the author.

Machine Learning (ML)

Complete Guide on different Spell Correction techniques in NLP

This is a complete guide to spell correction techniques in Natural Language Processing (NLP), covering approximate string matching, coarse search, fine search, SymSpell and Seq2Seq, along with code demonstrations.
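
To give a flavor of approximate string matching, the sketch below ranks dictionary words by Levenshtein edit distance to a misspelled word. Levenshtein distance is chosen here only as a common example of the idea; the function names, the word list and the `max_distance` cutoff are assumptions made for this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within `max_distance` edits, closest first."""
    scored = sorted((edit_distance(word, cand), cand) for cand in dictionary)
    return [cand for dist, cand in scored if dist <= max_distance]

# Hypothetical dictionary.
suggestions = suggest("speling", ["spelling", "spewing", "selling", "apple"])
```

Comparing against every dictionary word is slow for large vocabularies, which is the motivation for faster lookup schemes such as SymSpell.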

Machine Learning (ML)

Different Word Representations

We discuss different word representations such as distributional representation, clustering-based representation and distributed representation, with several sub-types of each.

Machine Learning (ML)

Topic Modeling using Non Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization is a statistical method for reducing the dimensionality of an input corpus. It uses a factor analysis approach that gives comparatively less weight to words with less coherence.

Machine Learning (ML)

Sentiment Analysis Techniques

Sentiment analysis is the task of analyzing text data and predicting the emotion associated with it. This is a challenging Natural Language Processing problem, and there are several established approaches, which we will go through.

Machine Learning (ML)

Text Summarization using RNN

An encoder-decoder RNN (recurrent neural network) model is used to address the challenges of text summarization in NLP, such as producing a short and accurate summary.

Machine Learning (ML)

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) is a topic modelling technique, that is, it can assign the text in a document to a particular topic. It uses the Dirichlet distribution to find the topics for each document.

Machine Learning (ML)

Topic Modelling Techniques in NLP

Topic modelling is a technique for extracting the topic or topics of a collection of documents. We explore different techniques such as LDA, NMF, LSA, PLDA and PAM.

Machine Learning (ML)

Implement Document Clustering using K Means in Python

In this article, we discuss how to implement concepts like TF-IDF, document similarity and K-Means, and build a demo of document clustering in Python.
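
The first two ingredients, TF-IDF and document similarity, can be sketched in plain Python as below. This uses one simple TF-IDF variant (term frequency times log(N / document frequency)); libraries typically use smoothed variants, so the numbers will differ. The function names and sample documents are made up for the example, and a K-Means step would then cluster these vectors.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical mini corpus: two documents about animals, one about finance.
vectors = tfidf_vectors(["cats chase mice", "cats chase birds", "stocks fell sharply"])
similar = cosine(vectors[0], vectors[1])
unrelated = cosine(vectors[0], vectors[2])
```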

Machine Learning (ML)

TextRank for Text Summarization

TextRank is a text summarization technique used in Natural Language Processing to generate document summaries. It is an extractive, unsupervised, graph-based technique based on PageRank.

Machine Learning (ML)

Text classification using K Nearest Neighbors (KNN)

In this article, we demonstrate how to use the K-Nearest Neighbors (KNN) algorithm to classify input text into different categories. We use the 20 Newsgroups dataset for the demo.

Machine Learning (ML)

PageRank

PageRank is an algorithm that assigns weights to the nodes of a graph based on the graph structure. It was developed by Larry Page and is best known for its use in the Google search engine.
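
A minimal sketch of the idea, using power iteration on a small adjacency-list graph, is shown below. The damping factor of 0.85, the iteration count, the function name and the sample graph are assumptions for this example; the sketch also assumes every link target appears as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on {node: [nodes it links to]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: C is linked to by A, B and D.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
top_node = max(ranks, key=ranks.get)
```

On this toy graph, C collects the most weight because three nodes link to it, while D, which nothing links to, keeps only the baseline share.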

Machine Learning (ML)

Language Identification Techniques

In this article, we look at the different techniques for language identification, which involves two steps: language modelling and classification.

Machine Learning (ML)

Text classification using Naive Bayes classifier

In this article, we explore how to classify text into different categories using a Naive Bayes classifier. We use the News20 dataset and develop the demo in Python.

Machine Learning (ML)

LexRank method for Text Summarization

LexRank is another text summarization method derived from PageRank, similar to TextRank. It uses a graph-based approach to text summarization.

Machine Learning (ML)

Edmundson Heuristic Method for text summarization

The Edmundson Heuristic Method proposes the use of a subjectively weighted combination of features, as opposed to the traditionally used feature weights generated from a corpus.

Machine Learning (ML)

Applying Naive Bayes classifier on TF-IDF Vectorized Matrix

We will use a Naive Bayes classifier on a TF-IDF vectorized matrix for a text classification task. We use the IMDb Movie Reviews dataset for this.

Machine Learning (ML)

Luhn’s Heuristic Method for text summarization

The idea of Luhn’s Heuristic Method for text summarization is that the words with the highest frequency (stopwords) and the words with the fewest occurrences are not important to the meaning of the document.
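
A simplified sketch of this idea follows: drop stopwords and rare words, treat what remains as significant, and score each sentence by how many significant words it contains. The tiny stopword list, the `min_count` threshold and the scoring formula (hits squared divided by sentence length) are assumptions made for this illustration and simplify the original method; punctuation handling is also left out.

```python
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "on"}

def significant_words(sentences, min_count=2):
    """Words that are neither stopwords nor rarer than `min_count`."""
    counts = Counter(
        word
        for sentence in sentences
        for word in sentence.lower().split()
        if word not in STOPWORDS
    )
    return {word for word, count in counts.items() if count >= min_count}

def luhn_scores(sentences, min_count=2):
    """Score each sentence as (significant words)^2 / (sentence length)."""
    keywords = significant_words(sentences, min_count)
    scores = []
    for sentence in sentences:
        words = sentence.lower().split()
        hits = sum(1 for word in words if word in keywords)
        scores.append(hits ** 2 / len(words) if words else 0.0)
    return scores

# Hypothetical four-sentence document.
sentences = [
    "the cat chased the mouse",
    "the mouse hid in the wall",
    "the cat waited",
    "it rained on tuesday",
]
scores = luhn_scores(sentences)
best_sentence = sentences[scores.index(max(scores))]
```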