Extractive vs Abstractive Summarization

In this article at OpenGenus, we have explored the differences between Extractive and Abstractive Summarization in depth and presented the differences in a table.

Introduction

Summarization is one of the most common Natural Language Processing (NLP) tasks.
Billions of people create new content on their smartphones every day, and we are constantly flooded with fresh data. Humans can only consume so much information, so they need a mechanism to separate the important data from the irrelevant noise. For text, summarization helps achieve that: by separating the signal from the noise, we can use the information to guide meaningful actions.

We examine several ways to carry out this task and some of the lessons we have learned along the way. We hope this will be useful to other people who want to integrate basic summarization into their data science pipeline to address various business problems. Python has several great packages and modules that make text summarization easy to do. We will give a straightforward example of extractive summarization using the Gensim and Hugging Face modules.

Uses of Summarization

It can be tempting to summarize every document in order to gain as much information as possible while cutting down on reading time. So far, however, NLP summarization has only found success in a small number of applications. Text summarization excels at extracting the crucial information from texts that contain a lot of raw facts, condensing lengthy documents into shorter, simpler language. News, factsheets, and mailers fall into this category.

Text summarization is less effective for texts in which each sentence builds on the one before it. Medical textbooks and research journals are two typical examples of material that may not summarize well. Finally, summarization techniques can work on fiction, but they might not capture the style and tone that the author tried to convey. Because of this, text summarization is only useful in a limited set of situations.

Two Types Of Summarization

1. Extractive
Extractive summarization selects the key words or sentences from the source text and assembles them into a summary. This strategy is based on picking out the most important information and presenting it clearly, without adding any new material or changing the wording.
For instance, if the original document talks about the advantages of a specific product, an extractive summary can highlight the main advantages expressed in the text, including "improved efficiency," "reduced costs," and "increased productivity."
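
The selection step described above can be sketched with a simple word-frequency heuristic. This is only a minimal illustration, not the method of any particular library: the stop-word list, the naive sentence splitter, the function names, and the sample text are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that",
              "this", "for", "on", "with", "as", "by", "was", "at"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, unchanged and in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original order
    return [sentences[i] for i in chosen]


# Hypothetical sample text, made up for this sketch.
sample = ("The new scheduler improved efficiency across the plant. "
          "Lunch was served at noon. "
          "Improved efficiency reduced costs for the plant. "
          "The weather was mild.")
print(extractive_summary(sample, num_sentences=2))
```

On this sample the sketch keeps the first and third sentences, because they share the most frequent content words. Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary.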

2. Abstractive
Abstractive summarization, on the other hand, creates a summary by rephrasing and combining the information in the original source. This method requires the system to understand the text's content and apply natural language generation techniques to produce a new summary that might not exactly match the original document's words or phrases.
For instance, an abstractive summary of the same document about a product's advantages might read, "The product provides businesses with a significant advantage by improving efficiency, reducing costs, and increasing productivity."
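
One rough way to see the difference between the two styles is to measure how many word pairs (bigrams) in a summary never occur in the source. The sketch below assumes a made-up source document, since the product text itself is not given here. The abstractive summary is the example quoted above, and the function names are hypothetical.

```python
import re


def ngrams(text, n):
    """Lowercased word n-grams of a text, as a list of tuples."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def novel_ngram_ratio(source, summary, n=2):
    """Fraction of the summary's n-grams that do not appear in the source."""
    source_ngrams = set(ngrams(source, n))
    summary_ngrams = ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in source_ngrams)
    return novel / len(summary_ngrams)


# Hypothetical source document, invented for this illustration.
source = ("Customers who adopted the product reported improved efficiency. "
          "They also saw reduced costs within a year. "
          "Most teams noted increased productivity.")

# An extractive summary copies a sentence from the source verbatim.
extractive = "Customers who adopted the product reported improved efficiency."

# The abstractive example from the text above.
abstractive = ("The product provides businesses with a significant advantage by "
               "improving efficiency, reducing costs, and increasing productivity.")

print(novel_ngram_ratio(source, extractive))
print(novel_ngram_ratio(source, abstractive))
```

The copied sentence contains no novel bigrams, while almost all of the rephrased summary's bigrams are new, even though both convey the same advantages.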

Differences between Extractive and Abstractive Summarization


Aspect	Extractive Summarization	Abstractive Summarization
Definition	Selects and compiles important sentences or phrases from the original text	Generates a summary using natural language generation techniques
Content	Reproduces content from the original text	May include new wording that does not appear in the original text
Output	Sentence or phrase-based	More fluent, human-like language
Input	Limited to the content of the original text	Can incorporate external knowledge or information
Difficulty	Easier to implement	More challenging due to the need for natural language generation
Accuracy	May miss important information	May include irrelevant or incorrect information
Use case	Suitable for summarizing news articles, scientific papers, and other informative texts	Useful for creating headlines, marketing copy, and other creative content

Algorithms available for Extractive and Abstractive Summarization

Extractive Summarization:

1. TextRank: An unsupervised graph-based algorithm that ranks sentences by importance using the PageRank algorithm.
2. LexRank: Another unsupervised graph-based algorithm that scores each sentence by its centrality in a graph whose edges are cosine similarities between sentences.
3. LSA: Latent Semantic Analysis (LSA) is a statistical technique that identifies important sentences by analyzing the underlying latent semantic structure of the text, typically through a singular value decomposition of a term-sentence matrix.
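
The graph-based ranking idea behind TextRank can be sketched in plain Python. This is a simplified variant under stated assumptions, not Gensim's or any other library's implementation: sentences are nodes, edge weights are word overlap normalised by log sentence lengths, and scores come from a fixed number of PageRank-style iterations. The function names, the damping value of 0.85, and the sample sentences are choices made for this example.

```python
import math
import re


def word_set(sentence):
    """Set of lowercased words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Word overlap between two sentences, normalised by log lengths."""
    wa, wb = word_set(a), word_set(b)
    if not wa or not wb:
        return 0.0
    denom = math.log(len(wa)) + math.log(len(wb))
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """PageRank-style scores over a weighted sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge between j and i.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def textrank_summary(sentences, k=2):
    """Top-k sentences by score, returned in their original order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical sample sentences for the sketch.
docs = [
    "Text summarization shortens long documents.",
    "Extractive summarization selects sentences from documents.",
    "Abstractive summarization rewrites documents in new words.",
    "Bananas are yellow.",
]
print(textrank_scores(docs))
print(textrank_summary(docs, k=2))
```

The last sentence shares no words with the others, so it has no edges in the graph and ends up with the lowest score. LexRank follows the same overall pattern but builds the graph from cosine similarity between sentence vectors.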

Abstractive Summarization:

1. Sequence-to-Sequence models: A neural network-based approach that uses an encoder-decoder architecture to generate a summary.
2. Pointer-Generator Networks: A modified version of sequence-to-sequence models that can also copy words directly from the source text.
3. Transformer-based models: A type of neural network that uses self-attention to capture contextual relationships between words in the text. Models in this family, such as BERT (typically used as the encoder), GPT-2, and T5, have shown promising results on abstractive summarization tasks.
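
To make the copying mechanism of pointer-generator networks more concrete, the sketch below mixes a generation distribution over a fixed vocabulary with a copy distribution given by attention over the source tokens, using a generation probability `p_gen`. It shows only this final mixing step with hand-picked numbers. In a real model the vocabulary distribution, the attention weights, and `p_gen` are produced by the network at every decoding step, and all names and values here are assumptions for illustration.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Combine generating from the vocabulary with copying from the source.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of the attention weights
    on every source position that holds the word w.
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Hypothetical values for a single decoding step.
vocab_dist = {"the": 0.5, "product": 0.3, "cuts": 0.2}  # decoder output over the vocabulary
source_tokens = ["acme", "cuts", "costs"]               # tokens of the source text
attention = [0.6, 0.3, 0.1]                             # attention over source positions
p_gen = 0.7                                             # probability of generating rather than copying

dist = final_distribution(p_gen, vocab_dist, attention, source_tokens)
for word, prob in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{word}: {prob:.2f}")
```

Note that "acme" is not in the vocabulary, yet it receives non-zero probability through the copy term. This is how such a model can reproduce rare names or figures from the source while still generating new wording.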

Both extractive and abstractive summarization have advantages and disadvantages, and the best strategy depends on the specific needs of the task. When the source content is technical or domain-specific, extractive summarization is frequently preferable because it can capture important information without introducing errors. When the summary needs to reflect a more holistic understanding of the content, or has to read more fluently and naturally, abstractive summarization is frequently favoured.

We saw some quick examples of extractive summarization, one using Gensim's TextRank algorithm and another using a pre-trained transformer model from Hugging Face. In further posts, we will go over LSTM, BERT, and Google's T5 transformer models in depth and look at how they perform tasks such as abstractive summarization.
