Chunking and Chinking in NLP

Natural Language Processing (NLP) involves the use of computational algorithms to process and analyze natural language. One important aspect of NLP is chunking, which involves the extraction of meaningful phrases or chunks from text data. Chinking is a related technique that involves the exclusion of certain words or phrases from a chunk. In this article, we will explore the concepts of chunking and chinking in NLP in more detail.

Table of Contents

  • What is Chunking?
  • What is Chinking?
  • How does Chunking Work?
  • How does Chinking Work?
  • Why Chunking and Chinking are Important in NLP
  • Implementation
  • Use-Case

What is Chunking?

Chunking groups the individual words of a sentence, each labeled with its part of speech (noun, verb, adjective, adverb, and so on), into larger units known as chunks. The most common type is noun phrase (NP) chunking, which identifies and extracts noun phrases such as "the cat," "a book," or "my friend." Another type is verb phrase (VP) chunking, which identifies and extracts verb phrases such as "ate breakfast," "is running," or "will sing."

What is Chinking?

Chinking, on the other hand, is the process of excluding certain words or phrases from a chunk. This is useful when we want to keep specific types of words, such as prepositions, conjunctions, or determiners, out of the chunks we extract. Chinking is typically used in combination with chunking: we identify the words or phrases to exclude from a chunk and remove them from it. In the IOB (inside, outside, beginning) tagging scheme that is commonly used to represent chunks, the removed words end up with the tag O (outside).
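
To make the O (outside) tag concrete, here is a minimal sketch that converts a chunk tree into IOB triples with NLTK's tree2conlltags. The sentence and its POS tags are hand-assigned assumptions, so no tagger model is needed; a real tagger may assign different tags.

```python
from nltk.chunk import RegexpParser, tree2conlltags

# Hand-tagged tokens (assumed tags), so no tagger model has to be downloaded.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]

# Chunk noun phrases only; every other token stays outside a chunk.
parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
tree = parser.parse(tagged)

# Flatten the tree into (word, POS tag, IOB tag) triples.
iob_triples = tree2conlltags(tree)
for word, pos, iob in iob_triples:
    print(f"{word:5} {pos:4} {iob}")
```

Tokens that begin a chunk are tagged B-NP, tokens that continue it are tagged I-NP, and tokens that belong to no chunk (here "sat" and "on") are tagged O.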

How does Chunking Work?

Chunking involves first identifying the parts of speech in a sentence using part-of-speech (POS) tagging. Once the parts of speech have been identified, the chunks are created by grouping together certain types of words based on their POS tags.
The most common types of chunks include noun phrases, verb phrases, and prepositional phrases. These chunks can be identified using regular expressions or by using a pre-trained chunking model.

Here is an example of chunking in action:

Sentence: The cat sat on the mat.

POS tags: The/DT cat/NN sat/VBD on/IN the/DT mat/NN ./.

Chunks:

  • Noun Phrase: The cat
  • Verb Phrase: sat
  • Prepositional Phrase: on the mat
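
The example above can be sketched with NLTK's RegexpParser. This is one possible grammar, not the only way to obtain these chunks: the three rules and the hand-assigned POS tags are assumptions chosen to reproduce the noun, verb, and prepositional phrases listed above.

```python
from nltk.tree import Tree
from nltk.chunk import RegexpParser

# Hand-tagged tokens (assumed tags), matching the POS tags shown above.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN"), (".", ".")]

# Stages run in order, so the PP rule can refer to NP chunks built earlier.
grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # optional determiner, adjectives, then nouns
  PP: {<IN><NP>}            # preposition followed by a noun phrase
  VP: {<VB.*>}              # a single verb, as in the example above
"""
parser = RegexpParser(grammar)
tree = parser.parse(tagged)

# Collect (label, text) for every chunk directly under the root.
chunks = [(st.label(), " ".join(word for word, _ in st.leaves()))
          for st in tree if isinstance(st, Tree)]
print(chunks)
```

Because the prepositional phrase is built on top of the noun phrase "the mat", the second noun phrase is nested inside the PP chunk and does not appear as a separate top-level chunk.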

How does Chinking Work?

Chinking is performed using the same techniques as chunking, with the addition of a chinking rule that specifies which words or phrases to exclude from the chunk. Chinking rules are typically specified using regular expressions.

Here is an example of chinking in action:

Sentence: The cat sat on the mat.

Chunk: The cat sat on the mat

Chink: Remove the prepositional phrase "on the mat" (the tag sequence IN DT NN)

Final Chunk: The cat sat
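
A minimal sketch of the same chinking step with NLTK's RegexpParser follows. The chunk label CHUNK, the chink pattern, and the hand-assigned POS tags are assumptions made for illustration.

```python
from nltk.tree import Tree
from nltk.chunk import RegexpParser

# Hand-tagged tokens (assumed tags); the final period is left out for brevity.
tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]

grammar = r"""
  CHUNK:
    {<.*>+}          # chunk every token in the sentence
    }<IN><DT><NN>{   # chink the prepositional phrase
"""
tree = RegexpParser(grammar).parse(tagged)

# The remaining chunk is the first child of the root.
final_chunk = " ".join(word for word, _ in tree[0].leaves())
# Chinked tokens are plain (word, tag) leaves directly under the root.
chinked_words = [child[0] for child in tree if not isinstance(child, Tree)]

print(final_chunk)
print(chinked_words)
```

The chunk rule in braces runs first and the chink rule in reversed braces then carves the matching tag sequence out of the chunk, which is why the two rules are listed in that order.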

Why Chunking and Chinking are Important in NLP

Chunking and chinking are important techniques in NLP because they allow us to extract meaningful information from text data. By identifying and grouping together chunks of information, we can analyze patterns and relationships within the text and extract relevant insights.

For example, in a large corpus of news articles, we could use chunking to extract noun phrases from the headlines and analyze the frequency of different types of nouns to identify the most common topics. Similarly, in a customer feedback dataset, we could use chunking and chinking to extract relevant phrases and sentiments related to a particular product or service.
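
As a rough sketch of the headline idea, the snippet below counts noun phrases across a few headlines. The headlines and their POS tags are hypothetical and hand-assigned; in practice they would come from a corpus and a POS tagger.

```python
from collections import Counter
from nltk.chunk import RegexpParser

# Hypothetical headlines with hand-assigned POS tags.
headlines = [
    [("Central", "JJ"), ("bank", "NN"), ("raises", "VBZ"),
     ("interest", "NN"), ("rates", "NNS")],
    [("Interest", "NN"), ("rates", "NNS"), ("worry", "VBP"),
     ("home", "NN"), ("buyers", "NNS")],
    [("Central", "JJ"), ("bank", "NN"), ("warns", "VBZ"),
     ("of", "IN"), ("inflation", "NN")],
]

parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def noun_phrases(tagged_sentence):
    """Yield each NP chunk of a tagged sentence as a lowercase string."""
    tree = parser.parse(tagged_sentence)
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        yield " ".join(word.lower() for word, _ in subtree.leaves())

# Count how often each noun phrase occurs across all headlines.
counts = Counter(phrase for headline in headlines
                 for phrase in noun_phrases(headline))
for phrase, count in counts.most_common():
    print(count, phrase)
```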

Implementation

Chunking and chinking can be implemented in NLP using various programming languages and tools, such as Python and NLTK (Natural Language Toolkit). Here's an example implementation using Python and NLTK:

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.chunk import RegexpParser

# One-time setup: the tokenizer and tagger models must be downloaded before
# first use. Resource names differ between NLTK versions, so adjust as needed.
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

# Define the text that we want to perform chunking and chinking on.
text = "John Smith is the CEO of ABC Corp. based in New York City. He is a great leader and a successful businessman."

# Tokenize the text into words and apply part-of-speech (POS) tagging to
# identify the grammatical role of each word.
tokens = word_tokenize(text)
tagged = pos_tag(tokens)

# Chunking: use a regular expression pattern over POS tags to identify
# noun phrases in the text.
chunk_grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # chunk noun phrases
"""
cp = RegexpParser(chunk_grammar)
chunked = cp.parse(tagged)

# The resulting chunked object is a tree in which each identified noun
# phrase is grouped under an NP node.

# Chinking: first chunk every token, then remove (chink) prepositions and
# conjunctions, which splits the chunk at those words.
chink_grammar = r"""
  NP:
    {<.*>+}                # chunk every token
    }<IN|CC>+{             # chink prepositions and conjunctions
"""
cp = RegexpParser(chink_grammar)
chinked = cp.parse(tagged)  # parse the tagged tokens, not the chunked tree

# The chinked object is also a tree. The chinked words are no longer part of
# any NP node and sit directly under the root, which corresponds to the
# "outside" (O) tag in IOB notation.

# Print the trees as text, then visualize them with NLTK's tree drawing.
print(chunked)
print(chinked)
chunked.draw()
chinked.draw()

Each draw() call opens a new window with a visual representation of the tree structure, showing which tokens are inside a chunk and which have been chinked. This requires a graphical display.

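Printing both trees, with each grammar applied to the tagged tokens, should give output along the following lines. The exact tags depend on the tagger model, so treat these trees as illustrative.

# Chunking output:

(S
  (NP John/NNP Smith/NNP)
  is/VBZ
  (NP the/DT CEO/NNP)
  of/IN
  (NP ABC/NNP Corp./NNP)
  based/VBN
  in/IN
  (NP New/NNP York/NNP City/NNP)
  ./.
  He/PRP
  is/VBZ
  (NP a/DT great/JJ leader/NN)
  and/CC
  (NP a/DT successful/JJ businessman/NN)
  ./.)

# Chinking output:

(S
  (NP John/NNP Smith/NNP is/VBZ the/DT CEO/NNP)
  of/IN
  (NP ABC/NNP Corp./NNP based/VBN)
  in/IN
  (NP
    New/NNP
    York/NNP
    City/NNP
    ./.
    He/PRP
    is/VBZ
    a/DT
    great/JJ
    leader/NN)
  and/CC
  (NP a/DT successful/JJ businessman/NN ./.))

As you can see, the chunking output identifies and groups together noun phrases such as "John Smith," "the CEO," "ABC Corp.," and "New York City." The preposition "of" does not match the NP pattern, so "the CEO of ABC Corp." is split into two noun phrases. The chinking grammar starts from a single chunk that covers the whole text and removes only prepositions and conjunctions, so it splits the text at "of," "in," and "and" but leaves verbs, pronouns, and punctuation inside the chunks. To obtain chunks as precise as "the CEO" or "a successful businessman" through chinking, the chink pattern also has to cover those tags, for example by chinking <VB.*|IN|CC|PRP>+ together with the punctuation tag.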
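
The sketch below illustrates how a broader chink pattern can leave only noun phrases inside the chunks. The POS tags are hand-assigned assumptions for the first sentence of the sample text (a real tagger may differ), the final period is omitted for brevity, and the extended chink pattern is one possible choice rather than a fixed recipe.

```python
from nltk.tree import Tree
from nltk.chunk import RegexpParser

# Hand-assigned tags for the first sample sentence (assumptions).
tagged = [("John", "NNP"), ("Smith", "NNP"), ("is", "VBZ"), ("the", "DT"),
          ("CEO", "NNP"), ("of", "IN"), ("ABC", "NNP"), ("Corp.", "NNP"),
          ("based", "VBN"), ("in", "IN"), ("New", "NNP"), ("York", "NNP"),
          ("City", "NNP")]

# Approach 1: describe what a noun phrase looks like (chunking).
chunk_parser = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

# Approach 2: chunk everything, then describe what to throw out (chinking).
chink_parser = RegexpParser(r"""
  NP:
    {<.*>+}                # chunk every token
    }<VB.*|IN|CC|PRP>+{    # chink verbs, prepositions, conjunctions, pronouns
""")

def np_texts(tree):
    """Return the text of each NP chunk directly under the root."""
    return [" ".join(word for word, _ in st.leaves())
            for st in tree if isinstance(st, Tree)]

chunk_nps = np_texts(chunk_parser.parse(tagged))
chink_nps = np_texts(chink_parser.parse(tagged))
print(chunk_nps)
print(chink_nps)
```

For this sentence both approaches yield the same noun phrases, which shows that chinking can be seen as defining chunks by what they must not contain.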

Use-Case

Chunking and chinking are widely used in Natural Language Processing (NLP) for various use cases such as information extraction, named entity recognition, and text classification.

  • Named Entity Recognition (NER): Chunking is commonly used in NER systems to identify and extract entities such as people, organizations, and locations from text. For example, using chunking, we can identify that "New York City" is a location in the sentence "John lives in New York City."

  • Information Extraction: Chunking can be used for information extraction tasks, where we want to extract specific information from a text. For example, we can use chunking to extract product names and their features from customer reviews.

  • Text Classification: Chunking and chinking can be used in text classification tasks to identify and group together specific parts of text. For example, in sentiment analysis, we can use chunking to identify and group together adjectives that describe the sentiment of the text.

  • Grammar Checking: Chunking and chinking can also be used in grammar checking tasks to identify and correct sentence structures. For example, we can use chinking to identify and exclude prepositions or conjunctions that are part of a previously identified chunk, resulting in more accurate sentence structures.
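
As a toy illustration of the information extraction and sentiment use cases, the sketch below pairs each noun phrase in a review with the adjective phrase that follows it. The review sentence, its POS tags, the ADJP rule, and the pairing heuristic are all assumptions made for this example; a production system would need something more robust.

```python
from nltk.tree import Tree
from nltk.chunk import RegexpParser

# Hypothetical review sentence with hand-assigned POS tags.
tagged = [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
          ("really", "RB"), ("great", "JJ"), ("but", "CC"), ("the", "DT"),
          ("screen", "NN"), ("is", "VBZ"), ("too", "RB"), ("dim", "JJ")]

grammar = r"""
  NP:   {<DT>?<JJ>*<NN.*>+}   # candidate product features
  ADJP: {<RB.*>?<JJ.*>+}      # opinion words with an optional adverb
"""
tree = RegexpParser(grammar).parse(tagged)

def text_of(subtree):
    """Join the words of a chunk into a single string."""
    return " ".join(word for word, _ in subtree.leaves())

# Pair each noun phrase with the next adjective phrase that follows it.
pairs = []
last_np = None
for child in tree:
    if not isinstance(child, Tree):
        continue  # skip tokens that are outside every chunk
    if child.label() == "NP":
        last_np = text_of(child)
    elif child.label() == "ADJP" and last_np is not None:
        pairs.append((last_np, text_of(child)))
        last_np = None
print(pairs)
```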

Overall, chunking and chinking provide powerful tools for extracting relevant information from unstructured text data and can be applied to a wide range of NLP tasks.

Nithish Singh

Nithish Singh is a Machine Learning Developer Intern @OpenGenus. He is an aspiring data scientist and a passionate writer who enjoys working with data using various technologies.

Improved & Reviewed by:


OpenGenus Tech Review Team