Large Language Models (LLM)

As the rise of tools like ChatGPT shows, Large Language Models (LLMs) have been among the most significant and disruptive technology innovations of the 21st century. They have the potential to reshape a wide range of domains, from natural language processing and machine translation to content creation, and even seemingly unrelated fields such as literature and finance. This article at OpenGenus explores the history of LLMs, their underlying concepts, use cases, and real-life implementations.

Table of contents:

History of LLMs
Concepts behind LLMs
Some Use Cases of LLMs
Implementations of LLMs
List of Large Language Models (LLMs)
MCQs on Large Language Models (Try it)

History of LLMs

The roots of large language models go back to the 1950s and 1960s, when researchers began experimenting with machine translation systems. However, it wasn't until the 1980s that language models gained prominence, as researchers developed n-gram models that estimate the likelihood of a word from the words preceding it.

Neural network-based language models emerged in the 1990s. One of their earliest applications was learning the statistical features of a language by analyzing a massive corpus of text. Unfortunately, the computing resources available at the time constrained further development, so these models did not see widespread use for around two decades.

In 2013, a team of academics at the University of Toronto released a study on a new form of neural network known as a deep belief network. This network, which was trained on a vast dataset of unlabeled text, outperformed the competition on a variety of language modelling tasks.

This accomplishment cleared the path for the construction of huge language models, with companies such as Google, OpenAI, and Facebook investing heavily in research and development in this field. For example, GPT-3, the large language model behind products like ChatGPT, has 175 billion parameters, and later iterations of GPT are expected to have even more.

Concepts behind LLMs

LLMs generally work by predicting the next word of a sentence given the preceding words and the context they provide. This is done with a neural network trained on a large corpus of text.
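
To make next-word prediction concrete, here is a minimal sketch that replaces the neural network with simple bigram counts. The tiny corpus and the function names `train_bigram` and `next_word_probs` are made up for illustration; a real LLM learns far richer context than a single preceding word.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

# Toy corpus (an assumption for this sketch, not real training data)
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
counts = train_bigram(corpus)
probs = next_word_probs(counts, "the")
print(probs)
```

In this toy corpus "cat" follows "the" more often than any other word, so it receives the highest probability.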

Recurrent Neural Networks

A common type of neural network used for such tasks is the [recurrent neural network](https://iq.opengenus.org/recurrent-neural-networks-with-emoji-sentence-example/) (RNN), because it maintains a memory of the previous words in the sentence and uses that information to predict the next word.

However, RNNs come with their own limitations. Training relies on partial derivatives of the loss function to update the weights. If the RNN has too many layers or time steps, these derivatives can shrink toward zero on the way back, so useful gradient information no longer propagates from the output end of the model to the layers near the input end. This is known as the vanishing gradient problem.

To overcome these limitations, other approaches have been developed, as described below.
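
The effect can be seen numerically with a minimal sketch. It assumes a hypothetical scalar RNN, h_t = tanh(w * h_{t-1} + x), with a made-up weight and input, and tracks how the derivative of the latest state with respect to the first state shrinks as the sequence grows.

```python
import math

def gradient_through_time(weight, steps, x=0.5):
    """Return |d h_t / d h_0| after each step of h_t = tanh(weight * h_{t-1} + x)."""
    h = 0.0
    grad = 1.0
    history = []
    for _ in range(steps):
        h = math.tanh(weight * h + x)
        # Chain rule: d h_t / d h_{t-1} = weight * (1 - h_t ** 2)
        grad *= weight * (1.0 - h * h)
        history.append(abs(grad))
    return history

# weight=0.9 and steps=50 are illustrative values, not taken from any real model
grads = gradient_through_time(weight=0.9, steps=50)
print(f"after  1 step : {grads[0]:.3e}")
print(f"after 10 steps: {grads[9]:.3e}")
print(f"after 50 steps: {grads[-1]:.3e}")
```

Each step multiplies the gradient by a factor below 1, so the product collapses toward zero long before the 50th step.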

Long Short Term Memory (LSTM)

An LSTM (Long Short-Term Memory) network is a type of recurrent neural network commonly used for processing sequential data, such as text or speech. It uses a combination of memory cells and gates to selectively retain or forget information as it flows through the network. This allows the network to learn and remember long-term dependencies in the data, which is important for language processing tasks.

LSTMs, like RNNs, can be used to estimate the likelihood of the next word in a phrase based on the preceding words. Unlike plain RNNs, they mitigate the vanishing gradient problem with a gate called the forget gate, which helps the network regulate the gradient values at each time step.
The forget gate produces an activation vector that determines how much information is preserved or deleted at each time step. Based on the current input and the previous hidden state, this gate selects which information to forget and which to remember. The forget gate allows the network to balance preserving important information against discarding irrelevant information.
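
The following is a minimal sketch of a single LSTM step, reduced to scalars so the role of the forget gate is visible. The parameter dictionary and its values are assumptions chosen for illustration; real LSTMs use weight matrices learned during training.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps gate names to (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to write
    g = gate("candidate", math.tanh)   # the new information itself
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c, f

def params(forget_bias):
    # Illustrative values: the input gate is held almost shut so only the forget gate matters
    return {
        "forget": (0.0, 0.0, forget_bias),
        "input": (0.0, 0.0, -10.0),
        "candidate": (0.0, 0.0, 0.0),
        "output": (0.0, 0.0, 0.0),
    }

_, c_keep, f_keep = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, p=params(10.0))
_, c_drop, f_drop = lstm_step(x=0.0, h_prev=0.0, c_prev=1.0, p=params(-10.0))
print(f"forget gate open  : f={f_keep:.4f}, cell state={c_keep:.4f}")
print(f"forget gate closed: f={f_drop:.4f}, cell state={c_drop:.4f}")
```

With the forget gate near 1 the previous cell state passes through almost unchanged, and with the gate near 0 it is almost entirely erased.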

To train an LLM, LSTMs can be trained on a large corpus of text. During training, they learn to estimate the likelihood of the next word in the sequence given the context provided by the preceding words. Throughout the training phase, the LSTMs pick up patterns and relationships in the text, which can later be used to produce new content.

After training, the LSTMs can be used to produce new text from a starting sequence of words. The LSTMs predict the probability distribution over the next word in the sequence, and a word is sampled from this distribution to become the next word in the series. This step is repeated until a whole sentence or paragraph is generated.
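
The sampling loop can be sketched as follows. The `NEXT_WORD` table is a hypothetical stand-in for the distributions a trained model would output, and the `<start>` and `<end>` markers are assumptions of this example.

```python
import random

# Hypothetical next-word distributions standing in for a trained model's output
NEXT_WORD = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "slept": 0.3},
    "dog": {"sat": 0.4, "barked": 0.6},
    "sat": {"<end>": 1.0},
    "slept": {"<end>": 1.0},
    "barked": {"<end>": 1.0},
}

def generate(seed_word="<start>", max_words=10, rng=None):
    """Repeatedly sample the next word until <end> or the length limit is reached."""
    if rng is None:
        rng = random.Random()
    words = []
    current = seed_word
    for _ in range(max_words):
        dist = NEXT_WORD[current]
        # Sample the next word in proportion to its probability
        current = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)

sentence = generate(rng=random.Random(0))
print(sentence)
```

Because each word is sampled rather than chosen greedily, different random seeds can yield different sentences from the same distributions.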

Gated Recurrent Units (GRU)

Another approach to building LLMs is to use GRUs, or Gated Recurrent Units.

Gated Recurrent Units are a type of recurrent neural network.
GRUs can be used as an alternative to LSTMs for training LLMs (Large Language Models) owing to their ability to handle sequential data by processing it one element at a time, such as a sequence of words in a sentence. However, GRUs and LSTMs differ in the way they handle and store information.
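
As a rough illustration of that difference, the sketch below implements one scalar GRU step. A GRU keeps a single hidden state and uses an update gate and a reset gate instead of a separate cell state. The parameter values are made up, and the final blending line follows one common convention; some libraries swap the roles of `z` and `1 - z`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a scalar GRU cell. `p` maps names to (w_x, w_h, bias)."""
    def affine(name, h):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b

    z = sigmoid(affine("update", h_prev))   # how much of the state to overwrite
    r = sigmoid(affine("reset", h_prev))    # how much of the old state feeds the candidate
    candidate = math.tanh(affine("candidate", r * h_prev))
    return (1.0 - z) * h_prev + z * candidate

def params(update_bias):
    # Illustrative values, not learned weights
    return {
        "update": (0.0, 0.0, update_bias),
        "reset": (0.0, 0.0, 0.0),
        "candidate": (1.0, 1.0, 0.0),
    }

h_keep = gru_step(x=1.0, h_prev=0.5, p=params(-10.0))   # update gate nearly closed
h_new = gru_step(x=1.0, h_prev=0.5, p=params(10.0))     # update gate nearly open
print(f"update gate closed: h={h_keep:.4f}")
print(f"update gate open  : h={h_new:.4f}")
```

When the update gate is closed the hidden state is carried over almost untouched, and when it is open the state is replaced by the new candidate value.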

Compared to LSTMs, GRUs have fewer parameters and a simpler architecture, making them easier to train and less prone to overfitting. However, this can also make them less effective than LSTMs at modeling long-term dependencies.
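
The difference in parameter count can be estimated with a short sketch. It assumes the textbook formulation with one bias vector per block: an LSTM layer has four weight blocks (input, forget, and output gates plus the candidate), while a GRU layer has three (update and reset gates plus the candidate). Some libraries keep two bias vectors per block, so their totals differ slightly. The layer sizes below are arbitrary examples.

```python
def recurrent_params(input_size, hidden_size, num_blocks):
    """Weights and biases for `num_blocks` gate-like blocks of one recurrent layer."""
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block

def lstm_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, num_blocks=4)

def gru_params(input_size, hidden_size):
    return recurrent_params(input_size, hidden_size, num_blocks=3)

# Example sizes chosen for illustration only
lstm_total = lstm_params(128, 256)
gru_total = gru_params(128, 256)
print(f"LSTM: {lstm_total:,} parameters")
print(f"GRU : {gru_total:,} parameters ({gru_total / lstm_total:.0%} of the LSTM)")
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer with the same input and hidden sizes.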

For the purpose of training an LLM, GRUs can be used in a similar way as LSTMs. They can be trained on a large corpus of text to predict the probability of the next word in a sentence given the previous words. This training process allows the GRUs to learn patterns and relationships in the text, which can be used to generate new text.

Some Use Cases of LLMs

Natural Language Processing: NLP is the first and foremost use case for LLMs. LLMs can be used to improve a wide range of NLP tasks, such as language translation, question answering, summarization, and sentiment analysis.
Content Creation: With the growth of social media, there is an ever-increasing demand for high-quality content. LLMs can be used to provide pointers for articles, product descriptions, and social media posts, among others, saving businesses time and money.
Chatbots: LLMs can be used to create software products that interact with users and respond to their statements. The GPT-3 model in particular is well suited to this task.
Language Translation: Software products like Google Translate leverage LLMs to translate text from one language to another.
Logical Reasoning: LLMs can be used to answer questions that require complex reasoning skills, based on given facts, premises, and assumptions.

Implementations of LLMs

Among the best-known implementations of LLMs are GPT-2 and GPT-3, where GPT stands for **Generative Pre-trained Transformer**. GPT-3 in particular stands out due to its use in ChatGPT. Given the scope and size of an LLM, it is difficult for a standalone user to implement one from scratch, but we can always use a pre-trained model to generate output for our own requirements.

We shall import the gpt2 model from the transformers library. Before running any code, ensure that the transformers library is installed. If it is not, install it by running this command on your command line.
python -m pip install transformers

If you're using a Jupyter notebook, run this statement in a cell.

!pip install transformers

The following code uses GPT-2 in Python. We first import the required classes from the transformers library. Then, we load the pre-trained GPT-2 model as well as its tokenizer. After that, we take an input and tokenize it. Finally, we pass the tokenized data to the model, which generates some output text.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained model
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Tokenize the input
input_text = "Hello, how are you today?"
input_ids = tokenizer.encode(input_text, return_tensors='pt')

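# Generate text (sampling is enabled, so the output differs between runs)
# GPT-2 has no padding token, so the end-of-sequence token is reused for padding
output = model.generate(input_ids, max_length=50, do_sample=True, pad_token_id=tokenizer.eos_token_id)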
output_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(output_text)

A second method involves making an API call to a model hosted on a site like Hugging Face and using that model to meet our requirements.

Before we move to the actual code, let's understand what an API actually is and how we can use one with Hugging Face.

An API (Application Programming Interface) is a set of protocols, tools, and specifications that allow software applications to interact with one another. APIs allow developers to get access to the functionality of other software programmes, platforms, or services and incorporate it into their own applications.

Hugging Face's API gives access to their pre-trained models and other NLP tools. Developers can use the API to incorporate these models and tools into their own apps and services, eliminating the need to train their own models from scratch.

To use the Hugging Face API, you first need to create an account on their website and then acquire an API key. Once you have an API key, you can send HTTP requests to their API endpoints. The code is as follows.

import requests
import json

# Set up the API endpoint and headers
url = "https://api-inference.huggingface.co/models/gpt2"
api_key = "YOUR_API_KEY_HERE"
headers = {"Authorization": f"Bearer {api_key}"}

# Set up the prompt and parameters for text generation
prompt = "The quick brown fox"
params = {
    "max_length": 50,
    "temperature": 0.8,
    "do_sample": True,
}

# Set up the request data
data = {
    "inputs": prompt,
    "parameters": params,
}

# Send the request to the API
response = requests.post(url, headers=headers, data=json.dumps(data))
response.raise_for_status()  # stop early on HTTP errors such as an invalid API key

# Parse the response and print the generated text
# Text-generation models return a list with one dict per generated sequence
output = response.json()
generated_text = output[0]["generated_text"]
print(generated_text)

One advantage of using API calls is that a website like Hugging Face hosts multiple models for multiple tasks. With a few small changes, the same code can be reused for many different tasks.
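
One way to organize that reuse is sketched below. The helper name `build_request` is hypothetical, and `some-org/some-model` is a placeholder rather than a real model ID. The sketch only assembles the request and sends nothing, and it assumes other hosted models follow the same URL pattern as the gpt2 endpoint used earlier.

```python
import json

API_ROOT = "https://api-inference.huggingface.co/models"

def build_request(model_id, inputs, api_key, parameters=None):
    """Assemble the URL, headers, and JSON body for one inference call (nothing is sent)."""
    payload = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    return {
        "url": f"{API_ROOT}/{model_id}",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": json.dumps(payload),
    }

# Same helper, two different models; only the model ID and inputs change
generation = build_request("gpt2", "The quick brown fox", "YOUR_API_KEY_HERE", {"max_length": 50})
other = build_request("some-org/some-model", "LLMs are useful.", "YOUR_API_KEY_HERE")
print(generation["url"])
print(other["url"])
```

Passing one of these dictionaries to `requests.post(**generation)` would send the request. The shape of the response depends on the task the chosen model performs.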

List of Large Language Models (LLMs)

Below is a table of certain LLMs and their details.

Model	Company Behind It	Release Year	Size	Primary Use Case
GPT-1	OpenAI	2018	117M parameters	Text generation, language modeling
BERT	Google	2018	340M parameters	Bidirectional Encoder Representations from Transformers (BERT), which is pre-trained using both masked language modeling and next sentence prediction
GPT-2	OpenAI	2019	1.5B parameters	Language modeling, text generation, summarization, translation, chatbots
XLNet	Carnegie Mellon University and Google	2019	340M parameters	Generalized autoregressive pretraining with permutation language modeling, which allows for more flexible and nuanced modeling of dependencies between words
RoBERTa	Facebook AI	2019	355M parameters	Similar to BERT, but pre-trained on a larger corpus of data with longer sequences and dynamically changing masking patterns
CTRL	Salesforce	2019	1.6B parameters	Text completion, language modeling, dialogue modeling, and question answering.
T5	Google	2020	11B parameters	Text-to-text tasks such as summarization, translation, question answering, etc.
Turing-NLG	Microsoft	2020	17B parameters	Natural language generation tasks such as language translation, conversation modeling, and text completion.
Reformer	Google	2020	1.2B parameters	Efficient language modeling and text generation.
ELECTRA-Large	Google, Stanford University	2020	335M parameters	Text classification, text summarization, dialogue generation.
GShard	Google	2020	600B parameters	Language modeling, text generation, summarization, translation, chatbots
Megatron	Nvidia and Microsoft	2020	8.3B Parameters	Machine Translation, Question Answering, Text Classification, and Language Generation.
GPT-3	OpenAI	2020	175B parameters	Transformer architecture with attention mechanisms, autoregressive language modeling, and fine-tuning on a wide range of tasks
GPT-3.5	OpenAI	2022	175B parameters	Transformer architecture with attention mechanisms, autoregressive language modeling, language conversion, etc.
GPT-4	OpenAI	2023	Not publicly disclosed	Text and image analysis, human interaction

MCQs on Large Language Models (Try it)

Before wrapping up, let's have a look at a few questions.

Q1) Based on your understanding, which can be a possible limitation of a product based on an LLM like ChatGPT?

Accuracy/correctness of answer.

Limited Use cases

Biases in answer.

Difficult to use by a layperson

Answer: Biases in answer. Depending on the training data, LLMs can exhibit biases related to ethnicity, gender, religion, and so on, which need to be corrected.

Q2) Is the implementation of an LLM by a standalone user from scratch possible?

Easily possible

Impossible

Possible but difficult.

Can't say

Answer: Possible but difficult. A standalone user can implement a large language model (LLM) from scratch on a PC, but it is a complex task that requires advanced knowledge of programming, machine learning, and natural language processing, as well as a very large corpus of training data and substantial computing hardware.

With this article at OpenGenus, you should now have a good understanding of Large Language Models (LLMs).