AI Expert Retro Talk Part III: LLMs and ChatGPT

Lucas Martin Calderon
8 min read · Aug 8, 2023


A deep dive into how ChatGPT works, and how LLMs work under the hood


What do you mean by LLM?

In the realm of artificial intelligence, the term “Large Language Models” often denotes a specific category of deep learning constructs known as Transformers. These sophisticated models are adept at handling sequential data, whether it’s text, images, or time series, and they fall under the broader umbrella of Sequence Models. Many of these Sequence Models can be classified as Language Models, which are designed to learn the probability distribution of subsequent elements in a sequence, such as the next word, pixel, or value.
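To make "learning the probability distribution of the next element" concrete, here is a minimal toy sketch of the simplest possible language model, a bigram word counter. The corpus, words, and probabilities below are invented purely for illustration; real LLMs learn these distributions over billions of tokens with neural networks rather than counts.

```python
from collections import Counter, defaultdict
import random

# Toy corpus; a real LLM is trained on billions of tokens.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each other word: a bigram model,
# the simplest possible language model.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word_distribution(word):
    """Return P(next word | current word) as a dict of probabilities."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

dist = next_word_distribution("the")
print(dist)  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}

# Generating text is just repeatedly sampling from that distribution.
words, probs = zip(*dist.items())
print(random.choices(words, weights=probs, k=1))
```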

So, what are Sequence Models?

I recently wrote another article covering Sequence Models here.

Continuing on to LLMs, the Transformer architecture stands apart from its antecedents by virtue of its capacity to discern the contextual relationships among values within a sequence, a feat achieved through a mechanism known as (self-) Attention. This contrasts sharply with the Recurrent Neural Network (RNN), which preserves the temporal order by processing each time step sequentially within a sequence.

Transformers, on the other hand, are capable of simultaneously reading the entire sequence and selectively focusing on values that precede in time through a process referred to as “masking.” This parallel processing not only accelerates training times but also facilitates the handling of larger model parameter sizes.
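As a rough illustration of how that masking works, here is a minimal NumPy sketch of single-head causal self-attention. The random input and the absence of learned query/key/value projections are simplifications for readability, not a faithful Transformer layer.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head self-attention with a causal mask.

    x: (seq_len, d_model) array of token embeddings.
    Each position may only attend to itself and earlier positions,
    which is the "masking" that lets decoder-style Transformers read
    the whole sequence in parallel during training.
    """
    seq_len, d = x.shape
    # For brevity, the embeddings serve directly as queries/keys/values;
    # a real Transformer first applies learned projection matrices.
    q, k, v = x, x, x

    scores = q @ k.T / np.sqrt(d)                         # (seq_len, seq_len) similarities
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores[mask] = -np.inf                                # hide "future" positions

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ v                                    # blend of the visible positions

out = causal_self_attention(np.random.randn(5, 8))
print(out.shape)  # (5, 8)
```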

In the early days of Transformers, a model with approximately 100 million parameters was considered “large.” However, the landscape has evolved dramatically, with contemporary models boasting an astonishing 500 billion to 1 trillion parameters. Intriguingly, several scholarly articles have identified a significant turning point in Transformer behavior at around the 100 billion parameter mark.

It’s worth noting that these colossal models generally exceed the capacity of a single GPU, necessitating the segmentation and distribution of the model across multiple computational nodes. This complexity underscores the transformative power and scale of modern Transformer architectures.

Transformers, in the context of deep learning, can be systematically classified into three distinct categories: the “encoder only” design (exemplified by BERT), the “decoder only” structure (as seen in GPT), and the composite “encoder-decoder” architecture (such as T5).

  1. Encoder Only (e.g., BERT): This category is typically harnessed for tasks that necessitate a comprehensive understanding of the entire sequence. Sentiment classification is a prime example where the encoder’s ability to grasp the full context of a sequence is paramount.
  2. Decoder Only (e.g., GPT): Conversely, decoder-only models are particularly suited for tasks that involve text completion, such as finishing a sentence. The decoder’s specialization lies in generating coherent continuations of a given text fragment.
  3. Encoder-Decoder Architecture (e.g., T5): This hybrid design is versatile and can be tailored to address a wide array of problems. Its most renowned application, however, is in the field of language translation, where both encoding and decoding processes are integral to translating text from one language to another.
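If you want to poke at the three flavours yourself, the sketch below shows one common way to do so, assuming the Hugging Face transformers library is installed; the checkpoints (gpt2, t5-small, and the default sentiment model) are public examples chosen for illustration rather than anything prescribed here.

```python
# Illustrative sketch using the Hugging Face `transformers` library
# (pip install transformers).
from transformers import pipeline

# 1. Encoder-only (BERT-style): understand the whole sequence at once.
classifier = pipeline("sentiment-analysis")
print(classifier("This movie was surprisingly good."))

# 2. Decoder-only (GPT-style): continue a text fragment.
generator = pipeline("text-generation", model="gpt2")
print(generator("The Transformer architecture is", max_new_tokens=20))

# 3. Encoder-decoder (T5-style): map an input sequence to an output sequence.
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Attention is all you need."))
```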

And what is Attention, how does it work, and why is it so important?

Got you covered here too.

Back to LLMs: since the advent of the OG Google paper on Transformer models and self-attention (“Attention Is All You Need”), we have learned that LLMs have an amazing capacity to generalize.

Generalization is the capacity of an AI model to produce accurate and precise inferences on data it has never seen before. For LLMs, accuracy grows roughly logarithmically with the number of training examples and the number of model parameters, which is why they are described as few-shot learners. However, it is interesting and relevant to know that OpenAI (the company behind ChatGPT, powered by GPT-4) has publicly stated that an LLM can only grow in parameter size so far before this logarithmic improvement starts to stall.
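To show what “few-shot” means in practice, here is a minimal sketch that builds a few-shot prompt by hand; the reviews and labels are made up, and the resulting string could be sent to any completion-style LLM API of your choice.

```python
# A few-shot prompt: the "learning" happens entirely inside the prompt,
# with no gradient updates to the model. The examples below are invented.
examples = [
    ("I loved this film, truly wonderful.", "positive"),
    ("Terrible plot and worse acting.", "negative"),
    ("The soundtrack alone is worth the ticket.", "positive"),
]
query = "Two hours of my life I will never get back."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # send this string to a completion-style LLM
```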

Research Trends in Large Language Models (LLMs) 📈

The landscape of research in Large Language Models (LLMs) has been marked by several prominent trends and innovations. Here’s an exploration of some key developments:

  1. Scaling Up LLMs: A significant trend has been the progressive training of increasingly larger LLMs and evaluating their performance on various benchmarks. This trend has sparked debates, such as those presented in the CLIP paper, questioning whether benchmark performance truly reflects a model’s generalizability — a nuanced observation known as “The Cheating Hypothesis.”
  2. Model Parallelism and Optimization: The need to train colossal models has necessitated splitting them across multiple GPUs/TPUs. This has been facilitated by techniques from the Megatron paper, innovations in model/pipeline sharding, and tools like DeepSpeed. Additionally, quantization methods have been employed to reduce both memory and computational footprints.
  3. Attention Mechanisms: Since the traditional self-attention mechanism at the heart of the Transformer exhibits O(N²) space and time complexity, research into more efficient alternatives, such as Flash Attention, has gained traction. Innovations like ALiBi have enabled variable context windows, allowing for larger context windows — up to 100k tokens in today’s LLMs.
  4. Fine-Tuning Innovations: Given the sheer size of modern LLMs, there has been a growing interest in fine-tuning them more efficiently. Techniques in Parameter-Efficient Fine-Tuning (PEFT), including Adapters and LoRA, have accelerated this process by reducing the number of parameters that need to be adjusted (see the LoRA sketch after this list). Coupled with 4- and 8-bit quantization, it’s now feasible to fine-tune models on CPUs, a departure from the traditional 16- or 32-bit floats.
  5. Other Research Avenues: This overview is not exhaustive, as there has been substantial research into areas such as LLMs’ ability to reproduce information, adversarial attacks, and domain-specific LLMs like Codex (for coding). Early-stage multimodal LLMs, capable of understanding images and text, have also emerged. Moreover, studies like RETRO and webGPT have demonstrated that smaller LLMs can achieve comparable performance to larger models through efficient querying and information retrieval.
  6. Chronological Context: It’s essential to recognize that some of these innovations, such as Flash Attention and LoRA, were developed subsequent to the papers discussed in related sections of the literature.
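Picking up item 4 above, the following is a minimal PyTorch sketch of the LoRA idea: freeze the pretrained weight and learn only a small low-rank update. The layer sizes, rank, and scaling are illustrative defaults, not values from any particular paper or library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: y = base(x) + x A^T B^T * scaling.

    The (pretend-pretrained) weight is frozen; only the small low-rank
    matrices A (r x in) and B (out x r) are trained, which is why PEFT
    methods cut the number of trainable parameters so sharply.
    """
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False              # freeze the base weight
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12,288 trainable values vs. 589,824 in the frozen weight
```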

InstructGPT: A Pivotal Development in LLM Behavior 🤖

The release of the InstructGPT paper marked a seminal advancement in our comprehension of Large Language Models’ (LLMs) behavior, particularly in their interaction with natural language instructions.

GPT-3, especially the versions with a substantial number of parameters, had already showcased the ability to respond to natural language instructions, commonly referred to as “prompts.” However, this capability was not without its challenges:

  1. Precision in Prompting: The instructions often had to be meticulously crafted to elicit the desired output. Even slight ambiguities could lead to unexpected or irrelevant responses.
  2. Regurgitation and Hallucinations: The responses generated by the model could either be regurgitated language, often sourced from obscure and unverified parts of the Internet, or entirely fabricated constructs known as “hallucinations.” These regurgitated responses could be unfiltered, potentially offensive, untruthful, and generally misaligned with the user’s intent.
  3. Underwhelming Performance: Despite the theoretical ability to follow natural language instructions, the practical application of this capability in LLMs like GPT-3 often fell short of expectations. The inconsistency and unpredictability in responses led to a perception of these models as unreliable or even capricious in their behavior.

Steerability ⛷️ & Alignment 🧘🏽‍♀️: Challenges and Innovations in LLMs

The training of Large Language Models (LLMs) to predict subsequent tokens in a sentence has proven to be an effective method for instilling a generalizable understanding of language. By training over extensive corpora (such as the entire Internet) and incorporating various prediction tasks (translation, classification, etc. — referred to as a “mixture of objectives”), researchers can create powerful LLMs capable of few-shot learning. However, this success has not come without challenges:

1. Steerability:

  • User Intent Mismatch: Training objectives may not necessarily translate into LLMs accurately following user intent.
  • Inconsistent Responses: LLMs may excel at multiple-choice questions or specific tasks but often falter in following user instructions without substantial guidance, especially in zero-shot settings.
  • Regurgitation and Hallucinations: Responses may be recycled from previous content, irrelevant ramblings, or confidently presented nonsense. This highlights the issue of steerability — the ability to guide an LLM to produce a desired result.

2. Alignment:

  • Value Alignment: The desire for LLM outputs to reflect specific values (e.g., avoiding racism or homophobia) or meet user quality expectations adds complexity to the alignment problem.
  • Open Research Threads: Questions surrounding what values an LLM aligns with, how to evaluate alignment, and the feasibility of aligning systems more intelligent than ourselves remain intriguing research areas.

3. InstructGPT’s Solutions:

  • Supervised (Instruction) Fine-tuning (SFT): A method to refine the model’s response to instructions.
  • Reinforcement Learning via Human Feedback (RLHF): A sequential training task to further enhance the model’s alignment with human values.

By employing these two sequential training tasks, the authors of the InstructGPT paper were able to transform GPT-3 into InstructGPT.

4. Key Insights:

  • Size Isn’t Everything: The results of the InstructGPT paper revealed that merely increasing the model’s size was not a sufficient condition for achieving steerability and alignment. Interestingly, the 175B parameter GPT-3 “prompted” model performed worse on average than the 1.3B parameter InstructGPT.

Experiments

1. Instruction Fine-Tuning (SFT) 🎛️

The journey to enhance the output of a generative model logically begins with refining its ability to follow instructions.

Methodology:

  • Prompt Collection: The authors gathered prompts from real users via their playground API (addressing the cold-start problem) and devised additional prompting tasks.
  • Labeling Process: Labelers were carefully selected and engaged to create high-quality outputs aligned with the “helpful, honest, and harmless” criteria popularized by Anthropic (HHH, or Triple H).
  • Fine-Tuning: Using these gold-standard labels, the authors fine-tuned a GPT model to learn these outputs, a process termed “Supervised Fine-tuning” (SFT).
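Mechanically, SFT boils down to ordinary next-token cross-entropy on curated (prompt, response) pairs. The sketch below shows a single training step, assuming the Hugging Face transformers library and using a small GPT-2 checkpoint plus an invented demonstration pair purely for illustration; a real pipeline would batch many demonstrations and typically mask the prompt tokens out of the loss.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One labeler-written (prompt, ideal response) pair; InstructGPT used
# many thousands of such demonstrations.
prompt = "Explain what a Transformer is in one sentence."
response = " A Transformer is a neural network that uses attention to model sequences."
tokens = tokenizer(prompt + response, return_tensors="pt")

# Standard causal-LM objective: predict each next token of the demonstration.
outputs = model(**tokens, labels=tokens["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(outputs.loss))
```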

2. Reinforcement Learning via Human Feedback (RLHF) 💎

The challenge of imbuing an LLM with a set of values requires a more nuanced approach.

Challenges:

  • Complexity of Human Values: Traditional loss functions, such as cross-entropy, measure how closely a model’s predicted probabilities match a single correct label. Human values, however, are multifaceted and resist encapsulation within a single label.
  • Preference Encoding: It may be more practical for humans to encode preferences by comparing multiple LLM outputs rather than explicitly labeling data.

RLHF Approach:

  • Learning Preferences: RLHF aims to teach an AI system preferences by learning a reward function through human interaction and feedback.
  • Reward Modeling: This reward model guides the LLM to generate higher-valued outputs via reinforcement learning. It takes a prompt and a candidate output and returns a “preferability” score, and the LLM is then fine-tuned to produce outputs that score higher (see the reward-model sketch after this list).
  • Benefits: Theoretically, RLHF can reduce an LLM’s tendency towards regurgitation by guiding outputs towards optimal answers through incentives, rather than overfitting to specific label distributions.
  • Results with InstructGPT:
    • Improved Outputs: InstructGPT demonstrated less toxic, more truthful, and more steerable outputs.
    • Catalyst for Innovation: This paper has spurred a wave of new innovations using LLMs.
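To make the reward-modeling step concrete, here is a minimal sketch of the pairwise preference loss commonly used for RLHF reward models: the model is trained to score the human-preferred completion above the rejected one. The tiny MLP and random embeddings below are stand-ins; in practice the reward model is itself an LLM with a scalar head operating on full token sequences.

```python
import torch
import torch.nn as nn

# Toy reward model: a tiny MLP over fixed-size "embeddings" stands in
# for an LLM with a scalar reward head.
reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stand-ins for two completions of the same prompt, where human
# labelers preferred the first one.
chosen, rejected = torch.randn(1, 16), torch.randn(1, 16)

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
# Minimizing it widens the score gap in favour of the preferred answer.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(float(loss))
```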

Note on Superalignment: It’s worth mentioning that RLHF is not expected to work in the case of superalignment, where the challenge is to teach systems more intelligent than ourselves our values.

Conclusion

The combination of Supervised Fine-Tuning (SFT) and Reinforcement Learning via Human Feedback (RLHF) represents a sophisticated approach to improving LLMs’ ability to follow instructions and align with human values. By integrating rigorous labeling, fine-tuning, and reinforcement learning, the authors of the InstructGPT paper have contributed valuable insights and methodologies to the field. These techniques not only enhance the quality and steerability of LLM outputs but also open new horizons for research and innovation in the ever-evolving landscape of artificial intelligence. For those interested in a more detailed exploration of RLHF, HuggingFace’s blog post serves as an informative primer.

