The magic behind LLMs and how they actually work

It's everything from how LLMs came to be, to how they "think", to how the latest models actually work. You wouldn't believe it! 😳


Hi, AI Enthusiasts! 👋

The term ‘LLM’ stands for ‘Large Language Model’. If you’ve been around the AI space for a minute, you’ve probably come across it more than a couple of times. On one level, it means exactly what you think it means: it’s a model (or, if you like, a type of AI) that has been trained to reproduce language by feeding it a lot of data.

This explanation sounds just about right in layman’s terms, but it goes deeper than that. Today, I’ve packed in loads to take you through how LLMs came to be, all the creative ways they’re currently being used, and the latest innovations around them 😉

So, buckle up as we dive into the fascinating world of LLMs, from their inception to cutting-edge applications that might just change how you see AI. Welcome!!

LLMs are neural networks trained on vast datasets of text to understand, create, and manipulate human language. They have become central to advancements in AI, specifically within the field of natural language processing (NLP).

LLMs have a fascinating history, starting from basic ideas and growing into powerful tools that can now understand and generate human-like text.

In the early days, around the 1950s to the 1980s, computers weren’t very good with language. Researchers and developers tried to make them follow strict rules to hold conversations, like in the famous ELIZA program from the 1960s. But these rule-based systems were really limited, so they could only respond in ways they were specifically programmed to. In the 1980s and 1990s, researchers shifted from rule-based systems to using statistics, allowing computers to learn language patterns from large text datasets.

By the 2000s, neural networks, loosely inspired by how the human brain processes information, were significantly improving language tasks.

A major breakthrough then came in 2017 with the Transformer model, which could analyse entire sentences at once, making language processing far more efficient. This led to rapid advancements, with Google’s BERT (Bidirectional Encoder Representations from Transformers) in 2018 and OpenAI’s GPT (Generative Pre-trained Transformer) series, culminating in GPT-3 in 2020, which could generate remarkably human-like text. Others include LaMDA from Google, which focuses on dialogue applications, and PaLM, a more recent model with strong performance across a wide range of tasks. There’s also LLaMA by Meta, which provides an open-source alternative for researchers.

Recent innovations like GPT-4 have further enhanced these models, enabling them to handle longer texts, reason through complex problems, and even work with both text and visual data simultaneously, making LLMs more powerful and versatile.

I believe that LLMs build sufficiently complex models of the world that I feel comfortable saying that, to some extent, they do understand the world.

- Andrew Ng, 2023

How Does It Work?

It’s easy! 😉

LLMs are typically pre-trained on a broad range of text to learn language patterns, then fine-tuned on specific datasets for particular tasks like translation, summarisation, or generation. Under the hood, they use transformer architectures, which rely on an attention mechanism to weigh the importance of different words in a sequence.
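
To make that two-stage recipe concrete, here’s a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries. Everything specific in it (the GPT-2 base model, the two-line toy “corpus”, the hyperparameters) is a placeholder for illustration, not a recipe tied to any of the models discussed here.

```python
# Hedged sketch: pre-trained model + tiny task-specific dataset -> fine-tuned model.
# The toy corpus and hyperparameters are placeholders, not real training settings.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"                                    # a small, broadly pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token              # GPT-2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stage 2: fine-tune on a narrow, task-specific corpus (summaries, translations, ...).
corpus = Dataset.from_dict({"text": [
    "Summary: the meeting moved to Friday.",
    "Summary: revenue grew 12% quarter over quarter.",
]})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()    # nudges the broadly pre-trained weights toward the narrow task
```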

In simpler terms, LLMs are first pre-trained on a large amount of data; they then use their attention mechanism (like a highlighter) to focus on different parts of a sentence and understand the meaning of each word in context. The model can consider every word in a sentence simultaneously, determining which words are most relevant to each other. This ability to focus on multiple words at once, rather than processing them one by one, is a big part of what makes transformers so powerful! 💯
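
If you’d like to see the “highlighter” in action, here’s a tiny sketch of the scaled dot-product attention that transformers are built on, written in plain Python with NumPy. The four “words”, the embedding size, and the random projection matrices are all made up for illustration; a real LLM uses learned weights and many attention heads stacked in layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Each query scores every key: a higher score means "highlight that word more".
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)    # one attention distribution per word
    return weights @ V, weights           # weighted mix of the value vectors

# Toy example: 4 "words", each represented by an 8-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))               # stand-in for word embeddings

# In a real model these projections are learned; here they're random placeholders.
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)

print(weights.round(2))   # row i shows how strongly word i attends to every word
```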

Here’s How Some of Your Favourite LLMs Work

The pre-training of DeepSeek allows it to handle complex queries or delve into niche topics, using the attention mechanism not just to understand language but to connect concepts across different fields of knowledge.

For Grok, its pre-training allows it to understand human perspectives and quirks, aiming to provide not just accurate but also insightful or humorous responses. Its fine-tuning process sharpens this capability, making it adept at offering an outside perspective on human queries.

Perplexity AI uses its foundational training to excel at answering questions with precision. Its attention mechanism is finely tuned to discern the most relevant information in a query, ensuring responses are not only accurate but also directly aligned with the user’s intent. And finally, ChatGPT.

With its iterations over time, ChatGPT has been trained on an ever-growing amount of text from various domains. This pre-training allows it to handle a wide array of conversational tasks, from casual chats to detailed explanations. The fine-tuning also ensures that it can adapt its responses to be contextually appropriate, whether for translation, summarisation, or engaging in nuanced conversation.


What’s New With LLMs?

In the last six months, the landscape of LLMs has witnessed several groundbreaking innovations; topping the list is OpenAI! 😉

OpenAI has expanded its capabilities with the launch of GPT-4o. This model excels in processing and generating across text, audio, image, and video, with notably low latency for real-time applications like live captioning. Google's contribution to the field includes Gemma, offered in both 2B and 7B parameter versions, promoting an open-source environment that has spurred community development. This model shares technologies with Google's Gemini, enhancing applications in multilingual and conversational contexts. Meta AI's Llama 3 has also made its mark, continuing to push for open-source solutions that challenge proprietary models by encouraging a collaborative development atmosphere.

On the efficiency front, sparse mixture-of-experts models like Google's GLaM have emerged, which activate only a fraction of their parameters for each input, allowing for larger yet less energy-intensive models. Techniques like MapReduce for LLMs have enabled processing of extensive text, pushing the limits of what these models can handle. OpenAI's approach to model customisation through Reinforcement Fine-Tuning has allowed for tailoring LLMs to specific, complex tasks, providing specialised AI solutions for industries like coding or finance.
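
As a rough illustration of the map-reduce idea for long documents: split the text into chunks, summarise each chunk independently (the “map” step), then merge the partial summaries in a final call (the “reduce” step). The `call_llm` helper below is a hypothetical stand-in, not a real API; swap in whichever model provider you actually use.

```python
import textwrap

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call (OpenAI, Gemini, a local Llama, ...).
    # Replace this body with your provider's API; here it just returns a stub.
    return f"[summary of a {len(prompt)}-character prompt]"

def map_reduce_summarise(document: str, chunk_size: int = 4000) -> str:
    # Map: summarise each chunk on its own so no single call exceeds the context window.
    chunks = textwrap.wrap(document, chunk_size)
    partial = [call_llm(f"Summarise this passage:\n\n{chunk}") for chunk in chunks]

    # Reduce: merge the partial summaries into one final answer.
    joined = "\n".join(partial)
    return call_llm(f"Combine these partial summaries into one coherent summary:\n\n{joined}")

print(map_reduce_summarise("a very long report about sparse models... " * 2000))
```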

Google DeepMind is leading efforts to reduce errors and mitigate hallucinations in multimodal LLMs by enhancing the vision-language performance of their models without compromising accuracy. Speed enhancements have also been significant, with techniques like StreamingLLM promising up to 22 times faster inference, opening up new possibilities for real-time applications.
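
For a sense of where that speed-up comes from: StreamingLLM keeps the key/value cache small by retaining a handful of initial “attention sink” tokens plus a sliding window of the most recent tokens, evicting everything in between. The sketch below mimics that eviction policy on a plain Python list standing in for per-layer key/value tensors; the sizes are illustrative, not the paper’s settings.

```python
def evict_kv_cache(cache: list, num_sinks: int = 4, window: int = 508) -> list:
    # Simplified StreamingLLM-style policy: always keep the first few "attention sink"
    # entries plus the most recent `window` entries, dropping the middle so the cache
    # stays a fixed size no matter how long the stream gets.
    if len(cache) <= num_sinks + window:
        return cache
    return cache[:num_sinks] + cache[-window:]

# Toy demo on token ids instead of real key/value tensors.
cache = list(range(1200))
cache = evict_kv_cache(cache)
print(len(cache), cache[:6], cache[-3:])   # 512 [0, 1, 2, 3, 692, 693] [1197, 1198, 1199]
```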

Lastly, novel reasoning techniques such as Chain of Continuous Thought (COCONUT) are exploring how LLMs can reason in a continuous latent space rather than through discrete chains of words, potentially revolutionising AI's approach to problem-solving.

My thoughts?

This is all really fascinating to me, and I can’t wait to see what we make of these new discoveries.



On Thursday…

I’ll give you a deeper look into DeepSeek and all the latest on this amazing AI company!