
The Power of Emotional Attachment in Effective Learning
December 23, 2024

📖 Introduction
One of the main reasons artificial intelligence has been widely discussed recently is Large Language Models (LLMs). These are classified as generative models, something we are all familiar with through their usage.
Take ChatGPT, for example. Today, OpenAI has numerous models like GPT-4o, o1, and o3-mini-high. LLMs have advanced significantly in just the past couple of years, but if we were to dive into all the details, this blog would never end.
As this blog is being written, AI models capable of reasoning, conducting research, and providing more human-like responses have already emerged—along with AI agents. However, we will keep our focus primarily on LLMs.
We are all familiar with models like ChatGPT and Gemini. You provide them with a prompt, and they generate responses based on your input. Today, we have moved beyond simple text-based prompts into the multimodal LLM era, where we can feed images, PDFs, and other formats to enhance our prompts. Technology has even evolved to the point where we can converse with these models via voice, making them an unavoidable part of our daily lives.
All of these models fall under the LLM category. Now, we will take a high-level and technical overview of how LLMs work. This blog will not delve into the mathematics behind them but will provide a conceptual technical explanation.
Note: Towards the end of this blog, we have prepared a detailed document for those who want to explore the technical side in depth. Don't forget to download it if you want to learn more.
🤖 What Exactly are LLMs?
Large Language Models (LLMs) are AI models trained on vast amounts of text to understand, interpret, and generate human-like language. They mimic how humans learn language through repetition and probabilistic memorization. These models read and analyze text, grasp word and sentence meanings, and generate similar content. LLMs operate in three key stages:
1. Tokenization: Tokenization is the process of breaking text into smaller parts. These smaller parts are called "tokens".
- An easy example: "Hello, how are you?" → ["Hello", ",", "how", "are", "you", "?"]
- Why tokenization matters: it lets the model handle different languages and vocabularies.
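As a toy illustration, a word-and-punctuation tokenizer can be sketched in a few lines of Python. Note that real LLMs use subword tokenizers such as BPE, so this is only a simplification of the idea:

```python
import re

def toy_tokenize(text):
    # Split into runs of word characters or single punctuation marks.
    # Production tokenizers instead learn subword units (BPE, etc.).
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("Hello, how are you?"))
# → ['Hello', ',', 'how', 'are', 'you', '?']
```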
2. Embedding: Each token is converted into a numeric vector by the model. This way, the semantic relationships between words are expressed mathematically.
- "Cat": [0.3, 0.7, 0.5, 0.1]
- "Dog": [0.31, 0.69, 0.48, 0.09]
- "Car": [0.8, 0.2, 0.1, 0.7]
Here:
- "Cat" and "Dog" have similar meanings, so their vector values are also similar.
- "Car" is different from the animals, so its vector values are different.
Embedding is a more mathematical concept, requiring an understanding of vector spaces, dimensionality, and how vectors relate to one another. For those without a mathematical background, it is enough to read closeness between vectors as similarity in meaning, and to know that words are converted into numerical vectors because mathematics is the true language of computers.
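This closeness can be measured concretely. Using the toy vectors above, a common measure is cosine similarity, which is close to 1 for similar directions; the sketch below is pure Python for illustration:

```python
import math

cat = [0.3, 0.7, 0.5, 0.1]
dog = [0.31, 0.69, 0.48, 0.09]
car = [0.8, 0.2, 0.1, 0.7]

def cosine_similarity(a, b):
    # Dot product of a and b, divided by the product of their lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(cat, dog))  # very close to 1.0
print(cosine_similarity(cat, car))  # noticeably lower
```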
3. Transformer Architecture and Attention Mechanism: The Transformer architecture, first introduced in 2017, is the foundation of large language models (LLMs). This architecture enables simultaneous attention to all parts of a sentence, allowing the model to understand relationships between words more effectively. As a result, it enables much faster and more efficient parallel processing compared to previous sequential models like RNNs and LSTMs. With these advantages, the Transformer architecture has revolutionized the field of artificial intelligence, driving innovations not only in language processing and data analysis but also across various industries.
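The heart of this mechanism, scaled dot-product attention, can be sketched in plain Python. This is a minimal illustration over lists of vectors, not how production models implement it (they use batched tensor operations on GPUs):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability, then normalize.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query attends to all keys,
    and the output is a probability-weighted mix of the values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# A query that matches the first key strongly pulls out the first value.
print(attention([[10.0, 0.0]],
                [[10.0, 0.0], [0.0, 10.0]],
                [[1.0, 0.0], [0.0, 1.0]]))
```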
🛠 How LLMs Actually Learn
Self-Supervised Learning and Pre-training
The training of LLMs is usually self-supervised. The term "self-supervised" comes from the following distinction:
- In ordinary supervised learning, the model is given data (inputs) along with explicitly provided correct answers (labels).
- In self-supervised learning, some of the words in the data are masked, and the masked words themselves serve as the labels. In effect, the model supervises itself.
1. Masked Language Modeling (MLM)
In the MLM method, some random words in the given sentence are masked (replaced by the [MASK] token). By learning to correctly predict these masked words, the model becomes a probabilistic model of the entire language.
Example:
- Original sentence: "The weather is rainy, so I should probably get an umbrella before going out."
- Masked sentence: "The weather is rainy, so I should probably get an [MASK] before going out."
- Model's task: [MASK] → "umbrella"
By repeating this process millions of times, the model learns the probabilistic relationships between words. When filling in the [MASK] token, it chooses the word in its vocabulary with the highest probability of occurring in that position.
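Picking the most probable word can be illustrated with a toy distribution. The probabilities below are invented for illustration; a real model would score its entire vocabulary:

```python
# Hypothetical scores the model might assign to candidate words
# for the [MASK] position in the rainy-weather sentence.
mask_probs = {"umbrella": 0.62, "coat": 0.21, "taxi": 0.09, "dog": 0.08}

# Greedy choice: take the word with the highest probability.
prediction = max(mask_probs, key=mask_probs.get)
print(prediction)  # → "umbrella"
```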
2. Causal Language Modeling (CLM)
In the CLM method, the model does not mask any word, but instead aims to predict the next word after each word. This method also creates a probabilistic model of the language, but it is a directional approach.
Example:
- Given sentence: "The capital of Turkey is"
- Model's task: predict the next word → "Ankara"
The model repeats this process over and over again, thus probabilistically modeling the semantic structure of the language and the arrangement of words. Both methods help the model to learn semantic and grammatical relations in the language, but have different technical advantages depending on their intended use.
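Next-word prediction can be sketched with a toy bigram table, where each word maps to invented probabilities for the word that follows it. Real causal LMs condition on the whole preceding context, not just one word, so this is only a simplification:

```python
# Invented next-word probabilities conditioned on the previous word.
bigram = {
    "The": {"capital": 0.9, "weather": 0.1},
    "capital": {"of": 1.0},
    "of": {"Turkey": 0.7, "France": 0.3},
    "Turkey": {"is": 1.0},
    "is": {"Ankara": 0.8, "rainy": 0.2},
}

def generate(word, steps):
    out = [word]
    for _ in range(steps):
        candidates = bigram.get(out[-1])
        if not candidates:
            break
        # Greedy decoding: always pick the most probable next word.
        out.append(max(candidates, key=candidates.get))
    return " ".join(out)

print(generate("The", 5))
# → "The capital of Turkey is Ankara"
```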
⌨️ Prompting: Getting What You Want from LLMs
The performance of LLMs depends on how people communicate with the model (prompting). Prompting ensures that the questions or tasks given to the model are clearly articulated.
There are three basic prompting methods:
1. Zero-shot Prompting
The model is directly asked a question or given a task without giving any examples.
Example:
- Prompt: "Translate 'Bonjour' to English."
- Model: "Hello"
Here, even though the model has never seen an example before, it directly generates the result using what it has learned so far from the pre-training.
2. One-shot Prompting
The model is provided with only one example to perform the task. This example makes it easier for the model to understand the task.
Let's consider an example prompt:
English: "Goodbye" French: "Au revoir"
English: "Thank you" French: ?
Model: "Merci"
With a single example, the model quickly grasps what it needs to do and makes an accurate prediction.
3. Few-shot Prompting
A few examples are given to the model, allowing it to build a stronger context for the task. This method generally produces more accurate and reliable results, although it requires more effort from the person writing the prompt. Since it is essentially a scaled-up version of one-shot prompting, we will not repeat the example here.
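In practice, a few-shot prompt is just the example pairs concatenated ahead of the new query. The sketch below builds such a prompt string; the example pairs are taken from the translation task above, extended with one invented pair:

```python
# Example pairs shown to the model before the real query.
examples = [
    ("Goodbye", "Au revoir"),
    ("Thank you", "Merci"),
    ("Please", "S'il vous plaît"),  # invented extra example
]

def few_shot_prompt(query):
    # Each example becomes one line; the final line leaves the
    # French slot empty for the model to complete.
    lines = [f'English: "{en}" French: "{fr}"' for en, fr in examples]
    lines.append(f'English: "{query}" French:')
    return "\n".join(lines)

print(few_shot_prompt("Good morning"))
```

The resulting string would then be sent to whichever LLM API you are using; the model completes the last line by pattern-matching against the examples.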
🚀 Want to Dive Deeper into LLMs?
In this blog, we’ve briefly explored LLMs. However, we’ve created a more detailed explanation of everything discussed here for those who want a deep dive into LLMs. If you’d like to download it, enter your email below, and we’ll send you the file.
If you’ve benefited from this blog, I advise you to check out our other blog posts for your educational journey in AI. Also, we highly advise you to follow our social channels here. Stay updated and educated, see you next time!
We’re excited to see you grow, learn, and work towards your dreams. Thank you for being part of our community. We look forward to shaping the future together, one sip at a time. Take care, and keep moving forward.


