NLP and Multi-Dimensional Arrays: The Behind the Scenes of ChatGPT

For a person whose imagination skills are comparable to those of a potato, multidimensional arrays, NLP, and the inner workings of ChatGPT are quite mind-boggling.

Introduction: Let’s Talk Dimensions, Baby!

I want you to imagine a 1-dimensional Array. 

Just a straight line. Smooth like butter!

Now, can you imagine a 2-D Array?

It should look like your Excel sheet: a bunch of rows and columns.

Still easy, right?

Now, 3-D?

Think of a cube, bro.

Arghh, what are we in… 6th grade math class?

Okay okay… Gear up!

Now, try 4-D?

Now you’re like:

“Umm… what?”

And then there’s 12,288-D? 

Well, even those over-achiever-teacher’s-pets back in 6th grade math class would say:

“I’m out.”

But… if you're still keeping up with me…

Hi, Ms. ChatGPT! 

Our human brain can comfortably imagine up to 3 dimensions; beyond that, it gets tricky.

While I was passionately looking for something to overthink about, multidimensional $h*t consumed my head.

But why?

'Cause, for a person whose imagination skills can be compared to those of a potato, multidimensional arrays and the inner workings of ChatGPT are mind-boggling enough to fuel some mindful overthinking.

And that’s when I decided to explore NLP in ChatGPT.

Now let me explain what the hell NLP actually is, using a hypothetical conversation between Ms. ChatGPT and me.

Let’s understand the struggle she usually goes through when you and I bombard her with 10,500 questions the night before our exam.

Note: I'm bold… ummm, I mean, the bold text is me talking, and the plain text is ChatGPT replying.

Intro to NLP: The Beauty Behind the Brains of ChatGPT 

Yo! ChatGPT bro, sup? Can I ask you something?

Yo! Nothin' much, bro. Yeah sure, ask away.

How do you and your machine/AI buddies pretend to “think” like us and come up with sentences in such a human-like manner?

Let me tell you a secret: "It's NLP, Natural Language Processing."

I don’t get it, but it sounds cool, go on.

I use NLP to convert text into vectors; this process is called vectorization. In other words, I convert text into numbers. This helps me understand the text better, plus I love math (mostly because I can't understand raw text).

Dude, no offence, but it's kinda annoying how you generate one word at a time like a sloth, and my eyes hurt looking at the screen.

I'm sorry, my intelligence is, well… "artificial". I cannot come up with sentences on my own.

I generate my answer one word at a time. I have to use multi-dimensional arrays, transformers, probability, and attention mechanisms to predict the most meaningful word to say next and actually talk sense. So, I need a little bit of time to do all this behind the scenes.

And in simple words:

Think of it this way: imagine you've mugged up each and every word from 10 of your textbooks (if only I could, I'd get a 10 CGPA). These are like the training datasets I learn from.

And every time you have to speak, you arrange words in countless dimensions, mapping out the most likely combinations.

“Woah, that’s like magic!”

More like math. Models like GPT-3 and GPT-4 use a technique called unsupervised learning to understand patterns in large-scale data. And transformers (the backbone of these models) help me look at words in context, making the sentences I generate more coherent.

Let’s Steal the Moon! Understanding the BTS of ChatGPT with Minions

Wondering how I’ve generated this response?

Yes!

Let’s break it down, shall we?

Okayy..

1. Tokenization

Step 1 of this war is "Divide and conquer."

I break each sentence into smaller pieces called tokens.

The string "Minions! Tonight, we steal the moon!" becomes a list of strings, or tokens, as follows: ["Minions", "!", "Tonight", ",", "we", "steal", "the", "moon", "!"].
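
If you want to see this step in code, here's a tiny Python sketch. It's a toy regex tokenizer made up purely for illustration; real GPT models actually use subword tokenizers (byte-pair encoding), not simple word splitting.

```python
import re

# Toy tokenizer for illustration only: real GPT models use subword
# (byte-pair encoding) tokenizers, not a simple regex split.
def tokenize(text: str) -> list[str]:
    # Grab runs of word characters, or any single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Minions! Tonight, we steal the moon!"))
# ['Minions', '!', 'Tonight', ',', 'we', 'steal', 'the', 'moon', '!']
```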

2. Vectorization:

I turn each of these tokens into numbers.

Each token's meaning is captured in a high-dimensional vector of these numbers!

In the above example, 

Token       Vector
"Minions"   [1.23, -0.45, 0.78, ...]
"!"         [0.65, -0.12, 0.34, ...]
"Tonight"   [1.45, 0.32, -0.67, ...]
","         [0.12, 0.89, -0.34, ...]

Similarly, I vectorize every token and represent the entire text in numbers.
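
Here's a minimal sketch of that lookup, assuming a made-up 4-dimensional embedding table (in a real model these vectors are learned during training and are thousands of dimensions wide, e.g. 12,288 in GPT-3):

```python
import numpy as np

# Made-up embedding table: the vectors are random here, but in a real model
# they are learned during training and have thousands of dimensions.
rng = np.random.default_rng(seed=0)
vocab = ["Minions", "!", "Tonight", ",", "we", "steal", "the", "moon"]
embedding_table = {token: rng.normal(size=4) for token in vocab}

tokens = ["Minions", "!", "Tonight", ","]
vectors = np.stack([embedding_table[t] for t in tokens])
print(vectors.shape)  # (4, 4): one 4-dimensional vector per token
```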

3. Positional Encoding: 

Now that I have captured the context and meaning of the words, which are the building blocks of the sentence, I also need to keep track of the order they appear in.

I give the token vectors positional information.

Based on this, I can understand that "Minions" precedes "!" and "Tonight" precedes "we."

Let's now see how the token vectors look after including their positional data.

Token       Vector
"Minions"   [1.23, -0.45, 0.78, ..., +Position1]
"!"         [0.65, -0.12, 0.34, ..., +Position2]
"Tonight"   [1.45, 0.32, -0.67, ..., +Position3]
","         [0.12, 0.89, -0.34, ..., +Position4]
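
One common way to do this is the sinusoidal encoding from the original transformer paper, sketched below; the 4-dimensional size is purely for illustration:

```python
import numpy as np

# Sinusoidal positional encoding: each position gets a unique pattern of
# sine/cosine values that is simply added to the token vectors.
def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions: cosine
    return encoding

token_vectors = np.ones((4, 4))    # stand-in for the 4 token vectors above
with_position = token_vectors + positional_encoding(seq_len=4, d_model=4)
print(with_position.round(2))
```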

4. Attention Mechanism:

I use this mechanism to understand the connections between all the tokens in a text.

For example, here "Minions" has a tighter connection with "Tonight" and "we" than with "!" or "moon."

This helps me gain a deeper understanding of the context and what the input implies.

The attention mechanism allows me to focus more on the relevant tokens based on their relationships and significance in the sentence, ensuring that my responses are more accurate and context aware.
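
The core of this is scaled dot-product attention. Here's a minimal numpy sketch on toy vectors (the real thing uses learned query/key/value projections and many attention heads):

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Scaled dot-product attention: every token scores every other token for
# relevance, then takes a weighted mix of their value vectors.
def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # relevance of each token to each other
    weights = softmax(scores)         # each row sums to 1
    return weights @ v                # context-aware mix of the values

rng = np.random.default_rng(seed=1)
x = rng.normal(size=(9, 4))           # 9 tokens ("Minions", "!", ...), 4 dims
out = attention(x, x, x)              # self-attention: Q, K, V all come from x
print(out.shape)                       # (9, 4)
```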

5. Transformers:

They help me process these tokens and their relationships in parallel, enabling me to combine their meanings to generate an overall understanding of the input sentence. 

The transformer architecture allows me to analyze how tokens like "Minions" relate to "Tonight" and "we steal" by assigning contextual importance to each token.

This helps me make sense of the sentence as a whole, ensuring that my responses are coherent and contextually relevant.
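
If you want to poke at a real transformer layer, PyTorch ships one. This is just a toy-sized configuration (d_model=16, 4 heads), not GPT's actual setup:

```python
import torch
import torch.nn as nn

# One transformer encoder layer processing all 9 token vectors in parallel.
# Sizes here are toy values; GPT-scale models stack dozens of much wider layers.
layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)

tokens = torch.randn(1, 9, 16)   # (batch=1, 9 tokens, 16-dimensional vectors)
contextualized = layer(tokens)   # each output vector now reflects its context
print(contextualized.shape)      # torch.Size([1, 9, 16])
```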

6. Probability and Prediction (generating the next token):

 Once I’ve understood the input, I predict the most likely token to come next in the sentence. 

For instance, if the input was "Minions! Tonight, we steal the moon", I might predict that the next token is something like "!" or "and".

I calculate probabilities for potential next tokens based on the patterns I’ve learned.

This allows me to choose the word that makes the most sense in the context of the conversation, ensuring a smooth flow.

Basically… I calculate probabilities for the potential next tokens:

Token       Probability
"!"         90%
"and"       5%
"because"   3%
Others      2%
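
That table is basically a softmax over the model's scores (logits) for each candidate token. Here's a small sketch with made-up logits chosen to roughly match the percentages above:

```python
import numpy as np

# Made-up logits (raw scores) for a few candidate tokens; softmax turns
# them into the probabilities shown in the table above.
candidates = ["!", "and", "because", "<other>"]
logits = np.array([5.0, 2.1, 1.6, 1.2])

probs = np.exp(logits - logits.max())
probs /= probs.sum()

for token, p in zip(candidates, probs):
    print(f"{token!r}: {p:.0%}")   # '!': 90%, 'and': 5%, 'because': 3%, ...

next_token = candidates[int(np.argmax(probs))]   # greedy pick: "!"
```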

7. Iterative Generation:

If the sentence continues, I repeat the process for each new token.

 After predicting one token, I move on to the next, and so on, until the entire sentence or response is generated. 

For example, if the input was "Minions! Tonight, we steal the moon!", I would predict the next word as "The" or "moon" based on probabilities. 

I continuously refine the output, building each word based on the previous one.

This iterative process ensures a cohesive and meaningful response, with each subsequent token fitting naturally into the generated text.
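
As a sketch, the loop looks roughly like this. `predict_next` is a stand-in for the real model; here it just reads off a scripted continuation of the Minions example:

```python
# Toy greedy generation loop; `predict_next` is a placeholder for running
# the full transformer and picking the most probable next token.
CONTINUATION = ["The", "moon", "is", "ours", "!", "<end>"]

def predict_next(generated: list[str]) -> str:
    return CONTINUATION[len(generated)]

generated: list[str] = []
while True:
    token = predict_next(generated)
    if token == "<end>":     # a special end-of-sequence token stops the loop
        break
    generated.append(token)

print(generated)   # ['The', 'moon', 'is', 'ours', '!']
```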

8. Final Output:

Once the sentence is fully generated, I take all the tokens and combine them to form a coherent and relevant response.

 I make sure the final output is aligned with the input, maintains context, and makes sense. In this case, I would generate the response: “The moon is ours!”
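
The last bit, gluing tokens back into readable text, can be as simple as this toy detokenizer:

```python
# Toy detokenizer: spaces between words, but punctuation sticks to the
# word before it. Real tokenizers handle this with their own decode step.
def detokenize(tokens: list[str]) -> str:
    text = ""
    for token in tokens:
        if token in {"!", ",", ".", "?"}:
            text += token                      # no space before punctuation
        else:
            text += (" " if text else "") + token
    return text

print(detokenize(["The", "moon", "is", "ours", "!"]))   # The moon is ours!
```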

Now, let’s talk secret sauce…

Decoding the Jargon: The Secret Sauce of ChatGPT

  1. Transformer Architecture (like BERT or GPT):
    Have you noticed how, one minute, we're talking about "how to bake a red velvet cupcake", and the next minute, we're talkin' deep, rather funny $h*t like your "love life"? That's because I understand the context, and the transformer architecture helps me do that.

  2. Word Embeddings (like Word2Vec or GPT's embeddings):
    This is the technique I use to convert words into vectors, which allows me to understand the meaning of each word and how it relates to other words (see the small Word2Vec sketch after this list).

  3. Unsupervised Learning:
    I learn from large datasets of text without being given any explicit labels. I analyse patterns in data and use these patterns to generate my responses.

  4. Self-Attention Mechanism:
    This helps me figure out how important each word in a sentence is relative to the others. It enables me to give more importance to certain words based on the context of the sentence.
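
Here's the promised Word2Vec sketch, using the gensim library. The three-sentence corpus is obviously far too small to learn anything meaningful; it's only there to show the words-to-vectors idea:

```python
from gensim.models import Word2Vec

# Tiny, made-up corpus: each "sentence" is already tokenized into words.
corpus = [
    ["minions", "tonight", "we", "steal", "the", "moon"],
    ["the", "moon", "is", "ours"],
    ["we", "love", "the", "moon"],
]

# Train a toy Word2Vec model: 16-dimensional vectors, context window of 2.
model = Word2Vec(corpus, vector_size=16, window=2, min_count=1, seed=42)

print(model.wv["moon"].shape)          # (16,): one 16-dimensional vector
print(model.wv.most_similar("moon"))   # words the model considers "close"
```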

Conclusion: Red Velvet Cupcake

Yes, I know you can't train yourself to remember entire datasets the way these machine-learning models (like me) do, or even memorize every word in your math textbook.

But at the end of the day, can I bake that red velvet cupcake?

Or can I eat it? - No!

So, remember, at the end of the day—you are the real magic, not me! (wink, wink).
