In November 2022, ChatGPT took the world by storm, attracting over a million users within its first five days. It captured the world’s attention and propelled AI into mainstream conversation, letting everyday people witness the disruptive potential of AI firsthand. But what is ChatGPT? How does it work? This post will explain it in layman’s terms.
ChatGPT is a generative AI chatbot. GPT is the large language model (LLM) behind the scenes. ChatGPT allows users to interact with the underlying model in a way that kind of scarily mimics interactions with a real human.
Is ChatGPT more than just a glorified search engine or Alexa on steroids? Actually, yes it is.
ChatGPT and other generative AI tools can do a wide variety of tasks. For example, they can:
You may be beginning to see a theme here…no guarantees. But let’s set aside the topic of accuracy or quality for now. Additional examples of things that ChatGPT and other generative AI tools (backed by various LLMs) can do:
Based on the question you provide, ChatGPT uses machine learning algorithms to attempt to understand the context of the conversation and generate what sound like appropriate responses. To produce a response, it predicts the next word in a given sequence based on patterns it has learned from human language. However, sometimes ChatGPT will simply string names and words together and confidently provide an answer that sounds coherent but in fact has no basis in reality. These are termed AI "hallucinations".
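To make "predict the next word" concrete, here is a minimal sketch in Python. It uses a toy bigram model (simple word-following counts over a made-up corpus; my illustration, not how GPT is actually built). Real LLMs learn billions of parameters over subword tokens, but the prediction loop is the same idea: context in, most likely next token out.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a "bigram" model)
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`, or None."""
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' -- it followed 'the' twice, more than any other word
```

A model this naive just parrots its corpus; GPT's scale and Transformer architecture are what let it generalize, but "hallucinations" arise precisely because even GPT is ultimately choosing plausible next tokens, not checking facts.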
ChatGPT is available as an API that can be integrated into different applications, and it is embedded into Microsoft products. Remember that ChatGPT, backed by the LLM GPT, is just one of many generative AI chatbots backed by various LLMs. LLMs can have billions of parameters and be trained on terabytes of text. Many of them accomplish similar tasks, though some have niche focuses.
How Do Large Language Models Like GPT Work?
GPT stands for Generative Pre-Trained Transformer. Let’s break that up:
Generative
LLMs are able to generate new text, images and videos that seem coherent and real. Generative AI is not just cutting and pasting. It produces original content on demand. Rather than simply regurgitating searched information, generative AI creates new text, images, videos, etc. It can write original essays or create original art. It can even create virtual “people.” For example, the generative AI WuDao 2.0 created an imaginary student Zhibing Hua who is tasked with learning all the material at Tsinghua University. See this link to watch a supposed video of Zhibing Hua.
Pre-Trained
Unlike a search engine that continuously indexes live data, pre-trained LLMs like GPT are trained on a specific set of data up to a specific point in time. For example, the ChatGPT released in November 2022 had no information on events or web content from after 2021. To train LLMs, web crawlers pull hundreds of gigabytes of data from books, articles, web pages, and social media on the Internet. GPT-3 was trained on close to 500 billion words from millions of websites. LLMs can also add new user inputs to their “knowledge”.
One of the tricks is differentiating fact from fiction. OpenAI made a considerable effort to train GPT on credible sources. To accomplish this, they tried to scrape only web pages that had been curated or filtered by humans. ChatGPT was trained with both supervised learning and reinforcement learning, with humans ranking the generated output.
Transformer
GPT is an autoregressive Transformer model. Transformer models are based solely on attention mechanisms. The Transformer model takes tokens as input and contextualizes each token with the other (unmasked) input tokens using an attention mechanism.
To understand this better, let’s explore a little history of language models that led us to where we are today.
History
The roots of the deep learning techniques that enabled the advent of modern LLMs lie in efforts to improve automated machine translation from one language to another. What is translation, in fact, what is language itself, if not a longing to connect? An innate yearning to understand another being and in turn be understood by that being.
Natural language processing and word sense disambiguation are essential components of translation. Other tasks by LLMs include content generation, automated data annotation, completion prediction, and the more advanced tasks of reading comprehension, commonsense reasoning, and natural language inferences.
Many older language models were based on recurrent neural networks (LSTM or GRU) in an encoder-decoder configuration. (See my earlier post Recurrent Neural Networks in SAS Viya for more information about RNNs.) Over time, language models grew and grew. See the graphic below, which shows language models over time by number of parameters.
Source: https://cmte.ieee.org/futuredirections/2023/03/18/add-ernie-bot-to-the-list
In addition to growth in model size, the 2010s saw some tremendous breakthroughs in model architecture, most significantly the move to Transformer architectures. Prior to the advent of Transformer architectures, natural language processing commonly used RNNs, such as LSTMs and GRUs. Eventually, some of these older models also connected the encoder and decoder through an attention mechanism, which selectively focuses on parts of the source sentence during translation.
To understand the importance of attention mechanisms, let us first turn our attention to language translation. Some languages are fairly easily translated word for word, because the word order is similar. In other languages, the word order is quite different. Let’s pick on English versus German for example. A literal word-for-word translation does not work. One author illustrates this as follows.
Another good example of attention is illustrated in the following sentence. Following the word "eating" a noun that is a food item is expected.
You can interpret attention as a vector of importance weights. To predict a word in a sentence, you use the attention vector to estimate how strongly that word is correlated with (i.e., how well it "attends to") the other elements, then sum the values weighted by the attention vector as the approximation of the target.
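Here is a minimal sketch of that weighted sum: scaled dot-product attention for a single query, in Python with NumPy. The vectors and numbers are made up for illustration; in a real Transformer, the queries, keys, and values are learned from data.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    scores_i = query . keys_i / sqrt(d); weights = softmax(scores);
    output   = sum_i weights_i * values_i
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)        # one relevance score per key
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()         # -> importance weights summing to 1
    return weights @ values, weights

# Toy example: 3 "tokens" with 4-dimensional embeddings
keys = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])
values = keys.copy()
query = np.array([10., 0., 0., 0.])   # strongly "attends to" the first token

out, w = attention(query, keys, values)
print(w)   # nearly all the weight falls on the first token
```

The output is dominated by the value of whichever token the query correlates with most strongly; that is the selective focus described above.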
It was discovered that not only was the attention mechanism helpful, it was actually all you need! Suddenly, recurrence and convolutions were dispensed with entirely, and the Transformer architecture took over. The Transformer architecture takes excellent advantage of parallel processing, thereby reducing compute time. The Transformer model takes tokenized input (byte pair encoding). At each layer, it contextualizes each token with the other (unmasked) input tokens in parallel via the attention mechanism.
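As a side note on that tokenization step, here is a toy sketch of one byte pair encoding merge in Python: fuse the most frequent adjacent symbol pair into a new token. A real tokenizer learns tens of thousands of such merges from a massive corpus; the word list and helper names here are purely illustrative.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Start from individual characters, as BPE does
words = [list("lower"), list("lowest"), list("newer")]
pair = most_frequent_pair(words)     # ('w', 'e') appears in all three words
words = merge_pair(words, pair)
print(pair, words[0])                # ('w', 'e') ['l', 'o', 'we', 'r']
```

Repeating this merge step thousands of times yields the subword vocabulary that GPT actually consumes, which is why it handles rare and novel words gracefully.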
The Transformer architecture is the key to modern LLMs like GPT. Not only can the Transformer architecture be used for natural language processing, but it also works for computer vision and audio processing.
Note that these advances would not be possible without the ready availability of large scale datasets and cloud computing.
ChatGPT's Recent History Specifically
So let’s examine the recent timeline of OpenAI’s ChatGPT specifically. ChatGPT was developed by OpenAI, an AI research laboratory that includes:
Here’s a summary timeline:
ChatGPT WannaBes
Once ChatGPT went viral, other large tech companies such as Alibaba, Baidu, Google and Meta accelerated development of their own LLMs. Some of these, such as LLaMA by Meta, are autoregressive decoder models quite similar to GPT. There are tons of chatbots and LLMs; below are just a few.
Many of these other models compare themselves to GPT, and claim they are better. Below see Jurassic-1's comparison to GPT-3.
Baidu also claims that Ernie 3.5 surpassed both GPT-3 and GPT-4 in several benchmark tests.
Regulation
As with all areas of AI, regulations and norms are lagging behind the rapid pace of the technology's development. A couple of examples of steps toward generative AI regulation by major world powers include:
Is SAS’s ChatBot application similar to ChatGPT?
No. SAS's Chatbot is not currently integrated with GPT or any other modern LLM. SAS is not currently investing in developing its own LLM, and doing so is not on the roadmap. It is possible that some SAS products will be integrated with LLMs in the future.
Conclusion
Has ChatGPT added fuel to the AI race? Will improved generative AI in one organization ignite the spending of resources in other organizations to produce even better generative AI? Will big tech companies, startups, universities and governments all race to see who can develop better, faster, more comprehensive generative AI systems?
What is certain is that massive amounts of financial and energy resources are being invested in generative AI, LLMs and foundation models. Because generative AI is extremely energy-intensive, costs and carbon emissions are also being discussed and considered.
AI marketers are also very careful to spin their messages to assure humans that AI is not here to take their jobs. They use terms like “AI co-pilots” and maintain that AI will take only the drudgery of work, leaving the fun and creative stuff for humans.
Do you think that’s true?
I will leave you with a quote from Professor Wang Xiaogang, the Co-founder and Chief Scientist of SenseTime. “AGI [Artificial General Intelligence] has given rise to a new research paradigm, which is based on powerful foundation models, unlocking new capabilities through reinforcement learning and human feedback, therefore efficiently solving open-ended tasks. AGI will evolve from a 'data flywheel' to a 'wisdom flywheel', ultimately leading to human-machine symbiosis.”
For More Information
Find more articles from SAS Global Enablement and Learning here.