In November 2022, ChatGPT took the world by storm, attracting over a million users within its first five days. It captured the world’s attention and propelled AI into mainstream conversation, letting everyday people witness the disruptive potential of AI firsthand. But what is ChatGPT? How does it work? This post will explain it in layman’s terms.
ChatGPT is a generative AI chatbot. GPT is the large language model (LLM) behind the scenes. ChatGPT allows users to interact with the underlying model in a way that kind of scarily mimics interactions with a real human.
Is ChatGPT more than just a glorified search engine or Alexa on steroids? Actually, yes it is.
ChatGPT and other generative AI tools can do a wide variety of tasks. For example, they can:
You may be beginning to see a theme here…no guarantees. But let’s set aside the topic of accuracy or quality for now. Additional examples of things that ChatGPT and other generative AI tools (backed by various LLMs) can do:
Based on the question you provide, ChatGPT uses machine learning algorithms to attempt to understand the context of the conversation and generate what sound like appropriate responses. To produce a response, it predicts the next word in a given sequence based on patterns it has learned from human language. However, sometimes ChatGPT will simply string names and words together and confidently provide an answer that sounds coherent but in fact has no basis in reality. These are termed AI "hallucinations".
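To make "predict the next word" concrete, here is a minimal sketch in Python. It uses a toy bigram model (simple word-following counts over a made-up corpus; my illustration, not how GPT is actually built). Real LLMs learn billions of parameters over subword tokens, but the prediction loop is the same idea: context in, most likely next token out.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration only
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which (a "bigram" model)
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most frequently observed after `word`, or None."""
    counts = follows[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' -- it followed 'the' twice, more than any other word
```

A model this naive just parrots its corpus; GPT's scale and Transformer architecture are what let it generalize, but "hallucinations" arise precisely because even GPT is ultimately choosing plausible next tokens, not checking facts.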
ChatGPT is available as an API that can be integrated into different applications, and it is embedded into Microsoft products. Remember that ChatGPT, backed by the LLM GPT, is just one of many generative AI chatbots backed by various LLMs. LLMs can have billions of parameters and be trained on terabytes of text. Many of them accomplish similar tasks, though some have niche focuses.
How Do Large Language Models Like GPT Work?
GPT stands for Generative Pre-Trained Transformer. Let’s break that up:
Generative
LLMs are able to generate new text, images and videos that seem coherent and real. Generative AI is not just cutting and pasting. It produces original content on demand. Rather than simply regurgitating searched information, generative AI creates new text, images, videos, etc. It can write original essays or create original art. It can even create virtual “people.” For example, the generative AI WuDao 2.0 created an imaginary student Zhibing Hua who is tasked with learning all the material at Tsinghua University. See this link to watch a supposed video of Zhibing Hua.
Pre-Trained
Unlike a search engine that continuously indexes live data, pre-trained LLMs like GPT are trained on a specific set of data up to a specific point in time. For example, the ChatGPT released in November 2022 had no information on events or web content from after 2021. To train LLMs, web crawlers pull hundreds of gigabytes of data from books, articles, web pages, and social media on the Internet. GPT-3 was trained on close to 500 billion words from millions of websites. LLMs can also add new user inputs to their “knowledge”.
One of the tricks is differentiating fact from fiction. OpenAI made a considerable effort to train GPT on credible sources. To accomplish this, they tried to scrape only web pages that had been curated or filtered by humans. ChatGPT was trained with both supervised learning and reinforcement learning, with humans ranking the generated output.
Transformer
GPT is an autoregressive Transformer model. Transformer models are based solely on attention mechanisms. The Transformer model takes tokens as input and contextualizes each token with the other (unmasked) input tokens using an attention mechanism.
To understand this better, let’s explore a little history of language models that led us to where we are today.
History
The roots of the deep learning techniques that enabled the advent of modern LLMs lie in efforts to improve automated machine translation from one language to another. What is translation, in fact, what is language itself, if not a longing to connect? An innate yearning to understand another being and in turn be understood by that being.
Natural language processing and word sense disambiguation are essential components of translation. Other tasks by LLMs include content generation, automated data annotation, completion prediction, and the more advanced tasks of reading comprehension, commonsense reasoning, and natural language inferences.
Many older language models were based on recurrent neural networks (LSTM or GRU) in an encoder-decoder configuration. (See my earlier post Recurrent Neural Networks in SAS Viya for more information about RNNs.) Over time, language models grew and grew. See the graphic below, which shows language models over time by number of parameters.
Source: https://cmte.ieee.org/futuredirections/2023/03/18/add-ernie-bot-to-the-list
In addition to growth in model size, the 2010s saw some tremendous breakthroughs in model architecture, most significantly the move to Transformer architectures. Prior to the advent of Transformer architectures, natural language processing commonly used RNNs, such as LSTMs and GRUs. Eventually, some of these older models also connected the encoder and decoder through an attention mechanism, which selectively focuses on parts of the source sentence during translation.
To understand the importance of attention mechanisms, let us first turn our attention to language translation. Some languages are fairly easily translated word for word, because the word order is similar. In other languages, the word order is quite different. Let’s pick on English versus German for example. A literal word-for-word translation does not work. One author illustrates this as follows.
Another good example of attention is illustrated in the following sentence. Following the word "eating" a noun that is a food item is expected.
You can interpret attention as a vector of importance weights. To predict a word in a sentence, you use the attention vector to estimate how strongly that word is correlated with (i.e., how well it "attends to") the other elements, then sum the values weighted by the attention vector as the approximation of the target.
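Here is a minimal sketch of that weighted sum: scaled dot-product attention for a single query, in Python with NumPy. The vectors and numbers are made up for illustration; in a real Transformer, the queries, keys, and values are learned from data.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    scores_i = query . keys_i / sqrt(d); weights = softmax(scores);
    output   = sum_i weights_i * values_i
    """
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)        # one relevance score per key
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()         # -> importance weights summing to 1
    return weights @ values, weights

# Toy example: 3 "tokens" with 4-dimensional embeddings
keys = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 1., 0.]])
values = keys.copy()
query = np.array([10., 0., 0., 0.])   # strongly "attends to" the first token

out, w = attention(query, keys, values)
print(w)   # nearly all the weight falls on the first token
```

The output is dominated by the value of whichever token the query correlates with most strongly; that is the selective focus described above.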
It was discovered that not only was the attention mechanism helpful, it was actually all you need! Suddenly, recurrence and convolutions were dispensed with entirely, and the Transformer architecture took over. The Transformer architecture takes excellent advantage of parallel processing, thereby reducing compute time. The Transformer model takes tokenized input (byte pair encoding). At each layer, it contextualizes each token with the other (unmasked) input tokens in parallel via the attention mechanism.
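As a side note on that tokenization step, here is a toy sketch of one byte pair encoding merge in Python: fuse the most frequent adjacent symbol pair into a new token. A real tokenizer learns tens of thousands of such merges from a massive corpus; the word list and helper names here are purely illustrative.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most common."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single fused symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

# Start from individual characters, as BPE does
words = [list("lower"), list("lowest"), list("newer")]
pair = most_frequent_pair(words)     # ('w', 'e') appears in all three words
words = merge_pair(words, pair)
print(pair, words[0])                # ('w', 'e') ['l', 'o', 'we', 'r']
```

Repeating this merge step thousands of times yields the subword vocabulary that GPT actually consumes, which is why it handles rare and novel words gracefully.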
The Transformer architecture is the key to modern LLMs like GPT. Not only can the Transformer architecture be used for natural language processing, but it also works for computer vision and audio processing.
Note that these advances would not be possible without the ready availability of large scale datasets and cloud computing.
ChatGPT's Recent History Specifically
So let’s examine the recent timeline of OpenAI’s ChatGPT specifically. ChatGPT was developed by OpenAI, an AI research laboratory that includes:
Here’s a summary timeline:
ChatGPT WannaBes
Once ChatGPT went viral, other large tech companies such as Alibaba, Baidu, Google and Meta accelerated development of their own LLMs. Some of these, such as LLaMA by Meta, are autoregressive decoder models quite similar to GPT. There are tons of chatbots and LLMs; below are just a few.
Many of these other models compare themselves to GPT, and claim they are better. Below see Jurassic-1's comparison to GPT-3.
Baidu also claims that Ernie 3.5 surpassed both GPT-3 and GPT-4 in several benchmark tests.
Regulation
As with all areas of AI, regulations and norms are lagging behind the rapid pace of the technology's development. A couple of examples of steps toward generative AI regulation by major world powers include:
Is SAS’s ChatBot application similar to ChatGPT?
No. SAS's Chatbot is not currently integrated with GPT or any other modern LLM. SAS is not currently investing in developing its own LLM, and doing so is not on the roadmap. It is possible that some SAS products will be integrated with LLMs in the future.
Conclusion
Has ChatGPT added fuel to the AI race? Will improved generative AI in one organization ignite the spending of resources in other organizations to produce even better generative AI? Will big tech companies, startups, universities and governments all race to see who can develop better, faster, more comprehensive generative AI systems?
What is certain is that massive amounts of financial and energy resources are being invested in generative AI, LLMs and foundation models. Because generative AI is extremely energy-intensive, costs and carbon emissions are also being discussed and considered.
AI marketers are also very careful to spin their messages to assure humans that AI is not here to take their jobs. They use terms like “AI co-pilots” and maintain that AI will take only the drudgery of work, leaving the fun and creative stuff for humans.
Do you think that’s true?
I will leave you with a quote from Professor Wang Xiaogang, the Co-founder and Chief Scientist of SenseTime. “AGI [Artificial General Intelligence] has given rise to a new research paradigm, which is based on powerful foundation models, unlocking new capabilities through reinforcement learning and human feedback, therefore efficiently solving open-ended tasks. AGI will evolve from a 'data flywheel' to a 'wisdom flywheel', ultimately leading to human-machine symbiosis.”
For More Information
Find more articles from SAS Global Enablement and Learning here.