GPT-4, GPT-3, and GPT-3.5 Turbo: A Review Of OpenAI's Large Language Models

A rundown of the features of OpenAI's newest Large Language Model and a comparison of its capabilities with those of previous GPTs.

Ankur A. Patel, Bryant Linton, and Dina Sostarec

Apr 10, 2023

OpenAI released GPT-4 through a paid subscription on March 14, 2023. GPT-4 improves on the shortcomings of GPT-3 Davinci and of GPT-3.5 Turbo and its associated application, ChatGPT.
GPT-4 can score around the 90th percentile on a simulated bar exam, accept multimodal input, and generate text with fewer errors than GPT-3 or GPT-3.5 Turbo.
OpenAI has stated that GPT-4 can analyze an image given to it and generate a text summary of that image. However, OpenAI has not yet released this image summarization feature.
GPT-4’s maximum context length of 8k tokens (and soon 32k) improves on GPT-3.5 Turbo’s 4k and GPT-3 Davinci’s 2k. Though GPT-4 still struggles with AI hallucination, the longer context opens the potential for longer-form content such as articles, chapters, and even books.
GPT-4 and GPT-3.5 Turbo utilize an optimization technique known as reinforcement learning from human feedback (RLHF). This improvement allows the models to perform more accurately than GPT-3 Davinci, which was not trained using RLHF.
GPT-4 maintains OpenAI’s competitiveness in the field of generative AI, performing better than Google’s 2023 release of Bard and Microsoft’s 2023 release of its AI-powered Bing assistant.
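
To make the context-length comparison above concrete, here is a minimal sketch that checks whether a draft fits within each model's limit. It is only an illustration under stated assumptions: the limits are the rounded figures quoted in this article, the dictionary keys are illustrative labels rather than official API identifiers, and the four-characters-per-token figure is a rough rule of thumb for English text, not a real tokenizer.

```python
# Context limits in tokens, using the rounded figures quoted in this article.
# The keys are illustrative labels, not official API model names.
CONTEXT_LIMITS = {
    "gpt-3-davinci": 2_000,
    "gpt-3.5-turbo": 4_000,
    "gpt-4": 8_000,
    "gpt-4-32k": 32_000,
}

# Assumption: roughly 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(model: str, prompt: str, reserved_for_reply: int = 500) -> bool:
    """True if the prompt plus a reserved reply budget stays within the limit."""
    return estimate_tokens(prompt) + reserved_for_reply <= CONTEXT_LIMITS[model]


draft = "word " * 4_000  # 20,000 characters of placeholder text
for name in CONTEXT_LIMITS:
    print(name, fits(name, draft))
```

The prompt and the reply share the same limit, which is why the sketch reserves part of the budget for the model's answer. For accurate counts, a real tokenizer should replace the character heuristic.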

This post is sponsored by Multimodal, a NYC-based development shop that focuses on building custom natural language processing solutions for product teams using large language models (LLMs).

With Multimodal, you will reduce your time-to-market for introducing NLP in your product. Projects take as little as 3 months from start to finish and cost less than 50% of newly formed NLP teams, without any of the hassle. Contact them to learn more.

GPT-4 aced the Bar Exam.

GPT-4 scored higher than most law students on the American bar exam. This milestone in artificial intelligence illustrates the rapid advances in Natural Language Processing (NLP) that have come out of OpenAI’s new release of GPT-4 this March. GPT-4 is an improvement on the wildly popular generative Large Language Model (LLM) GPT-3.5 Turbo, made widespread by OpenAI's browser application ChatGPT.

GPT-4 builds on what OpenAI learned after publicly releasing a research preview of ChatGPT, a chatbot-like application based on GPT-3.5 Turbo. But how does GPT-4 compare to earlier iterations of OpenAI’s Large Language Models, GPT-3 Davinci and GPT-3.5 Turbo?

GPT-3

Generative Pre-trained Transformer 3, or GPT-3, builds on GPT-2, the language model OpenAI released in 2019. Like its predecessor, GPT-3 is a large language model that can produce strings of complex language when prompted through natural language.

Importantly, GPT-3, released in 2020, has 175 billion parameters, which made it a landmark achievement in the capabilities of a Large Language Model. 60% of the data used in GPT-3’s model training was scraped from Common Crawl, a dataset that at the time of GPT-3’s release encompassed 2.6 billion stored web pages. The sheer size of the training data and the parameter count led to large performance gains over GPT-2.

There are multiple release versions of GPT-3, but in this article, we will reference the GPT-3 Davinci stable release.

Capabilities

In 2020, GPT-3 was widely regarded as the most capable generative language model available. It outperformed comparable models at the time, such as Google’s then-popular BERT.

GPT-3 brought an array of new possibilities to the table. Unlike BERT, GPT-3 could not only understand and analyze text but also generate it from scratch — be it an answer to a question, a poem, or a blog post heading.

The model also demonstrated notable improvements in few-shot learning: the ability to perform a task after seeing only a handful of examples, typically supplied in the prompt itself rather than through additional training. This ability was unmatched at the time. It meant that, unlike previous models, GPT-3 could perform reasonably well on tasks for which it was shown only a few examples.
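
As a rough illustration of few-shot learning in practice, the sketch below assembles a prompt that contains a few worked examples followed by a new query. The helper name `build_few_shot_prompt`, the "Review:" and "Sentiment:" labels, and the sample reviews are all hypothetical, chosen only for illustration. No model is called here; the resulting string is what would be sent to a model such as GPT-3.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a prompt: an instruction, a few worked examples, then the query.

    `examples` is a list of (text, label) pairs. The model is expected to
    continue the pattern and fill in the label for the final query.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples, invented for this sketch.
examples = [
    ("The battery lasts all day and charges fast.", "positive"),
    ("The screen cracked within a week.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "Setup was painless and support answered quickly.",
)
print(prompt)
```

The two labeled reviews are the "few shots": the model's weights are not updated, and the task is conveyed entirely through the pattern in the prompt.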

GPT-3's larger scale, unsupervised learning, few-shot learning, and generative capabilities made it a more powerful language model than BERT, particularly for tasks that involve generating new text. The model can produce lengthy and relatively error-free text from a natural language prompt. In fact, when prompted, GPT-3 produced articles that journalist testers had difficulty distinguishing from human-written ones.

Limitations

One problem with GPT-3 is AI hallucination, in which the model generates text that is not based on real-world knowledge or facts. This can happen when the model is presented with incomplete or ambiguous information or when it is asked to generate text about topics that it has not been trained on.

For example, if GPT-3 is asked to generate text about a hypothetical scenario that involves advanced technology that does not exist yet, it may produce text that includes details that are not possible or accurate. Similarly, if the model is asked to generate text about a complex scientific concept that it has not been trained on, it may confidently produce text that is inaccurate or misleading.

However, GPT-3 did markedly improve over GPT-2, which was much more prone to hallucinations.

Another limitation of GPT-3 is its lack of reasoning. The model relies on statistical patterns in the text it has been trained on and does not have a deep understanding of the world or of the context in which it is being used. This can lead to the model generating text that is technically correct but does not make sense in the broader context.

Finally, GPT-3 is trained on vast amounts of text data, which can reflect the biases and prejudices of the people who wrote it. If the training data is biased in some way, the model may learn and reproduce those biases in the text it generates.

The algorithms used to train GPT-3 may also be biased if they reflect the biases and assumptions of the people who designed them. For example, the algorithms may prioritize certain types of language or ideas over others, which can result in biased text generation.

Applications

Multiple applications are based on the GPT-3 model:

Chatbots: GPT-3 is being used to create more advanced chatbots that can understand natural language queries and respond in a more human-like way. These chatbots are being used for customer service, technical support, and other applications where users need assistance. A startup known as AskBrian utilizes AI to aid business professionals and consultants with problems related to business processes and management.
Content Creation: GPT-3 is also being used to generate content, such as news articles, product descriptions, and even fiction. These applications can save time and money for those who need to produce a lot of content quickly, such as marketing teams and content creators. The tool known as Jasper can build lengthy marketing copy, blog posts, and emails, reducing the workload for businesses.

Image 1: Jasper API showcasing the generative tools and uses available on the platform.

Code Completion: OpenAI and GitHub have partnered to develop GitHub Copilot, a cloud-based assistant that automatically completes code for developers who use Visual Studio Code, Visual Studio, Neovim, and JetBrains integrated development environments (IDEs). GitHub announced the tool on June 29, 2021, and it is currently available only to individual developers through a subscription. It is most effective when used for coding in Python, JavaScript, TypeScript, Ruby, and Go.

GPT-3.5 Turbo (Model Behind ChatGPT)

GPT-3.5 Turbo, released on March 1st, 2023, is an improved version of GPT-3.5 and GPT-3 Davinci.

OpenAI used a training technique known as Reinforcement Learning from Human Feedback (RLHF) when developing GPT-3.5 Turbo. In this method, human reviewers rate a large language model’s outputs, and those ratings are used to further tune the model. Because of the heavy use of RLHF in development, GPT-3.5 Turbo is a more robust model with more accurate and policy-optimized responses.
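
To give a feel for how human ratings can become a training signal, here is a minimal sketch of a pairwise preference loss, a formulation commonly used to train the reward model in RLHF pipelines. This article does not describe OpenAI's exact procedure, so treat this as an assumption for illustration only. The function name `preference_loss` and the numeric scores are hypothetical.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response well above the rejected one, and large when it does the opposite.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) is algebraically equal to log(1 + exp(-m)).
    return math.log1p(math.exp(-margin))


# Hypothetical reward scores for two responses to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(agrees_with_human < disagrees_with_human)
```

In a full pipeline, a reward model trained with a loss like this would then score the language model's outputs during a reinforcement learning stage. That stage is omitted here.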

ChatGPT, a web-browser application based on a model from the GPT-3.5 series, was released on November 30th, 2022, and quickly outpaced all other tech products in terms of user adoption. In only 5 days, ChatGPT reached 1 million users.

Image 2: A graph of ChatGPT's race to a million users compared with other major internet and social media companies, highlighting its explosive growth in users.