OpenAI Codex vs. GitHub Copilot: Which Text-To-Code App Works Better?

Codex converts plain English into code, while GitHub Copilot is a more feature-rich app powered by Codex. We tested both. Should you use either to write code?

Ankur A. Patel and Dina Sostarec

Dec 05, 2022

OpenAI Codex is an AI system that translates natural language into code. It owes its understanding of natural language to its predecessor, GPT-3, but it’s also been trained on billions of lines of code.
There are currently over 70 Codex-based apps on the market, the most popular being GitHub Copilot. These apps can generate new code, complete existing code, suggest improvements, or translate the code into other programming languages.
GitHub Copilot may be more suitable for developers than OpenAI Codex. Instead of automatically generating one solution, Copilot gives users several suggestions to choose from. That way, the users remain in the “pilot’s” seat.
Codex can be applied in game development, data science, and many other industries. It may also help developers who use different programming languages better understand each other and work faster.
Developers need to closely monitor Copilot’s output, as the AI can generate insecure code or even plagiarize code from other creators. GitHub, Microsoft, and OpenAI are currently facing a lawsuit claiming that Copilot regurgitates copyrighted code.

This post is sponsored by Multimodal, a NYC-based development shop that focuses on building custom natural language processing solutions for product teams using large language models (LLMs).

With Multimodal, you will reduce your time-to-market for introducing NLP in your product. Projects take as little as 3 months from start to finish and cost less than 50% of newly formed NLP teams, without any of the hassle. Contact them to learn more.

// Make the object bounce

What if developing a game could be as easy as writing your commands in plain English, like “make the object bounce”? You could manipulate digital objects in any way you wanted without knowing a single thing about coding.

This dream is no longer that distant, thanks to text-to-code models like OpenAI Codex. As OpenAI already showed in a demo, non-technical users can enter simple, natural language commands in Codex and develop games within minutes.

However, Codex now also powers a more robust and feature-rich app called GitHub Copilot, so we expect more and more developers to use AI for coding, too.

We’ll compare Codex and Copilot to see how they differ and which one works better. Let’s start by exploring what AI coding apps can do in the first place.

The Rise of AI Coding Apps

OpenAI Codex is an AI system that translates natural language into code. It can turn text prompts written in English or another natural language it’s been trained on into corresponding lines of code.

As OpenAI’s CEO Sam Altman put it, Codex gets us one step closer to what we really want from computers: saying what we want, and actually getting it.

OpenAI released the latest version of Codex in August 2021. According to the company, at least 70 apps have been built on top of it since then. The most well-known Codex-based app is GitHub Copilot, which we’ll examine in more depth in the rest of this article.

What Can Codex-Based Apps Do?

Codex-based apps can complete complex programming tasks with a decent level of accuracy. According to OpenAI’s estimates from 2021, Codex produces the right code 37% of the time. The numbers for Codex-based apps should look similar.

They can fulfill several programming tasks:

Code generation. Codex-based apps can generate code based on user prompts written in natural language. This allows non-programmers to write code using plain English or another language that the model was trained on, while programmers can speed up tedious tasks and focus more on the creative parts of coding.
Code completion. Besides writing new code based on natural language comments, the AI also knows how to complete an existing piece of code in a way that makes sense. Developers can start writing their code and let the AI complete it based on the context.
Code interpretation. Codex also works in the opposite way: it can translate code into natural language and explain what it does in easy-to-understand terms. This function can be an excellent tool for beginners who want to learn how to code.

Explanation of JavaScript code generated by OpenAI Codex (Source)

Code improvement. Codex can also write unit tests for developers and help them detect errors in their code more quickly than they probably would on their own.
Code translation (transpilation). Codex has been trained on dozens of popular programming languages, which allows it to translate code from one language to another, for example, from Python to JavaScript.

OpenAI Codex translating Python code into JavaScript
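
To make the generation and completion tasks concrete, here is a hand-written illustration of what a comment-driven prompt and a plausible completion could look like. It is not actual Codex output; the function name `bounce` and its parameters are hypothetical, chosen to echo the “make the object bounce” example.

```python
# Prompt a user might type (natural language, written as a comment):
# "Make the object bounce: move it by its velocity each step and
#  reverse direction when it hits the floor or the ceiling."

def bounce(position, velocity, floor=0.0, ceiling=100.0):
    """Advance one step and flip the velocity at either boundary."""
    position += velocity
    if position < floor:
        # Reflect the overshoot back into the allowed range.
        position = floor + (floor - position)
        velocity = -velocity
    elif position > ceiling:
        position = ceiling - (position - ceiling)
        velocity = -velocity
    return position, velocity


# One step away from the floor at speed -10 overshoots by 5 and is reflected.
print(bounce(5, -10))
```

A developer would still need to check that a completion like this matches what they meant, for example whether the overshoot should be reflected or simply clamped to the boundary.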

How Does OpenAI Codex Differ From GPT-3?

Many people think that OpenAI Codex is based on GPT-3, but that’s not entirely accurate. It’s more appropriate to think of Codex as GPT-3’s descendant.

GPT-3 can be considered a general-purpose language model, and Codex its more specialized heir. While GPT-3 can produce various types of natural language, Codex specializes in producing code.

This was made possible thanks to Codex’s training on both natural language and billions of lines of open-source code. GPT-3, on the other hand, was trained only on natural language using sources like books, webtexts, and Wikipedia.

OpenAI Codex and GPT-3 can produce very different results because they were built using different training data sets.

Another important difference is that Codex can store and “remember” significantly more context than GPT-3.

GPT-3 has a much smaller memory space of 4 KB, while Codex has a larger, 14 KB memory space, more than three times the size of GPT-3’s. This allows Codex to better understand user intentions and provide more accurate code completions. Codex is fantastic at finishing users’ sentences, as one developer put it.
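
A rough way to picture the difference is to treat each memory space as a byte budget and see how much of a source file fits. The sketch below is only an illustration: the 4 KB and 14 KB figures come from the paragraph above, while the helper `fit_context` and its keep-the-most-recent-lines strategy are assumptions, not a description of how either model manages its context.

```python
GPT3_BUDGET = 4 * 1024    # bytes, figure quoted in the text
CODEX_BUDGET = 14 * 1024  # bytes, figure quoted in the text


def fit_context(lines, budget):
    """Keep the most recent lines whose UTF-8 size fits in the budget."""
    kept = []
    used = 0
    for line in reversed(lines):
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > budget:
            break
        kept.append(line)
        used += size
    kept.reverse()
    return kept


# A hypothetical 500-line file; each line takes 40 bytes including its newline.
source = ["x" * 39 for _ in range(500)]

print(len(fit_context(source, GPT3_BUDGET)))   # lines that fit in 4 KB
print(len(fit_context(source, CODEX_BUDGET)))  # lines that fit in 14 KB
print(CODEX_BUDGET / GPT3_BUDGET)              # 3.5
```

With the larger budget, roughly three and a half times as many lines of surrounding code can inform a completion.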

To wrap up, both GPT-3 and Codex can understand natural language text prompts, but they use them for different purposes. GPT-3 produces natural language text from the prompt, while Codex turns it into code in one of several programming languages it’s been trained on, such as Python, Ruby, JavaScript, PHP, TypeScript, and others.

Codex is the Underlying Tech, Copilot is the Upgrade

After improving Codex, OpenAI partnered with GitHub and Microsoft to develop an app called GitHub Copilot.

We can consider Copilot to be a product wrapper built around Codex. It’s based on the Codex model but uses it in a different environment, and some would argue that that environment is more suitable for an average software developer.

But are there any important differences between the two? Is either really better suited for developers? Let’s explore these questions together.

OpenAI Codex

OpenAI Codex can understand, complete, translate, and generate code based on natural language text prompts such as “add an image of XYZ” or “resize it to 100 px.”

AI-generated code based on natural-language text prompts — Building a game with OpenAI Codex (Source)

As we mentioned, Codex excels at storing and remembering large amounts of context. This is partially demonstrated in the picture above, where Codex accurately interprets vague phrases like “the person” and “its” based on what was said before.

However, what makes Codex truly remarkable is its ability to manipulate other software using natural language commands.

With the Codex API, users can essentially control computers using simple, everyday language. Their commands can have visible effects in the physical world, which Codex’s predecessor, GPT-3, never could have achieved.

For example, a demo showed that Codex can correctly generate code that instructs Microsoft Word to perform functions like deleting all initial spaces and the last line in a document.
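
The Word demo boils down to a small text transformation. As a plain-Python analogue (this is not the Word API and not the code Codex produced in the demo; the function name `clean_document` is hypothetical), the instruction “delete all initial spaces and the last line” could be sketched as:

```python
def clean_document(text):
    """Strip leading spaces from every line, then drop the last line."""
    lines = text.split("\n")
    stripped = [line.lstrip(" ") for line in lines]
    return "\n".join(stripped[:-1])


document = "  Dear team,\n    the launch is on Friday.\n  Draft footer"
print(clean_document(document))
```

Even in this tiny case the instruction is ambiguous: if the text ends with a trailing newline, the “last line” is empty, and a person would probably expect the last non-empty line to go instead.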

Other tests by OpenAI confirmed that Codex can also control Spotify and Google Calendar. However, this “control” feature may require some improvements before it’s further publicized and commercialized.

GitHub Copilot

GitHub Copilot is an AI pair programmer powered by OpenAI Codex and built by GitHub, which Microsoft owns. The Codex engine was fine-tuned on huge amounts of code from GitHub repositories, and the resulting model serves as the basis for Copilot.

Now, Copilot can take on the role of a human programmer—a copilot—in a live coding session, providing code and function suggestions to the user.

Much like OpenAI Codex, Copilot can generate code based on a natural language command or complete existing code with no written prompts.

Copilot producing code based on text prompts — Source

However, Copilot gives the developer more control than OpenAI Codex. Copilot only suggests how a piece of code could be completed; it doesn’t write the code on the user’s behalf. It also offers several completion suggestions, and it’s up to the developer to choose the best one.

The suggestions are added right below the existing code, so users can preview the code in its full form before accepting the suggestion.

Codex, on the other hand, immediately generates the entire code as a response to a command. There are no alternative suggestions that users could preview or choose from, which may make Codex less appealing to developers.

GitHub Copilot also runs as an extension inside the developer’s editor on Windows, Mac, and Linux, although its suggestions still come from a cloud-hosted model. That setup may suit most developers better than working with Codex directly. Some say that Copilot is a better option for developers overall because of its superior abilities to write functions, interpret variables, and complete code.

On top of that, Copilot has one more truly astounding feature.

It can remember the tests that developers write within the app and then automatically create a piece of code that will pass those tests. That way, developers can potentially get fool-proof code every time they use Copilot.

Well, actually, Copilot won’t create code on its own. It will simply give developers suggestions that help them create accurate pieces of code themselves. Still, this shows that Copilot is extremely intuitive and good at understanding what users want.
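
The workflow described here is essentially test-first development. In the hand-written sketch below, a hypothetical `slugify` function stands in for whatever the developer is building: the test comes first, and the implementation underneath is the kind of suggestion a developer would then review and accept or reject. Neither part is actual Copilot output.

```python
import re


def test_slugify():
    # The developer writes the expected behavior first.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  AI   coding apps ") == "ai-coding-apps"
    assert slugify("") == ""


def slugify(title):
    """A candidate implementation written to satisfy test_slugify."""
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


test_slugify()  # raises AssertionError if the suggestion is wrong
```

Passing the tests only shows that the code matches the cases the developer thought to write down, so the quality of the result still depends on the quality of the tests.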

AI Coding Apps — Commercial Use Cases

Coding apps could be used for commercial purposes in a number of ways. We’ll only mention the ones we find the most interesting or important.

The most obvious application of the tech is probably in the game development industry, as Codex can be used to easily add, animate, and manipulate digital objects. OpenAI demonstrated this by using Codex to create a space game.

Codex and Codex-powered apps can also find their applications in data science. For example, AI can simplify collecting relevant data, classifying it, and turning it into graphs, charts, or reports.

However, the most interesting application of Codex we’ve seen so far was in an augmented reality (AR) environment. When combined with AR, Codex lets users create and manipulate 3D objects using voice commands.

Airgift plans to release an app that would allow users to do just that. In a recent demo, the company showed that users could create 3D objects, reduce their size, change their color, and manipulate them in other ways by giving voice commands through the app.

Besides being used commercially, Codex may also help remove friction within teams.

It allows software developers who “speak” different programming languages to better understand each other, as they can easily translate code from one language to another.

Codex can also significantly speed up software development. Programmers could spend less time on menial tasks such as creating an appropriate function or an algorithm if the AI handles that part for them. They can focus more on understanding and solving problems, i.e., the more creative part of coding.

OpenAI Codex vs. GitHub Copilot: Which Should You Use?

There are two major differences between Copilot and Codex. First, Codex gives users less control over code generation than Copilot. While Copilot writes code alongside a human programmer, Codex automatically generates code based on given prompts.

It also develops only one possible solution for every given command. There are no alternative suggestions to choose from, so users have to re-enter their prompts if they want to generate other possible lines of code.

However, Codex has one huge advantage over Copilot—it can be trained on new data. If programmers train Codex on code from their projects, the AI will only become better and better at understanding the context and producing accurate responses. Because of that, Codex can become a much more powerful tool than Copilot.

Obviously, both Codex and Copilot have their advantages, so it’s hard to recommend one over the other.

Still, most believe that Copilot is a more suitable option for developers. Codex, on the other hand, seems more appropriate for non-programmers who want to generate simple SQL queries and similar commands or just learn to code.
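
As an illustration of the “simple SQL queries” use case, the sketch below pairs a plain-English request with the kind of query a text-to-code model might return, then runs it against an in-memory SQLite database. The table, the data, and the query are made up for this example and were written by hand, not generated by Codex.

```python
import sqlite3

# Plain-English request: "Show each customer's total order amount, largest first."
generated_sql = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""

# Hypothetical sample data so the query can be checked end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ada", 30.0), ("Grace", 50.0), ("Ada", 45.0)],
)

rows = conn.execute(generated_sql).fetchall()
print(rows)
```

Running a generated query against a small, known data set like this is a cheap way for a non-programmer to confirm that it does what was asked.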

This isn’t uncommon. As we wrote in a previous article, most GPT-3-powered products are more specialized and relevant to mainstream audiences than GPT-3 itself. The proper product wrapper on the underlying tech (GPT-3) makes a huge difference for the end user.

Developers Must Retain Control or Things Could Go (Very) Wrong

While AI can be helpful in writing code, the users should still retain control.

Both Codex and Copilot have been known to generate code with security vulnerabilities. They may also suggest outdated solutions or even plagiarize someone else’s code that cannot be legally used without a license.
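
One common class of vulnerability is SQL built by string concatenation. The sketch below is a hand-written example rather than actual Codex or Copilot output; the `users` table and both function names are hypothetical. It contrasts an injectable query with a parameterized one using Python’s standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user_unsafe(name):
    # Vulnerable: user input is pasted straight into the SQL string.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Safer: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


malicious = "nobody' OR '1'='1"
print(len(find_user_unsafe(malicious)))  # 2: the injected condition matches every row
print(len(find_user_safe(malicious)))    # 0: the input is treated as a literal name
```

Both versions look reasonable at a glance and return the same result for ordinary input, which is why a suggestion like the first one can slip through review.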

In fact, there’s just been a class-action lawsuit filed against OpenAI, GitHub, and Microsoft for training Copilot on copyrighted code. The lawsuit claims that Copilot regurgitates code from creators without their consent, and some users agree.

A Twitter user claiming that Copilot regurgitates his copyrighted code — Source

Since this is the first class-action lawsuit that questions AI training and output, some expect it will set important precedents for the entire industry.

However, this problem should not only concern companies but also end users who may unknowingly steal a piece of code. This may temporarily steer developers away from using Copilot, at least until we see how the lawsuit plays out in court.

Even so, it’s undeniable that GPT-3-based technology already has multiple groundbreaking applications that are bound to change many industries. We’ve already discussed how it’s changing the writing industry in a previous post, and we’ll cover more exciting GPT-3-powered products in the upcoming issues. Stay tuned.


