The Rise of Open-Source LLMs in 2023: A Game Changer in AI
Discover the advancements of open-source LLMs compared to their closed-source counterparts and their promising future trajectory.
Various open-source LLMs, such as WizardLM and LLaMA 2, are emerging as powerful competitors to closed-source models.
Vicuna is a cost-efficient LLM that achieved 90% of ChatGPT’s capabilities despite the entire training process costing just $300.
LLaMA 2 is setting the standard for ethical AI by achieving the lowest violation scores to date.
Open-source LLMs are increasingly supporting multiple languages, resulting in global collaboration and business opportunities.
The transparency and customizability of open-source LLMs offer advantages over “black box” closed-source LLMs, which often lack interpretability in how the model reached its conclusion.
Future directions for open-source models include smaller, more efficient models, as well as niche-specific LLMs for areas such as law and healthcare.
This post is sponsored by Multimodal, an NYC-based startup setting out to make organizations more productive, effective, and competitive using generative AI.
Multimodal builds custom large language models for enterprises, enabling them to process documents instantly, automate manual workflows, and develop breakthrough products and services.
Visit their website for more information about transformative business AI.
Ankur’s Newsletter is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
We’ve all heard of popular closed-source large language models (LLMs) like the GPT models, but open-source LLMs often undeservedly go under the radar.
Currently, open-source LLMs are a goldmine of untapped potential, slowly creeping up to the state-of-the-art closed-source models. They provide various advantages, such as increased customizability, more transparency in generating outputs, and an improvement in collaboration.
Let’s take a look at some of the main open-source models and their significance.
Current Landscape: Key Open-Source Large Language Models
Before we discuss why open-source models are changing the AI game, we’ll need to delve into some of the key models:
LLaMA 2
Meta released LLaMA 2 in response to over 100,000 requests for access to its predecessor, LLaMA 1, which was only available under a non-commercial license. LLaMA 2 underwent extensive pre-training using a model architecture similar to the first version's and offers various improvements.
Trained on 40% more data than its predecessor and offering a range of model parameters from 7B to 70B, LLaMA 2 is a big step up from the first version and offers comparable performance to models like GPT-3.5.
LLaMA 2 used reinforcement learning from human feedback (RLHF) during training. By tuning the model on human preference judgments, RLHF improves LLaMA 2's ability to hold natural conversations compared to models trained without it.
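To make the RLHF idea concrete, here is a toy sketch of its core preference step. Everything in it is a deliberate simplification: the hand-written scoring function stands in for a learned reward model, and a crude shift of probability mass stands in for Meta's actual PPO-based policy update.

```python
# Toy sketch of the preference step behind RLHF. Assumptions: a learned
# reward model is replaced by a hand-written scoring function, and the
# PPO policy update is replaced by a crude shift of probability mass.

def toy_reward_model(response: str) -> float:
    """Hypothetical stand-in for a reward model trained on human ratings."""
    score = 1.0 if "thanks" in response.lower() or "please" in response.lower() else 0.0
    score += min(len(response.split()), 10) / 10  # reward completeness, capped
    return score

def preference_update(policy: dict, a: str, b: str, lr: float = 0.1) -> dict:
    """Shift probability from the lower-scoring response to the higher one."""
    winner, loser = (a, b) if toy_reward_model(a) >= toy_reward_model(b) else (b, a)
    delta = lr * policy[loser]
    policy[winner] += delta
    policy[loser] -= delta
    return policy

# The "policy" here is just a distribution over two candidate replies.
policy = {"Here.": 0.5, "Sure, here you go, thanks for asking!": 0.5}
policy = preference_update(policy, "Here.", "Sure, here you go, thanks for asking!")
```

Repeated over millions of human preference comparisons, updates of this general shape are what steer a chat model toward responses people actually rate highly.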
You’ve probably heard a lot about AI safety when it comes to LLMs. There are concerns about using personal and sensitive information during the training process, as the model can start to use this information in its outputs.
This has been a key area of focus for Meta, which filtered as much personal data as possible out of its training datasets to help LLaMA 2 conform to ethical guidelines. LLaMA 2 consistently achieves violation scores below 10%, much lower than other current LLMs.
Meta also developed a LLaMA 2 variant specifically for chatbot applications, called LLaMA-2-chat. Fine-tuned with over 1 million human annotations, LLaMA-2-chat is one of the best-performing open-source LLMs for the specific use case of chatbots.
LLaMA 2 is proficient at various natural language processing (NLP) tasks, such as content generation, personalized recommendations, and customer service automation.
The pros of LLaMA 2 include:
Among the safest LLMs available: With LLaMA 2 achieving a violation score much lower than other LLMs, the model is setting the standard for ethical AI.
Ability to learn from human interactions efficiently: LLaMA 2 is also among the better-performing LLMs when it comes to having meaningful conversations with humans. This improvement is most likely a result of using RLHF during training.
Versatile: LLaMA 2 can carry out a whole host of different tasks, including question answering, text generation, sentiment analysis, and more.
The cons of LLaMA 2 include:
Limited language support: Although LLaMA 2 supports 20 languages, other open-source LLMs, such as mBERT (multilingual bidirectional encoder representations from transformers), support around 104 languages.
Low performance on coding tasks: Despite LLaMA 2’s versatility and ability to handle a wide range of tasks, coding isn’t its strong point, falling short in various benchmarks when compared to other open-source LLMs like MPT.
Struggles with specialized domains: LLaMA 2 was trained on 2 trillion tokens of largely general-purpose data, so it can lack depth in specialized areas such as medicine or law.
WizardLM
Researchers from Microsoft and Peking University released the WizardLM models in May 2023, taking the open-source LLM world by storm with a unique approach to training. Like the Vicuna models covered below, WizardLM is built on LLaMA 1.
Previously, generating a large amount of instruction data was a recognized concern for LLMs.
But WizardLM used a unique approach called Evol-Instruct, which can rewrite basic instructions and convert them into a more complex set. The advantage of the Evol-Instruct method is that these instructions are of higher quality than manually generated instructions. It also helps automate the instruction generation process on a massive scale.
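The evolution step can be sketched in a few lines. This is only an illustration of the idea: the actual Evol-Instruct method prompts an LLM to perform each rewrite, whereas the three "evolution" operations below are invented string templates.

```python
import random

# Toy sketch of Evol-Instruct-style instruction evolution. The real method
# prompts an LLM to rewrite each instruction; here three hand-written
# "evolution" operations stand in for those rewrites.

EVOLUTIONS = [
    lambda s: s + " Explain your reasoning step by step.",      # deepen
    lambda s: s + " Keep your answer under 100 words.",         # add a constraint
    lambda s: s.replace("a function", "a recursive function"),  # concretize
]

def evolve(instruction: str, rounds: int = 2, seed: int = 0) -> str:
    """Apply a few random evolution operations to harden an instruction."""
    rng = random.Random(seed)
    for _ in range(rounds):
        instruction = rng.choice(EVOLUTIONS)(instruction)
    return instruction

evolved = evolve("Write a function that reverses a string.")
```

Because the rewriting is automatic, a small seed set of simple instructions can be expanded into a large corpus of progressively harder ones, which is exactly the scaling advantage Evol-Instruct provides.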
The WizardLM models are among the top-performing open-source models. Notably, WizardLM outperforms the Vicuna models and, on some evaluations, approaches GPT-4-level performance.
When it comes to use cases, the WizardLM models shine at handling more difficult tasks. One of them is code generation, as the developers of the base WizardLM models also created a specific model to tackle complex coding problems called WizardCoder.
Additionally, the base WizardLM models can handle various other demanding tasks, such as academic writing, answering difficult questions, and translating texts into multiple languages.
A quick roundup of the advantages of the WizardLM models includes:
High performance: The Evol-Instruct training process results in the WizardLM models being able to outperform models of similar size.
Ability to handle technical tasks: WizardLM can deal with difficult tasks like code generation by being trained on complex instruction data.
Regular updates and improvements: The developers released updated versions of the WizardLM models, such as WizardLM V1.1 and V1.2, which provide notable performance improvements.
However, some disadvantages include:
Issues with unreliable outputs: Like most LLMs, the WizardLM models can generate inaccurate or false outputs because of limitations in their training data. Evol-Instruct doesn't address this specific issue, so the models are still likely to occasionally produce incorrect outputs.
The instruction data generation process isn’t fully automated: Although the WizardLM models can automate some of this process, humans still need to check the quality of instruction data.
Higher learning curve: Since this model uses a novel training process, it might be difficult for new users to get to grips with this new approach when attempting to use the model.
Vicuna
Vicuna was released in March 2023 by a team from several institutions, including UC Berkeley, CMU, Stanford, and UC San Diego. Built on the LLaMA-13B model, Vicuna is an open-source chatbot fine-tuned on 70,000 user-shared conversations from ShareGPT.
There are two base models available: Vicuna-13B and Vicuna-33B. What's most impressive is Vicuna's low training budget of around $300, and its ability to achieve performance comparable to other LLMs like LLaMA 1 and ChatGPT.
In particular, Vicuna achieves roughly 90% of ChatGPT's and Bard's quality in automated evaluations.
Although Vicuna is an open-source LLM, with the code and weights of the model being publicly available on GitHub, Vicuna is only available through a non-commercial license. Businesses and individuals can’t use the model for profit, and the model is mainly intended for research purposes.
The positives of Vicuna include:
Cost-effectiveness: Vicuna's complete fine-tuning process cost only around $300.
Effectiveness at conversational tasks: Vicuna is primarily used for developing chatbots and NLP research due to the massive amounts of human conversations used to fine-tune the model.
High performance: Despite using such a low training budget, Vicuna is able to achieve 90% of ChatGPT’s performance.
However, some of the more negative aspects are:
Only available through a non-commercial license: Similar to LLaMA 1, Vicuna’s accessibility through a non-commercial license can limit the broader use cases of the language model.
Reliant on the quality of user-shared conversations: As ShareGPT was the main source of training data, there can be limitations in the model’s understanding, depending on the quality of user-shared conversations.
Not optimized for bias and safety: The fine-tuned conversations used for training data aren’t fully processed to adhere to ethical AI guidelines, and it’s possible these models produce biased outputs.
Falcon
The Technology Innovation Institute (TII) debuted the two base Falcon models, Falcon-7B and Falcon-40B, in March 2023. Falcon-40B is an autoregressive decoder model, meaning it predicts each token from the tokens that came before it, much like the GPT models.
Contrary to the other 3 models we looked at, the Falcon models aren’t based on LLaMA 1.
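Autoregressive generation can be sketched with a tiny toy model. Here a bigram word-count table stands in for the transformer, and the corpus is invented for illustration; the point is only the shape of the loop, in which each prediction depends on the tokens generated so far.

```python
from collections import Counter, defaultdict

# Toy sketch of autoregressive (decoder-style) generation: each new token
# is predicted from the tokens generated so far. A tiny bigram model
# stands in for the transformer; the corpus is invented for illustration.

corpus = "the cat sat on the mat and the cat sat on the rug".split()

# Count which word follows which in the corpus.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt: str, steps: int = 4) -> list:
    tokens = prompt.split()
    for _ in range(steps):
        last = tokens[-1]
        if last not in bigrams:
            break  # nothing learned about this word; stop generating
        # Greedy decoding: always take the most likely next token.
        tokens.append(bigrams[last].most_common(1)[0][0])
    return tokens

print(" ".join(generate("the cat")))  # prints "the cat sat on the cat"
```

A real decoder like Falcon-40B replaces the bigram table with a transformer conditioned on the entire preceding context, but the generation loop follows this same one-token-at-a-time pattern.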
The quality of the training data is what distinguishes the Falcon models from other LLMs, with the RefinedWeb dataset supplying most of the training data. This training data was then filtered to further increase its quality.
You might be asking: why go through all this hassle just to improve the quality of the training data?
One big issue with many LLMs is that they often generate non-factual responses because of low-quality training data. Falcon models outperform other LLMs in terms of accuracy and reliability thanks to high-quality training data.
The Falcon models are capable of conventional LLM tasks, such as language translation, text generation, creative tasks, and more.
However, one unique use case for which TII stated they designed the Falcon models is to improve the efficiency of companies in the United Arab Emirates by enabling them to automate repetitive tasks and improve workflows.
Here’s a short rundown of the advantages of the Falcon models:
Versatile: Two base model sizes are available for different needs, and instruction-tuned variants such as Falcon-7B-Instruct are well suited to chatbot applications.
Efficiency: The larger Falcon model, Falcon-40B, outperforms GPT-3 while using only 75% of GPT-3's training compute budget.
Quality training data: TII used a rigorous process to filter and process their scraped data from RefinedWeb.
On the other hand, some of the drawbacks of the Falcon models include:
Longer setup time: To use the Falcon models, users need to install a number of external packages, which increases the time it takes to start using the models.
Struggles with specific tasks: When it comes to tasks such as dialogue generation, the Falcon models aren’t able to outperform other LLMs in these areas.
Limited model sizes: There aren't any base models with between 7 and 40 billion parameters, which might limit the flexibility of these models for some users.
Challenges Facing Open-Source LLMs
One challenge that can't be ignored is the performance gap between current open-source LLMs and closed-source powerhouses like GPT-4. The top open-source models, such as WizardLM and LLaMA 2, still fall short when their performance is compared directly with GPT-4 across various tasks.
Even though open-source LLMs are free to access, the computational demands to train these models are massive. Fine-tuning a model for specific use cases with large amounts of data may be out of reach for smaller organizations or individuals without the necessary computational infrastructure.
Training an open-source LLM is a costly endeavor for various reasons, including the requirements for storing large amounts of training data, high-performance GPUs, and large memory capacities.
On the topic of fine-tuning, adapting open-source LLMs to highly specialized areas is another challenge. A large amount of high-quality, specialized training data in areas such as specific medical or legal areas isn’t easily available. This makes it challenging for developers to adjust the LLM for these specific applications.
Data privacy is also a cause for concern. Many open-source LLMs train on vast quantities of publicly available data, which may contain sensitive or private information. This raises ethical and privacy concerns, as the model could reproduce that information in its outputs.
Even though open-source LLMs have their fair share of issues, there are various breakthroughs taking place that help combat these issues.
One breakthrough with open-source LLMs is that model sizes have drastically increased in recent times. With models like LLaMA 2 scaling to 70 billion parameters trained on 2 trillion tokens, open-source LLMs are only expected to keep growing.
This is crucial since open-source LLMs are beginning to rival their commercial, closed-source equivalents in scale. However, it's difficult to confirm this, as closed-source vendors often leave their parameter counts undisclosed.
Another significant advancement is the expansion of open-source LLMs' multilingual capabilities. One example is the previously mentioned mBERT, which supports 104 languages.
Open-source LLMs are revolutionizing global collaboration with multilingual support. Providing support for more languages enables experts from different countries to contribute and innovate together, which can lead to more global partnerships in research and business.
Speaking of business, multilingual support also leads to more opportunities for companies to adapt their strategies to a more global market. Multilingual open-source LLMs can assist in marketing and content creation for other countries, opening new doors for business growth.
We saw earlier that open-source LLMs like LLaMA 2 are reaching record lows when it comes to violation scores, setting a new standard for ethical AI. Consequently, more businesses and individuals could be more inclined to integrate LLMs into their workflows if they’re able to rest assured that the models are safe.
This also helps reduce the spread of harmful misinformation. Many LLMs are notorious for generating incorrect outputs, which is a major issue among LLM users. With increased safety, the likelihood of the model producing harmful or incorrect information is reduced.
Importance of Open-Source LLMs
You still might not be convinced about open-source LLMs and wonder why you should care about them.
As mentioned, a big advantage of open-source LLMs is that they encourage collaboration. With the underlying code and architecture available, researchers can advance the model by finding new applications and fixing any bugs. This results in quicker advancements and breakthroughs than in a closed-source setting.
One issue with closed-source LLMs is a lack of accessibility, meaning that the code for these models is unavailable to the public. However, open-source models provide the weights and code for their base models, allowing developers to experiment with them for free.
Closed-source LLMs often provide little flexibility and customization. On the other hand, open-source LLMs allow developers to modify and adjust the models for their own specific requirements. This means open-source LLMs can provide unique solutions to challenges that would otherwise be too tricky to deal with using closed-source LLMs.
One big problem with many LLMs is their “black box” nature. Due to the model's lack of interpretability, users are often left in the dark about how it arrived at its conclusions. Consequently, trusting the model’s decision-making process is difficult.
With open-source LLMs, the user can access the training algorithms and model architecture to better understand how the model generates its outputs. This helps increase transparency and trust by understanding the processes taking place under the hood.
Open-source LLMs have yet to outperform the top closed-source LLMs on raw capability, but they're certainly catching up. Meanwhile, they have already surpassed closed-source models in other respects, like lower violation scores and faster progress driven by open collaboration.
But what could the future of open-source LLMs hold?
One possibility is smaller, yet more efficient, language models. LLaMA 1 embodies this principle: its 13B variant outperforms the much larger GPT-3 on most benchmarks. Open-source models will likely continue to shrink while still reaching performance comparable to much bigger models.
Another possibility is more niche-specific LLMs. Fine-tuning an open-source LLM for specific areas like healthcare, law, or finance is a big challenge currently, but we could very well see LLMs designed for these areas and still maintain high accuracy.
Regarding future breakthroughs, we could see models becoming adept at complex tasks they’ve never seen before. Currently, models can only carry out tasks they’ve been trained for, although we could see advancements in few-shot learning where models can adapt to new situations with only a handful of examples from a new task.
The possibilities for open-source LLMs are endless. Even though closed-source LLMs have the upper hand in some areas, like performance, the dynamic and collaborative nature of open-source LLMs suggests a future where they surpass their closed-source counterparts.