
Guide to Open Source LLMs

The ten best open-source LLMs: a collection of the most popular open-source language models and a guide to the licenses that govern their use.

When I discovered the power of large language models (LLMs), I was amazed: it felt as if the future had suddenly arrived. These LLMs, also known as foundation models, learn from almost everything you throw at them, becoming real catalysts for companies rather than simple tools. They make processes faster, more creative, and more efficient, drastically reducing operating costs.

However, as interest in generative AI has exploded, the market has become saturated with LLMs, complicating the choice. Many sit behind expensive paywalls, while others are open source and promise freedom and innovation, but with restrictions. Some providers have released open-source LLMs under licenses that permit commercial use: a golden opportunity for businesses.

Let's look at the ten best to consider for a business project:

  1. LLaMA 2 by Meta: Meta marked a major breakthrough with the release of LLaMA 2, an open-source large language model (LLM). With sizes ranging from 7 to 70 billion parameters, LLaMA 2 stands out for its versatility in natural-language and programming tasks. It has been refined with reinforcement learning from human feedback (RLHF), making it a powerful and adaptable tool for chatbots and other AI applications.
  2. BLOOM: Launched in 2022 by the BigScience research workshop coordinated by Hugging Face, BLOOM represents a milestone for democratized generative AI. With 176 billion parameters, it supports 46 natural languages and 13 programming languages. Transparency is central to BLOOM: anyone can access, run, and improve the source code and training data.
  3. BERT: Developed by Google in 2018, BERT is an innovative open-source LLM that has set new standards in natural language processing tasks. BERT, an acronym for Bidirectional Encoder Representations from Transformers, was one of the first experiments to explore the potential of transformer architectures and remains one of the most popular and widely used LLMs.
  4. Falcon 180B: Falcon 180B, launched in 2023 by the UAE's Technology Innovation Institute (TII), is an LLM with 180 billion parameters. This impressive model outperformed other LLMs such as LLaMA 2 and GPT-3.5 in several natural language processing tasks, positioning itself as a serious contender in the field of generative AI.
  5. OPT-175B: Part of Meta's suite of pre-trained transformer models, OPT-175B is an open-source LLM with 175 billion parameters. Launched in 2022, it offers performance similar to GPT-3, but is available for research purposes only, in line with Meta's commitment to open source in generative AI.
  6. XGen-7B from Salesforce: Salesforce entered the LLM race with the launch of XGen-7B in 2023. This model focuses on efficiency and supports longer context windows, allowing for more consistent and accurate text generation, despite having only 7 billion parameters.
  7. GPT-NeoX and GPT-J by EleutherAI: Developed by EleutherAI, GPT-NeoX and GPT-J are two open-source alternatives to GPT. With 20 billion parameters for GPT-NeoX and 6 billion for GPT-J, these models offer accurate results, although they fall short of more advanced LLMs in terms of size.
  8. Vicuna-13B: Vicuna-13B is an open-source conversational model developed from LLaMA 13B and optimized for chatbots and other conversational AI applications. It was trained on user-shared conversations and showed performance comparable to ChatGPT and Google Bard, outperforming other models in many cases.
  9. Mistral 7B: Mistral AI, a French company, recently launched its Large Language Model (LLM) Mistral 7B. This innovative model, with 7.3 billion parameters, is now available globally to all developers. Thanks to the Apache 2.0 license, Mistral 7B is freely accessible, and anyone can integrate it into their applications. The project is not the result of improvisation but of mature, well-structured work, with the participation of the CINECA/EuroHPC consortium and the support of the Leonardo supercomputer. Mistral 7B stands out for its exceptional performance, surpassing models such as Meta's Llama 2 13B and Llama 1 34B. It uses advanced techniques such as grouped-query attention (GQA) and sliding-window attention (SWA) to improve inference efficiency, making it a leading model for its size. It is easily deployed on cloud platforms such as AWS, Google Cloud, and Azure, and is available for modification and adaptation under the Apache 2.0 license.
  10. GPT4All: GPT4All is an ecosystem designed to democratize access to powerful, customizable language models by making them run on consumer-grade CPUs. The goal of GPT4All is to provide the best instruction-tuned assistant language model, accessible and usable by anyone, whether companies or individuals. GPT4All models, ranging in size from 3 GB to 8 GB, can be downloaded and deployed in the GPT4All open-source software ecosystem, maintained and supported by Nomic AI. This ecosystem not only ensures quality and security, but also facilitates the training and deployment of customized Large Language Models, making them a valuable and accessible resource for a broad audience.
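To make the shortlisting step concrete, the ten models above can be collected into a small registry and filtered by license and size. This is a minimal sketch: the parameter counts and license labels are transcribed or simplified from this article, the `commercial_use` flags are rough assumptions, and none of it is legal advice.

```python
# A minimal sketch: the ten models above as a small registry, with a helper
# to shortlist candidates by license and size. Figures and license labels are
# transcribed (and simplified) from this article; always verify the license text.
from dataclasses import dataclass

@dataclass
class OpenModel:
    name: str
    params_billions: float
    license: str
    commercial_use: bool  # simplified assumption, not legal advice

MODELS = [
    OpenModel("LLaMA 2", 70, "Llama 2 Community", True),
    OpenModel("BLOOM", 176, "BigScience RAIL", True),
    OpenModel("BERT", 0.34, "Apache 2.0", True),
    OpenModel("Falcon 180B", 180, "Falcon-180B TII License", True),
    OpenModel("OPT-175B", 175, "Research only", False),
    OpenModel("XGen-7B", 7, "Apache 2.0 (base model)", True),
    OpenModel("GPT-NeoX", 20, "Apache 2.0", True),
    OpenModel("Vicuna-13B", 13, "Non-commercial", False),
    OpenModel("Mistral 7B", 7.3, "Apache 2.0", True),
    OpenModel("GPT4All (family)", 7, "Varies per model", False),
]

def business_candidates(models, max_params_billions=None):
    """Return models flagged for commercial use, optionally capped by size."""
    result = [m for m in models if m.commercial_use]
    if max_params_billions is not None:
        result = [m for m in result if m.params_billions <= max_params_billions]
    return result
```

For example, `business_candidates(MODELS, max_params_billions=10)` narrows the list to small models you could realistically self-host.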

For an even more complete overview of the most promising language models released in recent months, we can rely on the ranking published at https://chat.lmsys.org/, which is based on the following three benchmarks:

  • Chatbot Arena – a randomized, crowdsourced battle platform. We use votes from over 100,000 users to calculate Elo ratings.
  • MT-Bench – a series of challenging multi-round questions. We use GPT-4 to evaluate model responses.
  • MMLU (5 shots) – a test to measure the multitask accuracy of a model on 57 tasks.
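To make "5 shots" concrete: in MMLU the model is shown five solved examples before the question it must answer. The sketch below assembles such a prompt following the common MMLU layout; the question format details and the dev examples are illustrative assumptions, not the benchmark's actual data.

```python
# A minimal sketch of 5-shot prompting in the MMLU style: five solved
# multiple-choice examples precede the unanswered test question.
# The exact layout is an assumption modeled on the common MMLU convention.
CHOICES = ["A", "B", "C", "D"]

def format_example(question, options, answer=None):
    """Render one multiple-choice item; leave 'Answer:' open for the test item."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in zip(CHOICES, options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)

def build_5shot_prompt(subject, dev_examples, test_question, test_options):
    """Concatenate five solved dev examples, then the unanswered test item."""
    header = f"The following are multiple choice questions (with answers) about {subject}.\n\n"
    shots = "\n\n".join(format_example(q, opts, ans) for q, opts, ans in dev_examples[:5])
    return header + shots + "\n\n" + format_example(test_question, test_options)
```

The model's accuracy over the 57 MMLU subjects, each prompted this way, yields the single score reported in the table below.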

Code 💻: Arena Elo ratings are calculated from this notebook. MT-Bench scores (single-answer grading on a scale of 10) are calculated from fastchat.llm_judge. MMLU scores are mostly computed by InstructEval. Higher values are better for all benchmarks. Empty cells mean the score is not available.
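The Elo mechanism behind the Arena ratings can be illustrated with a short sketch: each user vote between two anonymous models nudges their ratings toward the observed outcome. The K-factor and starting rating below are illustrative assumptions; the actual LMSYS notebook computes ratings over the full battle log.

```python
# A minimal sketch of Elo updates from pairwise "battles", as in Chatbot Arena.
# K=4 and the 1000-point starting rating are illustrative assumptions.
K = 4
BASE = 1000

def expected_score(ra, rb):
    """Probability that the model rated `ra` beats the model rated `rb`."""
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def update_elo(ratings, winner, loser):
    """Apply one battle result in place; unseen models start at BASE."""
    ra = ratings.get(winner, BASE)
    rb = ratings.get(loser, BASE)
    ea = expected_score(ra, rb)           # how likely the win already was
    ratings[winner] = ra + K * (1 - ea)   # surprising wins move ratings more
    ratings[loser] = rb - K * (1 - ea)
    return ratings
```

Replaying many thousands of such votes produces the ordering shown in the leaderboard.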

| Model | ⭐ Arena Elo rating | 📈 MT-bench (score) | MMLU | License |
|---|---|---|---|---|
| GPT-4-Turbo | 1217 | 9.32 | | Proprietary |
| GPT-4-0314 | 1201 | 8.96 | 86.4 | Proprietary |
| Claude-1 | 1153 | 7.9 | 77 | Proprietary |
| GPT-4-0613 | 1152 | 9.18 | | Proprietary |
| Claude-2.0 | 1127 | 8.06 | 78.5 | Proprietary |
| Claude-2.1 | 1118 | 8.18 | | Proprietary |
| GPT-3.5-turbo-0613 | 1112 | 8.39 | | Proprietary |
| Claude-instant-1 | 1109 | 7.85 | 73.4 | Proprietary |
| GPT-3.5-turbo-0314 | 1105 | 7.94 | 70 | Proprietary |
| Tulu-2-DPO-70B | 1105 | 7.89 | | AI2 ImpACT Low-risk |
| Yi-34B-chat | 1102 | | 73.5 | Yi License |
| WizardLM-70b-v1.0 | 1097 | 7.71 | 63.7 | Llama 2 Community |
| Vicuna-33B | 1093 | 7.12 | 59.2 | Non-commercial |
| Starling-lm-7b-alpha | 1083 | 8.09 | 63.9 | CC-BY-NC-4.0 |
| pplx-70b-online | 1080 | | | Proprietary |
| OpenChat-3.5 | 1077 | 7.81 | 64.3 | Apache-2.0 |
| OpenHermes-2.5-Mistral-7b | 1075 | | | Apache-2.0 |
| GPT-3.5-Turbo-1106 | 1074 | 8.32 | | Proprietary |
| Llama-2-70b-chat | 1069 | 6.86 | 63 | Llama 2 Community |
| WizardLM-13b-v1.2 | 1053 | 7.2 | 52.7 | Llama 2 Community |
| Zephyr-7b-beta | 1045 | 7.34 | 61.4 | MIT |
| MPT-30B-chat | 1039 | 6.39 | 50.4 | CC-BY-NC-SA-4.0 |
| Vicuna-13B | 1039 | 6.57 | 55.8 | Llama 2 Community |
| QWen-Chat-14B | 1039 | 6.96 | 66.5 | Qianwen LICENSE |
| Zephyr-7b-alpha | 1034 | 6.88 | | MIT |
| CodeLlama-34B-instruct | 1032 | | 53.7 | Llama 2 Community |
| falcon-180b-chat | 1031 | | 68 | Falcon-180B TII License |
| Guanaco-33B | 1029 | 6.53 | 57.6 | Non-commercial |
| Llama-2-13b-chat | 1027 | 6.65 | 53.6 | Llama 2 Community |
| Mistral-7B-Instruct-v0.1 | 1018 | 6.84 | 55.4 | Apache 2.0 |
| pplx-7b-online | 1017 | | | Proprietary |
| Llama-2-7b-chat | 1009 | 6.27 | 45.8 | Llama 2 Community |
| Vicuna-7B | 1002 | 6.17 | 49.8 | Llama 2 Community |
| PaLM-Chat-Bison-001 | 1000 | 6.4 | | Proprietary |
| Koala-13B | 966 | 5.35 | 44.7 | Non-commercial |
| ChatGLM3-6B | 958 | | | Apache-2.0 |
| GPT4All-13B-Snoozy | 936 | 5.41 | 43 | Non-commercial |
| MPT-7B-Chat | 930 | 5.42 | 32 | CC-BY-NC-SA-4.0 |
| ChatGLM2-6B | 924 | 4.96 | 45.5 | Apache-2.0 |
| RWKV-4-Raven-14B | 924 | 3.98 | 25.6 | Apache 2.0 |
| Alpaca-13B | 904 | 4.53 | 48.1 | Non-commercial |
| OpenAssistant-Pythia-12B | 896 | 4.32 | 27 | Apache 2.0 |
| ChatGLM-6B | 882 | 4.5 | 36.1 | Non-commercial |
| FastChat-T5-3B | 873 | 3.04 | 47.7 | Apache 2.0 |
| StableLM-Tuned-Alpha-7B | 845 | 2.75 | 24.4 | CC-BY-NC-SA-4.0 |
| Dolly-V2-12B | 822 | 3.28 | 25.7 | MIT |
| LLaMA-13B | 800 | 2.61 | 47 | Non-commercial |
| WizardLM-30B | | 7.01 | 58.7 | Non-commercial |
| Vicuna-13B-16k | | 6.92 | 54.5 | Llama 2 Community |
| WizardLM-13B-v1.1 | | 6.76 | 50 | Non-commercial |
| Tulu-30B | | 6.43 | 58.1 | Non-commercial |
| Guanaco-65B | | 6.41 | 62.1 | Non-commercial |
| OpenAssistant-LLaMA-30B | | 6.41 | 56 | Non-commercial |
| WizardLM-13B-v1.0 | | 6.35 | 52.3 | Non-commercial |
| Vicuna-7B-16k | | 6.22 | 48.5 | Llama 2 Community |
| Baize-v2-13B | | 5.75 | 48.9 | Non-commercial |
| XGen-7B-8K-Inst | | 5.55 | 42.1 | Non-commercial |
| Nous-Hermes-13B | | 5.51 | 49.3 | Non-commercial |
| MPT-30B-Instruct | | 5.22 | 47.8 | CC-BY-SA 3.0 |
| Falcon-40B-Instruct | | 5.17 | 54.7 | Apache 2.0 |
| H2O-Oasst-OpenLLaMA-13B | | 4.63 | 42.8 | Apache 2.0 |
Last update: November 2023

Each model is distributed under its own license, so let's look at the different types of open-source licenses used for large language models (LLMs):

  1. Apache 2.0:
    • Permissive license that allows the use, modification and distribution of the software also for commercial purposes.
    • Requires that you state any significant changes you made to the software when you redistribute it.
    • Includes an explicit grant of patent rights from contributors to users, and requires that you provide attribution.
  2. MIT License:
    • Permissive license known for its simplicity.
    • Allows virtually any use of the software, including commercial use, as long as attribution is provided.
    • It does not specifically grant patent rights to you.
  3. CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0):
    • Allows the material to be used, shared, and adapted for any purpose, including commercial.
    • Any derivative works must be distributed under the same license.
  4. OpenRAIL-M v1:
    • Built specifically for AI models.
    • Allows commercial use, but includes use restrictions concerning safety and ethics.
  5. BSD Licenses:
    • BSD-2-Clause: Permits nearly unrestricted use, including redistribution and use in proprietary software, provided that the copyright notice is retained.
    • BSD-3-Clause: Similar to the 2-clause license, but with an additional clause that prevents the use of the licensor's name to promote products derived from the software without permission.
  6. MPL-2.0 (Mozilla Public License 2.0):
    • Weak copyleft license; allows you to integrate open source code into proprietary projects.
    • Any changes to the licensed software must remain under the MPL and be made public.
  7. Ms-PL (Microsoft Public License):
    • Permissive license specific to the Microsoft ecosystem.
    • Permits redistribution and use for any purpose, provided the original copyright notice is included.
  8. CC0 (Creative Commons Zero):
    • It is not strictly a software license but a dedication to the public domain.
    • The author waives all copyright and related rights, permitting others to use, modify and distribute the work for any purpose without restriction.
  9. The Unlicense:
    • A public-domain dedication that waives all copyright claims.
    • Grants absolute freedom to use, modify, and distribute the work.

These are some of the most common licenses that allow commercial use. It is important to note that Creative Commons licenses such as CC BY-NC, CC BY-NC-SA, and CC BY-NC-ND place specific restrictions on commercial use of the content or software.

For companies, it is crucial to distinguish commercially usable open-source models based on their licenses. For example, Apache 2.0 and MIT are permissive licenses that allow broad use, including commercial use. Other licenses, such as CC BY-SA-4.0, OpenRAIL-M v1, and BSD, offer different degrees of freedom and restrictions. It is essential to understand these licenses to choose the right model for your business needs.
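As a rough illustration of that screening step, the license labels from the table above can be bucketed into coarse categories. The keyword rules here are simplified assumptions for illustration only; ambiguous labels fall into a "review" bucket, and the actual license text always takes precedence.

```python
# A minimal sketch: bucket license labels (as they appear in the table above)
# into rough commercial-use categories. The keyword rules are simplified
# assumptions, not legal advice; always read the actual license text.
PERMISSIVE_KEYWORDS = ("apache", "mit", "bsd", "cc0", "unlicense", "cc-by-sa", "ms-pl", "mpl")
RESTRICTED_KEYWORDS = ("non-commercial", "-nc-", "proprietary", "research only")

def classify_license(label):
    """Return 'restricted', 'permissive', or 'review' for a license label."""
    normalized = label.lower()
    if any(k in normalized for k in RESTRICTED_KEYWORDS):
        return "restricted"
    if any(k in normalized for k in PERMISSIVE_KEYWORDS):
        return "permissive"
    # e.g. Llama 2 Community, Yi License, RAIL variants: read the terms
    return "review"
```

Note how bespoke licenses like Llama 2 Community deliberately land in "review": they allow commercial use but with their own conditions, which no keyword rule can capture.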

It is not enough to consider only the license of an LLM; you must also evaluate its specific capabilities. Some models are only pre-trained, while others have been fine-tuned for specific tasks. The ranking of open-source LLMs above takes both their capabilities and their ease of adoption into account.

In conclusion, although hundreds of LLMs exist, only a few are truly business-friendly. I hope this article has clarified the situation regarding the usability of LLMs from a business point of view. As technology progresses, I'm sure we will see more and more business-friendly LLMs.

The world is changing, and those who succeed learn to innovate their products along with their processes and their people.


© 2024 Andrea Zurini