Overview

Large language models are the backbone of the AI: its brain, if you will. If you’re interested in learning how transformers and LLMs work, check out the articles at Hugging Face.

LLM models recommendations

Good models for Voxta are strong at roleplay without being specialized in prose or third-person style, and have solid enough logic to closely follow prompts for action inference and summarization.

The best ones for Voxta based on the benchmarks and our own testing are:

These are also very good, though their added creativity can affect action inference:

You can download quantized versions of these models (GPTQ/EXL2 if you have enough VRAM, or GGUF if you don’t) from TheBloke’s Hugging Face page.

GPTQ, GGUF, AWQ, GGML?

Quantization is what makes it possible to run large models on a consumer graphics card at home. It reduces accuracy, but lets the model fit on commodity hardware; even so, most models still require large amounts of VRAM. Use GPTQ if you have enough VRAM, otherwise use GGUF. GGML is an obsolete format, and AWQ is not yet fully supported at the time of this writing.
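To make the accuracy trade-off concrete, here is a toy sketch (not the actual GPTQ or GGUF algorithm, which are more sophisticated) of symmetric 4-bit quantization: weights are mapped to small integers with a shared scale, shrinking storage by roughly 4x versus 16-bit floats at the cost of rounding error.

```python
# Toy illustration of symmetric 4-bit quantization: each weight is rounded
# to a signed integer in [-8, 7] sharing one scale factor, then restored.
# The restored values are close to, but not exactly, the originals.

def quantize_4bit(weights):
    """Map floats to 4-bit signed integers with a shared group scale."""
    scale = max(abs(w) for w in weights) / 7  # one scale for the group
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.88, -0.07]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Per-weight rounding error introduced by quantization:
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

Real formats quantize per-group or per-channel and use calibration data, but the core idea (fewer bits per weight, recovered via a scale) is the same.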

7B, 13B, 20B, 30B, 70B

The larger the size, the stronger the model, but also the more VRAM it requires. 13B is usually a good compromise, depending on your hardware.
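As a rough sanity check of what your hardware can hold, weight memory is approximately parameters × bits per weight / 8. This back-of-the-envelope sketch (an assumption-laden estimate that ignores KV cache and activation overhead, which add more on top) shows why a 13B model is workable on a mid-range GPU once quantized:

```python
# Back-of-the-envelope estimate of model weight memory in gigabytes.
# Assumption: weights only; KV cache and activations need extra VRAM on top.

def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage: parameters * bits / 8, in GB."""
    return params_billion * 1e9 * bits / 8 / 1e9

# A 13B model at 4-bit needs roughly 6.5 GB for weights alone,
# versus about 26 GB at fp16:
approx_13b_q4 = weight_gb(13, 4)
approx_13b_fp16 = weight_gb(13, 16)
```

By this estimate, a 7B model quantized to 4 bits fits comfortably in 8 GB of VRAM, while a 70B model stays out of reach for most consumer cards even when quantized.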