Large Language Models
Also known as LLMs.
Theory
- Tokenizer: splits strings into tokens
- Lemmatization: reduces tokens to their root (dictionary) forms
- Encoder: converts each token into a vector via an embedding, then uses attention to adjust those vectors so they capture the meaning/context of each token
- Decoder: generates new text by using the encoded representations to predict the next token in a sequence; can attend to the current and previous encoder outputs
- Transformers: Encoder + Decoder (see the sketch after this list)
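As a rough, hedged sketch of the tokenizer and encoder steps, the example below uses the Hugging Face transformers library; the model name bert-base-uncased is only an example checkpoint, not a requirement.

```python
# A minimal sketch of tokenization and encoding, assuming the Hugging Face
# transformers library is installed (pip install transformers torch).
# The model name "bert-base-uncased" is only an example choice.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "Large language models predict the next token."

# Tokenizer: splits the string into tokens and maps them to integer ids
tokens = tokenizer.tokenize(text)
inputs = tokenizer(text, return_tensors="pt")

# Encoder: each token id becomes a vector, refined by attention layers
outputs = model(**inputs)
print(tokens)
print(outputs.last_hidden_state.shape)  # (batch, sequence length, hidden size)
```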
Quickstart
LLMs are typically built upon the following infrastructure (a usage sketch follows this list).
- Text Generation Inference (TGI): Hugging Face's serving framework that features optimized transformer code, quantization, accelerated weight loading and logits warping
- Hugging Face Transformers (HF): open-source library that provides many pre-trained models for NLP and other custom tasks
- vLLM: serving framework that features efficient memory management with PagedAttention, optimized CUDA kernels, efficient decoding algorithms and high serving throughput (reported to significantly outperform TGI and HF)
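As a hedged sketch of how this infrastructure is used, the example below generates text with the HF Transformers pipeline and then runs the same prompt through vLLM; the model name gpt2 and the prompt are example choices, not requirements.

```python
# A minimal sketch, assuming transformers and vllm are installed
# (pip install transformers vllm). "gpt2" is only an example model name.

# Hugging Face Transformers: simple text generation pipeline
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models are", max_new_tokens=30)[0]["generated_text"])

# vLLM: higher-throughput serving of the same model via PagedAttention
from vllm import LLM, SamplingParams

llm = LLM(model="gpt2")
outputs = llm.generate(["Large language models are"], SamplingParams(max_tokens=30))
print(outputs[0].outputs[0].text)
```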
You can train your own model or use existing ones.
Train your own model
First, learn Python.
Optionally, learn R, Julia, C++, Scala and Go.
Then choose a library from the list below; a minimal fine-tuning sketch follows it.
Libraries
- Python
- R
- Julia
- C++
- Scala
- Go
- Language-agnostic
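As a hedged starting point in Python, the sketch below fine-tunes an existing causal language model with the Hugging Face Trainer rather than pre-training from scratch; the model (gpt2) and dataset (wikitext-2) are example choices.

```python
# A minimal fine-tuning sketch, assuming transformers and datasets are installed
# (pip install transformers datasets torch).
# Model ("gpt2") and dataset ("wikitext-2-raw-v1") are example choices.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda row: len(row["input_ids"]) > 0)  # drop empty lines

model = AutoModelForCausalLM.from_pretrained("gpt2")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    # mlm=False means standard next-token (causal) language modelling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```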
Use a prebuilt model
In 2024, there are many existing LLM implementations to choose from. The more prominent ones are listed below.
- GPT-4
- OpenAI-developed
- excellent at generating human-like text
- Bidirectional Encoder Representations from Transformers (BERT)
- Google-developed
- transformer-based model for a variety of NLP tasks
- Megatron-LM
- NVIDIA-developed
- scalable LLM framework for training and deploying models
- Llama
- Meta-developed
- family of large language models for NLP and text generation
- Fairseq
- Meta-developed (Facebook AI Research (FAIR))
- sequence-to-sequence learning toolkit for training and deploying models
- used for translation, summarization and language modeling
- AllenNLP
- Allen Institute for AI-developed
- open-source library built on PyTorch
- used for NLP research and deploying NLP-focused models
- Text-To-Text Transfer Transformer (T5)
- Google-developed
- highly versatile model that frames all NLP tasks as converting text to text
- Enhanced Representation through Knowledge Integration (ERNIE)
- Baidu-developed
- incorporates knowledge graphs into the pre-training process for enhanced language comprehension
- Universal Language Model Fine-tuning (ULMFiT)
- fast.ai-developed
- technique for fine-tuning pre-trained LLMs on downstream tasks
- StarCoder
- BigCode-developed
- optimized for code generation and completion
- BLOOM
- BigScience-developed
- multilingual language model trained on many languages and tasks
- GPT-NeoX
- EleutherAI-developed
- designed to replicate GPT-3 architecture for various NLP tasks
- Pythia
- EleutherAI-developed
- family of models for large-scale NLP tasks
- OpenAssistant
- LAION-developed
- conversational assistant for interactive AI dialogue capabilities
- Dolly V2
- Databricks-developed
- open-source instruction-following model licensed for commercial use
- StableLM
- Stability AI-developed
- robust model for NLP tasks
- LocalAI
- LocalAI-developed
- facilitates local deployment of existing LLMs without relying on cloud-based services
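As a hedged sketch of using a prebuilt model, the example below downloads a small open checkpoint from the BLOOM family listed above (via huggingface.co) and generates text with it; the model name bigscience/bloom-560m is only an example choice.

```python
# A minimal sketch of running a prebuilt model, assuming transformers is
# installed. "bigscience/bloom-560m" is a small open BLOOM checkpoint and is
# only an example choice; substitute any model you have access to.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

prompt = "A large language model is"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```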
More on
- Ollama
- huggingface.co
- MachineLearning.md
- Open LLMs GitHub repository
- Awesome-LLM GitHub repository
- TGI vs vLLM by Rohit Kewalramani
- Which is faster, vLLM, TGI or TensorRT? Reddit post