Large Language Models

Also known as LLMs.

Theory

  • Tokenizer: splits strings into tokens (words, subwords or characters)
  • Lemmatization: reduces tokens to their root (dictionary) forms
  • Encoder: maps each token to a vector via an embedding layer, then reweights those vectors via attention to capture the meaning and context of each token
  • Decoder: generates new text by using the encoded representations to predict the next token in the sequence; it can attend to the encoder's outputs and to the tokens generated so far
  • Transformers: Encoder + Decoder (many modern LLMs keep only one half: BERT is encoder-only, GPT is decoder-only); a runnable sketch follows this list
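
A minimal sketch of the tokenize-then-predict loop, assuming the Hugging Face transformers library and the small gpt2 checkpoint (both are illustrative choices, not prescribed by this page):

    # Tokenize a prompt, then let a decoder-only transformer predict the next tokens.
    # Assumes: pip install transformers torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")      # string -> tokens -> ids
    model = AutoModelForCausalLM.from_pretrained("gpt2")   # decoder-only transformer

    inputs = tokenizer("Large language models are", return_tensors="pt")
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))  # inspect the tokens

    # generate() repeatedly predicts a likely next token and appends it to the sequence
    output_ids = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))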

Quickstart

LLMs are typically run and served with the following infrastructure.

  1. Text Generation Inference (TGI): Hugging Face's serving framework, featuring optimized transformer code, quantization, accelerated weight loading and logits warping
  2. Hugging Face Transformers (HF): open-source library that provides many pre-trained models for NLP and other custom tasks
  3. vLLM: serving framework featuring efficient memory management via PagedAttention, optimized CUDA kernels, advanced decoding algorithms and high serving throughput (significantly outperforming TGI and HF in published benchmarks); a usage sketch follows this list
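
A minimal offline-inference sketch with vLLM, assuming pip install vllm and the small facebook/opt-125m checkpoint (both illustrative assumptions):

    # vLLM batches requests and manages the KV cache via PagedAttention.
    # Assumes: pip install vllm
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")            # small model, for illustration only
    params = SamplingParams(temperature=0.8, max_tokens=32)

    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)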

You can train your own model or use existing ones.

Train your own model

First, learn Python.

Optionally, learn R, Julia, C++, Scala and Go.

Then choose a library from below.
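
Whichever library you pick, the workflow is similar; here is a minimal fine-tuning sketch assuming the Hugging Face transformers and datasets libraries, with corpus.txt as a hypothetical local training file:

    # Fine-tune a small causal LM on a plain-text corpus.
    # Assumes: pip install transformers datasets torch; corpus.txt is hypothetical.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
    )
    trainer.train()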

Libraries

Python

R

Julia

C++

Scala

Go

Language-agnostic

Use a prebuilt model

As of 2024, there are many existing LLM implementations to choose from; the more prominent ones are listed below, with a usage sketch after the list.

  • GPT-4
    • OpenAI-developed
    • excellent at generating human-like text
  • Bidirectional Encoder Representations from Transformers (BERT)
    • Google-developed
    • transformer-based model for a variety of NLP tasks
  • Megatron-LM
    • NVIDIA-developed
    • scalable LLM framework for training and deploying models
  • Llama
    • Meta-developed
    • family of large language models for NLP and text generation
  • Fairseq
    • Meta-developed (Facebook AI Research (FAIR))
    • sequence-to-sequence learning toolkit for training and deploying models
    • used for translation, summarization and language modeling
  • AllenNLP
    • Allen Institute for AI-developed
    • open-source library built on PyTorch
    • used for NLP research and deploying NLP-focused models
  • Text-To-Text Transfer Transformer (T5)
    • Google-developed
    • highly versatile model that frames all NLP tasks as converting text to text
  • Enhanced Representation through Knowledge Integration (ERNIE)
    • Baidu-developed
    • incorporates knowledge graphs into the pre-training process for enhanced language comprehension
  • Universal Language Model Fine-tuning (ULMFiT)
    • fast.ai-developed
    • technique for fine-tuning pre-trained LLMs on downstream tasks
  • StarCoder
    • BigCode-developed
    • optimized for code generation and completion
  • BLOOM
    • BigScience-developed
    • open multilingual model trained on 46 natural languages and 13 programming languages
  • GPT-NeoX
    • EleutherAI-developed
    • designed to replicate GPT-3 architecture for various NLP tasks
  • Pythia
    • EleutherAI-developed
    • suite of models spanning many sizes, all trained on the same data in the same order, intended for research on LLM training and scaling
  • OpenAssistant
    • LAION-developed
    • conversational assistant for interactive AI dialogue capabilities
  • Dolly V2
    • Databricks-developed
    • instruction-following model, open-sourced under a license that permits commercial use
  • StableLM
    • Stability AI-developed
    • family of open language models for text generation and other NLP tasks
  • LocalAI
    • community-developed open-source project
    • facilitates local deployment of existing LLMs without relying on cloud-based services
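
For example, a minimal sketch of running one of the open checkpoints above, assuming the Hugging Face transformers library with bert-base-uncased as an illustrative choice:

    # Run a pre-trained BERT checkpoint on its native pre-training task:
    # predicting a masked token. Assumes: pip install transformers torch
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in fill_mask("Paris is the [MASK] of France."):
        print(candidate["token_str"], round(candidate["score"], 3))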
