Large Language Models

Also known as LLMs.

Theory

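  • Tokenizer: splits strings into tokens (words, subwords or characters)
  • Lemmatization: reduces tokens to their dictionary (root) forms
  • Encoder: maps each token to a vector via embedding, then uses attention to adjust each vector so that it reflects the meaning and context of the surrounding tokens
  • Decoder: generates new text by predicting the next token in a sequence, using the encoded representations and the tokens generated so far
  • Transformers: the original architecture combines an encoder and a decoder; some models use only one of the two (BERT is encoder-only, GPT models are decoder-only)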
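
The steps above can be sketched in a few lines of plain Python. This is a toy illustration, not how any production model works: the lemma table, the `embed` function and its sine-based vectors are hypothetical stand-ins for a real dictionary and for learned embeddings, and the attention step uses the input vectors directly as queries, keys and values, whereas real transformers apply learned projection matrices first.

```python
import math

# Hypothetical toy lemma table; real lemmatizers use full dictionaries.
LEMMAS = {"cats": "cat", "sat": "sit", "mats": "mat"}


def tokenize(text):
    """Split a string into lowercase, whitespace-delimited tokens."""
    return text.lower().split()


def lemmatize(tokens):
    """Replace each token with its root form when the table knows it."""
    return [LEMMAS.get(token, token) for token in tokens]


def embed(token, dim=4):
    """Map a token to a deterministic toy vector (stand-in for a learned embedding)."""
    seed = sum(ord(char) for char in token)
    return [math.sin(seed * (i + 1)) for i in range(dim)]


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = inputs."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        mixed = [
            sum(w * vector[i] for w, vector in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


tokens = lemmatize(tokenize("The cats sat"))
vectors = [embed(token) for token in tokens]
contextual, weights = self_attention(vectors)
```

Each row of `weights` shows how strongly one token attends to every token in the sentence, and `contextual` holds the resulting context-aware vectors.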

Quickstart

LLMs are typically run and served with the following tools.

  1. Text Generation Inference (TGI): Hugging Face framework for serving LLMs that features optimized transformer code, quantization, accelerated weight loading and logits warping
  2. Hugging Face Transformers (HF): open-source library that provides many pre-trained models for NLP and other custom tasks
  3. vLLM: inference and serving framework that features efficient memory management with PagedAttention, optimized CUDA kernels, several decoding algorithms and high serving throughput (its authors report higher throughput than TGI and HF in their own benchmarks)
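
Logits warping, mentioned in the list above, means adjusting a model's raw next-token scores before sampling. The following is a minimal sketch of two common warpers, temperature scaling and top-k filtering, written in plain Python. The function names and the example logits are hypothetical and are not taken from any of the frameworks listed; the frameworks implement the same idea on tensors.

```python
import math


def warp_logits(logits, temperature=1.0, top_k=None):
    """Apply temperature scaling, then keep only the top_k highest logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperatures below 1 sharpen the distribution, above 1 flatten it.
    scaled = [value / temperature for value in logits]
    if top_k is not None and 0 < top_k < len(scaled):
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        # Everything below the cutoff is masked out; ties at the cutoff are kept.
        scaled = [v if v >= cutoff else -math.inf for v in scaled]
    return scaled


def softmax(logits):
    """Convert logits to probabilities; masked (-inf) entries become 0."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(warp_logits(logits, temperature=0.7, top_k=2))
```

With `top_k=2`, only the two highest-scoring tokens keep a non-zero probability, so sampling can never pick the other two.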

You can train your own model or use existing ones.

Train your own model

First, learn Python.

Optionally, learn R, Julia, C++, Scala and Go.

Then choose a library from below.

Libraries

Python

R

Julia

C++

Scala

Go

Language-agnostic

Use a prebuilt model

As of 2024, there are many existing LLMs and LLM toolkits to choose from. The more prominent ones are listed below.

  • GPT-4
    • OpenAI-developed
    • excellent at generating human-like text
  • Bidirectional Encoder Representations from Transformers (BERT)
    • Google-developed
    • encoder-only transformer model for a variety of NLP tasks
  • Megatron-LM
    • NVIDIA-developed
    • scalable LLM framework for training and deploying models
  • Llama
    • Meta-developed
    • family of large language models for NLP and text generation
  • Fairseq
    • Meta-developed (Facebook AI Research (FAIR))
    • sequence-to-sequence learning toolkit for training and deploying models
    • used for translation, summarization and language modeling
  • AllenNLP
    • Allen Institute for AI-developed
    • open-source library built on PyTorch
    • used for NLP research and deploying NLP-focused models
  • Text-To-Text Transfer Transformer (T5)
    • Google-developed
    • highly versatile model that frames all NLP tasks as converting text to text
  • Enhanced Representation through Knowledge Integration (ERNIE)
    • Baidu-developed
    • incorporates knowledge graphs into the pre-training process for enhanced language comprehension
  • Universal Language Model Fine-tuning (ULMFiT)
    • fast.ai-developed
    • technique for fine-tuning pre-trained language models on downstream tasks
  • StarCoder
    • BigCode-developed
    • optimized for code generation and completion
  • BLOOM
    • BigScience-developed
    • multilingual language model trained on dozens of natural and programming languages
  • GPT-NeoX
    • EleutherAI-developed
    • open-source model designed to replicate the GPT-3 architecture for various NLP tasks
  • Pythia
    • EleutherAI-developed
    • family of models of varying sizes, released to support research on how LLMs train and scale
  • OpenAssistant
    • LAION-developed
    • conversational assistant for interactive AI dialogue capabilities
  • Dolly V2
    • Databricks-developed
    • open-source model for instruction-following tasks, licensed for commercial use
  • StableLM
    • Stability AI-developed
    • family of open-source language models for text generation and other NLP tasks
  • LocalAI
    • open-source project (a runtime, not a model itself)
    • facilitates local deployment of existing LLMs without relying on cloud-based services
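
As a minimal sketch of using a prebuilt model, the function below loads one through the Hugging Face Transformers `pipeline` API and generates a short continuation. It assumes the `transformers` package and a backend such as PyTorch are installed, and that the weights can be downloaded on first use. The helper name `generate` and the choice of `EleutherAI/pythia-70m`, a small member of the Pythia family listed above, are illustrative assumptions; any text-generation model you have access to could be substituted.

```python
def generate(prompt, model_name="EleutherAI/pythia-70m", max_new_tokens=20):
    """Return the prompt followed by a model-generated continuation."""
    # Imported here so this file can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the model weights on first use (assumes network access).
    generator = pipeline("text-generation", model=model_name)
    results = generator(prompt, max_new_tokens=max_new_tokens)
    # The pipeline returns a list of dicts, one per generated sequence.
    return results[0]["generated_text"]
```

Calling `generate("Large language models are")` returns a string that starts with the prompt. A model this small is useful for checking that the setup works rather than for output quality.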

More on