Large Language Models

Also known as LLMs.

Theory

  • Tokenizer: splits strings into tokens (words, subwords or characters)
  • Lemmatization: reduces tokens to their root (dictionary) forms
  • Encoder: maps each token to a vector via an embedding layer, then reweights those vectors via attention to capture the meaning and context of each token
  • Decoder: generates new text by using the encoded representations to predict the next token in the sequence; it can attend to the encoder's outputs and to the tokens generated so far
  • Transformers: Encoder + Decoder (many modern LLMs keep only one half: BERT is encoder-only, GPT is decoder-only); a runnable sketch follows this list
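
A minimal sketch of the tokenize-then-predict loop, assuming the Hugging Face transformers library and the small gpt2 checkpoint (both are illustrative choices, not prescribed by this page):

    # Tokenize a prompt, then let a decoder-only transformer predict the next tokens.
    # Assumes: pip install transformers torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")      # string -> tokens -> ids
    model = AutoModelForCausalLM.from_pretrained("gpt2")   # decoder-only transformer

    inputs = tokenizer("Large language models are", return_tensors="pt")
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))  # inspect the tokens

    # generate() repeatedly predicts a likely next token and appends it to the sequence
    output_ids = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))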

Quickstart

LLMs are typically run and served with the following infrastructure.

  1. Text Generation Inference (TGI): Hugging Face's serving framework, featuring optimized transformer code, quantization, accelerated weight loading and logits warping
  2. Hugging Face Transformers (HF): open-source library that provides many pre-trained models for NLP and other custom tasks
  3. vLLM: serving framework featuring efficient memory management via PagedAttention, optimized CUDA kernels, advanced decoding algorithms and high serving throughput (significantly outperforming TGI and HF in published benchmarks); a usage sketch follows this list
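
A minimal offline-inference sketch with vLLM, assuming pip install vllm and the small facebook/opt-125m checkpoint (both illustrative assumptions):

    # vLLM batches requests and manages the KV cache via PagedAttention.
    # Assumes: pip install vllm
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")            # small model, for illustration only
    params = SamplingParams(temperature=0.8, max_tokens=32)

    outputs = llm.generate(["The capital of France is"], params)
    print(outputs[0].outputs[0].text)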

You can train your own model or use existing ones.

Train your own model

First, learn Python.

Optionally, learn R, Julia, C++, Scala and Go.

Then choose a library from below.
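
Whichever library you pick, the workflow is similar; here is a minimal fine-tuning sketch assuming the Hugging Face transformers and datasets libraries, with corpus.txt as a hypothetical local training file:

    # Fine-tune a small causal LM on a plain-text corpus.
    # Assumes: pip install transformers datasets torch; corpus.txt is hypothetical.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token          # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
    dataset = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=8),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM objective
    )
    trainer.train()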

Libraries

Python

R

Julia

C++

Scala

Go

Language-agnostic

Use a prebuilt model

As of 2024, there are many existing LLM implementations to choose from; the more prominent ones are listed below, with a usage sketch after the list.

  • GPT-4
    • OpenAI-developed
    • excellent at generating human-like text
  • Bidirectional Encoder Representations from Transformers (BERT)
    • Google-developed
    • transformer-based model for a variety of NLP tasks
  • Megatron-LM
    • NVIDIA-developed
    • scalable LLM framework for training and deploying models
  • Llama
    • Meta-developed
    • family of large language models for NLP and text generation
  • Fairseq
    • Meta-developed (Facebook AI Research (FAIR))
    • sequence-to-sequence learning toolkit for training and deploying models
    • used for translation, summarization and language modeling
  • AllenNLP
    • Allen Institute for AI-developed
    • open-source library built on PyTorch
    • used for NLP research and deploying NLP-focused models
  • Text-To-Text Transfer Transformer (T5)
    • Google-developed
    • highly versatile model that frames all NLP tasks as converting text to text
  • Enhanced Representation through Knowledge Integration (ERNIE)
    • Baidu-developed
    • incorporates knowledge graphs into the pre-training process for enhanced language comprehension
  • Universal Language Model Fine-tuning (ULMFiT)
    • fast.ai-developed
    • technique for fine-tuning pre-trained LLMs on downstream tasks
  • StarCoder
    • BigCode-developed
    • optimized for code generation and completion
  • BLOOM
    • BigScience-developed
    • open multilingual model trained on 46 natural languages and 13 programming languages
  • GPT-NeoX
    • EleutherAI-developed
    • designed to replicate GPT-3 architecture for various NLP tasks
  • Pythia
    • EleutherAI-developed
    • suite of models spanning many sizes, all trained on the same data in the same order, intended for research on LLM training and scaling
  • OpenAssistant
    • LAION-developed
    • conversational assistant for interactive AI dialogue capabilities
  • Dolly V2
    • Databricks-developed
    • instruction-following model, open-sourced under a license that permits commercial use
  • StableLM
    • Stability AI-developed
    • family of open language models for text generation and other NLP tasks
  • LocalAI
    • community-developed open-source project
    • facilitates local deployment of existing LLMs without relying on cloud-based services
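
For example, a minimal sketch of running one of the open checkpoints above, assuming the Hugging Face transformers library with bert-base-uncased as an illustrative choice:

    # Run a pre-trained BERT checkpoint on its native pre-training task:
    # predicting a masked token. Assumes: pip install transformers torch
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    for candidate in fill_mask("Paris is the [MASK] of France."):
        print(candidate["token_str"], round(candidate["score"], 3))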
