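A Roundup of Large AI Models

Milestone Papers on LLMs

Large language models (LLMs) have taken the NLP and AI communities by storm. Below is a list of milestone papers on large language models.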

| Date | Keywords | Institute | Paper | Publication |
| --- | --- | --- | --- | --- |
| 2017-06 | Transformers | Google | Attention Is All You Need | NeurIPS |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training | |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL |
| 2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners | |
| 2019-09 | Megatron-LM | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | |
| 2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR |
| 2019-10 | ZeRO | Microsoft | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | SC |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models | |
| 2020-05 | GPT 3.0 | OpenAI | Language Models are Few-Shot Learners | NeurIPS |
| 2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | JMLR |
| 2021-08 | Codex | OpenAI | Evaluating Large Language Models Trained on Code | |
| 2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models | |
| 2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners | ICLR |
| 2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization | ICLR |
| 2021-12 | GLaM | Google | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | ICML |
| 2021-12 | WebGPT | OpenAI | WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing | |
| 2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens | ICML |
| 2021-12 | Gopher | DeepMind | Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
| 2022-01 | COT | Google | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | NeurIPS |
| 2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications | |
| 2022-01 | Minerva | Google | Solving Quantitative Reasoning Problems with Language Models | NeurIPS |
| 2022-01 | Megatron-Turing NLG | Microsoft & NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model | |
| 2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback | |
| 2022-04 | PaLM | Google | PaLM: Scaling Language Modeling with Pathways | |
| 2022-04 | Chinchilla | DeepMind | An empirical analysis of compute-optimal large language model training | NeurIPS |
| 2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models | |
| 2022-05 | UL2 | Google | Unifying Language Learning Paradigms | |
| 2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models | TMLR |
| 2022-06 | BIG-bench | Google | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | |
| 2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces | |
| 2022-09 | Sparrow | DeepMind | Improving alignment of dialogue agents via targeted human judgements | |
| 2022-10 | Flan-T5/PaLM | Google | Scaling Instruction-Finetuned Language Models | |
| 2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model | ICLR |
| 2022-11 | HELM | Stanford | Holistic Evaluation of Language Models | |
| 2022-11 | BLOOM | BigScience | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | |
| 2022-11 | Galactica | Meta | Galactica: A Large Language Model for Science | |
| 2022-12 | OPT-IML | Meta | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization | |
| 2023-01 | Flan 2022 Collection | Google | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning | |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models | |
| 2023-02 | Kosmos-1 | Microsoft | Language Is Not All You Need: Aligning Perception with Language Models | |
| 2023-03 | PaLM-E | Google | PaLM-E: An Embodied Multimodal Language Model | |
| 2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report | |
| 2023-04 | Pythia | EleutherAI et al. | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | ICML |
| 2023-05 | Dromedary | CMU et al. | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision | |
| 2023-05 | PaLM 2 | Google | PaLM 2 Technical Report | |

Open LLMs for Code

| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | License |
| --- | --- | --- | --- | --- | --- | --- |
| SantaCoder | TODO | santacoder | SantaCoder: don't reach for the stars! | 1.1 | 2048 | OpenRAIL-M v1 |
| StarCoder | TODO | starcoder | StarCoder: A State-of-the-Art LLM for Code, StarCoder: May the source be with you! | 15 | 8192 | OpenRAIL-M v1 |
| StarChat Alpha | TODO | starchat-alpha | Creating a Coding Assistant with StarCoder | 16 | 8192 | OpenRAIL-M v1 |
| Replit Code | TODO | replit-code-v1-3b | Training a SOTA Code LLM in 1 week and Quantifying the Vibes — with Reza Shabani of Replit | 2.7 | infinity? (ALiBi) | CC BY-SA-4.0 |
| CodeGen2 | TODO | codegen2 1B-16B | CodeGen2: Lessons for Training LLMs on Programming and Natural Languages | 1 - 16 | 2048 | Apache 2.0 |

Open LLM Datasets for Pre-training

| Name | Release Date | Paper/Blog | Dataset | Tokens (T) | License |
| --- | --- | --- | --- | --- | --- |
| starcoderdata | 2023/05 | StarCoder: A State-of-the-Art LLM for Code | starcoderdata | ? | Apache 2.0 |
| RedPajama | 2023/04 | RedPajama, a project to create leading open-source models, starts by reproducing LLaMA training dataset of over 1.2 trillion tokens | RedPajama-Data | 1.2 | Apache 2.0 |

Open LLM Datasets for Instruction Tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
| --- | --- | --- | --- | --- | --- |
| MPT-7B-Instruct | 2023/05 | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | dolly_hhrlhf | 59 | CC BY-SA-3.0 |
| databricks-dolly-15k | 2023/04 | Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM | databricks-dolly-15k | 15 | CC BY-SA-3.0 |
| OIG (Open Instruction Generalist) | 2023/03 | THE OIG DATASET | OIG | 44,000 | Apache 2.0 |

Open LLM Datasets for Alignment Tuning

| Name | Release Date | Paper/Blog | Dataset | Samples (K) | License |
| --- | --- | --- | --- | --- | --- |
| OpenAssistant Conversations Dataset | 2023/04 | OpenAssistant Conversations - Democratizing Large Language Model Alignment | oasst1 | 161 | Apache 2.0 |

Evaluation Tools for Open LLMs