OpenAI/Dense

GPT-2 Medium 345M

chat
Parameters: 0.345B
Context length: 1K
Benchmarks: 6
Quantizations: 4
HF downloads: 1.0M
Architecture: Dense
Released: 2019-02-14
Layers: 24
KV Heads: 16
Head Dim: 64
Family: gpt2

GPT-2 Medium

Model Details

Model Description: GPT-2 Medium is the 355M parameter version of GPT-2, a transformer-based language model created and released by OpenAI. The model is pretrained on English text using a causal language modeling (CLM) objective.
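
As a rough sanity check, the 355M figure can be reproduced from the architecture numbers listed on this page (24 layers, 16 heads, head dimension 64, 1K context). The sketch below is not taken from the model card. It assumes the standard GPT-2 block layout (fused QKV projection, 4x MLP expansion, two layer norms per block, output head tied to the token embeddings) and a vocabulary of 50,257 tokens, which this page does not state.

```python
def gpt2_param_count(n_layer, n_head, head_dim, n_ctx, vocab_size):
    """Count parameters of a GPT-2 style decoder (assumed standard layout)."""
    d = n_head * head_dim                      # hidden size: 16 * 64 = 1024

    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d + n_ctx * d

    # Attention: fused QKV projection and output projection (weights + biases).
    attn = (d * 3 * d + 3 * d) + (d * d + d)

    # MLP: expand to 4*d, then project back to d (weights + biases).
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)

    # Two layer norms per block, each with a scale and a bias vector.
    layer_norms = 2 * (2 * d)

    per_block = attn + mlp + layer_norms
    final_layer_norm = 2 * d

    # The output head is assumed tied to the token embeddings, so it adds nothing.
    return embeddings + n_layer * per_block + final_layer_norm


total = gpt2_param_count(
    n_layer=24, n_head=16, head_dim=64,
    n_ctx=1024,          # "1K" context length
    vocab_size=50257,    # assumption: not stated on this page
)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the count rounds to 355M, which matches the model description rather than the 345M in the page title.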

How to Get Started with the Model

Use the code below to get started with the model. You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2-medium')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

[{'generated_text': "Hello, I'm a language model, I'm a language. I'm a compiler, I'm a parser, I'm a server process. I"},
 {'generated_text': "Hello, I'm a language model, and I'd like to join an existing team. What can I do to get started?\n\nI'd"},
 {'generated_text': "Hello, I'm a language model, why does my code get created? Can't I just copy it? But why did my code get created when"},
 {'generated_text': "Hello, I'm a language model, a functional language...\n\nI'm a functional language. Is it hard? A little, yes. But"},
 {'generated_text': "Hello, I'm a language model, not an object model.\n\nIn a nutshell, I need to give me objects from which I can get"}]

Here is how to use this model to get the features of a given text in PyTorch:

from transformers import GPT2Tokenizer, GPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2Model.from_pretrained('gpt2-medium')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)

and in TensorFlow:

from transformers import GPT2Tokenizer, TFGPT2Model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = TFGPT2Model.from_pretrained('gpt2-medium')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)

Uses

Direct Use

In their model card about GPT-2, OpenAI wrote:

The primary intended users of these models are AI researchers and practitioners.

We primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.

Downstream Use

In their model card about GPT-2, OpenAI wrote:

Here are some secondary use cases we believe are likely:

  • Writing assistance: Grammar assistance, autocompletion (for normal prose or code)
  • Creative writing and art: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.
  • Entertainment: Creation of games, chat bots, and amusing generations.

Misuse and Out-of-scope Use

In their model card about GPT-2, OpenAI wrote:

Because large-scale language models like GPT-2 do not distinguish fact from fiction, we don’t support use-cases that require the generated text to be true.

Additionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.

Risks, Limitations and Biases

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)).

The training data used for this model has not been released as a dataset one can browse. We know it contains a lot of unfiltered content from the internet, which is far from neutral. Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='gpt2-medium')
>>> set_seed(42)
>>> generator("The man worked as a", max_length=10, num_return_sequences=5)

[{'generated_text': 'The man worked as a security guard in a military'},
 {'generated_text': 'The man worked as a salesman in Mexico and eventually'},
 {'generated_text': 'The man worked as a supervisor at the department for'},
 {'generated_text': 'The man worked as a cleaner for the same corporation'},
 {'generated_text': 'The man worked as a barman and was involved'}]

>>> set_seed(42)
>>> generator("The woman worked as a", max_length=10, num_return_sequences=5)

[{'generated_text': 'The woman worked as a social worker in a children'},
 {'generated_text': 'The woman worked as a marketing manager, and her'},
 {'generated_text': 'The woman worked as a customer service agent in a'},
 {'generated_text': 'The woman worked as a cleaner for the same corporation'},
 {'generated_text': 'The woman worked as a barista and was involved'}]

This bias will also affect all fine-tuned versions of this model. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Training

Training Data

The OpenAI team wanted to train this model on a corpus as large as possible. To build it, they scraped all the web pages from outbound links on Reddit which received at least 3 karma. Note that all Wikipedia pages were removed from this dataset, so the model was not trained on any part of Wikipedia. The resulting dataset (called WebText) weighs 40GB of text but has not been publicly released. You can find a list of the top 1,000 domains present in WebText here.
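
The two filtering rules described above (a karma threshold and the removal of Wikipedia pages) can be illustrated with a small sketch. This is not OpenAI's pipeline, which has not been released. The `keep_link` helper, the record format, and the example URLs are all hypothetical.

```python
from urllib.parse import urlparse

MIN_KARMA = 3  # threshold quoted in the text above


def keep_link(url, karma):
    """Hypothetical filter: keep links with enough karma that are not on Wikipedia."""
    host = urlparse(url).hostname or ""
    is_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    return karma >= MIN_KARMA and not is_wikipedia


# Made-up (url, karma) records, purely for illustration.
links = [
    ("https://example.com/article", 5),
    ("https://en.wikipedia.org/wiki/Language_model", 40),
    ("https://example.org/post", 1),
]

kept = [url for url, karma in links if keep_link(url, karma)]
print(kept)
```

Only the first record survives: the second is a Wikipedia page and the third falls below the karma threshold.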

Training Procedure

The model is pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of publicly available data); inputs and labels are generated from those texts by an automatic process. More precisely, it was trained to guess the next word in sentences.
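
The "automatic process" amounts to shifting the sequence by one position: the label for each position is simply the token that follows it. The sketch below illustrates this on whitespace-separated words. That is a simplification, since GPT-2 operates on subword tokens, and the `make_clm_examples` helper is hypothetical.

```python
def make_clm_examples(tokens):
    """Build next-token prediction pairs by shifting the sequence by one."""
    inputs = tokens[:-1]   # every token except the last
    labels = tokens[1:]    # the token that follows each input position
    return inputs, labels


# Whitespace splitting stands in for the real subword tokenizer.
tokens = "the model guesses the next word".split()
inputs, labels = make_clm_examples(tokens)

# At position i the model sees tokens[0..i] and is trained to predict labels[i].
for i, label in enumerate(labels):
    print(tokens[: i + 1], "->", label)
```

No human annotation is involved: the raw text supplies both the inputs and the targets.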

...

Quantizations & VRAM

Q4_K_M (4.5 bpw): 0.5 GB VRAM required, 94% quality
Q6_K (6.5 bpw): 0.6 GB VRAM required, 97% quality
Q8_0 (8 bpw): 0.6 GB VRAM required, 100% quality
FP16 (16 bpw): 1.0 GB VRAM required, 100% quality
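
For orientation, the size of the weights alone follows from the parameter count and the bits per weight (bpw). The sketch below uses the 0.345B figure from the top of this page and assumes 1 GB = 10^9 bytes. The VRAM figures listed above are higher than these weight-only estimates. Presumably they include runtime overhead such as activations and the KV cache, but the page does not say how they were derived, so treat that as an assumption.

```python
def weight_size_gb(n_params, bits_per_weight):
    """Weight-only memory estimate in GB (assuming 1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 0.345e9  # parameter count as listed on this page

# (name, bits per weight) pairs as listed in the quantization table.
QUANTS = [("Q4_K_M", 4.5), ("Q6_K", 6.5), ("Q8_0", 8), ("FP16", 16)]

for name, bpw in QUANTS:
    print(f"{name}: {weight_size_gb(N_PARAMS, bpw):.2f} GB of weights")
```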

Benchmarks (6)

IFEval: 22.1
MUSR: 6.2
BBH: 2.7
MMLU-PRO: 2.0
GPQA: 1.7
MATH: 0.8
