You have spent the last five posts building transformers, training BERT-like models, implementing GPT from scratch.
You understand what is happening inside.
Now let me show you how practitioners actually work.
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
print(classifier("This library is incredible!"))
That is it. Three lines. State-of-the-art sentiment analysis. No model architecture. No training loop. No tokenizer code. Download a pretrained model, wrap it in a pipeline, get predictions.
HuggingFace built the infrastructure that made practical NLP accessible. The Model Hub has over 500,000 models. The transformers library unified the API across every major architecture. The datasets library provides one-line access to thousands of benchmark datasets. The tokenizers library provides fast, battle-tested tokenization.
If Phase 8 posts 78-82 taught you how transformers work, this post teaches you how to actually use them.
The Model Hub
from transformers import (
    AutoTokenizer, AutoModel, AutoModelForSequenceClassification,
    AutoModelForCausalLM, AutoModelForSeq2SeqLM,
    pipeline, Trainer, TrainingArguments
)
from datasets import load_dataset
import torch
import numpy as np
import warnings
warnings.filterwarnings("ignore")
print("HuggingFace ecosystem:")
print()
packages = {
    "transformers": "Model architectures, tokenizers, training utilities",
    "datasets": "50,000+ datasets with one-line loading",
    "tokenizers": "Fast tokenizers in Rust, same API as transformers",
    "accelerate": "Simple multi-GPU and mixed precision training",
    "evaluate": "Standardized metrics for NLP tasks",
    "peft": "Parameter-efficient fine-tuning (LoRA, prefix tuning)",
    "trl": "RLHF training tools (PPO, DPO, SFT)",
}
for package, description in packages.items():
    print(f" {package:<15}: {description}")
print()
print("Install everything:")
print(" pip install transformers datasets tokenizers accelerate evaluate peft")
The Auto Classes: One API for Every Model
print("Auto classes: load any model without knowing its architecture")
print()
model_examples = {
    "bert-base-uncased": "BERT encoder, general purpose",
    "roberta-base": "Improved BERT, better on most tasks",
    "distilbert-base-uncased": "Distilled BERT: 40% smaller, 60% faster, 97% of the performance",
    "gpt2": "GPT-2, text generation",
    "facebook/opt-125m": "Meta OPT, open GPT alternative",
    "t5-base": "T5, text-to-text, great for summarization",
    "google/flan-t5-base": "Flan-T5, instruction-tuned T5",
    "sentence-transformers/all-MiniLM-L6-v2": "Sentence embeddings, fast",
}
for model_name, description in model_examples.items():
    print(f" {model_name:<45}: {description}")
print()
print("Loading any of these:")
print(" tokenizer = AutoTokenizer.from_pretrained(model_name)")
print(" model = AutoModel.from_pretrained(model_name)")
print()
print("Task-specific loading:")
print(" AutoModelForSequenceClassification → text classification")
print(" AutoModelForTokenClassification → NER, POS tagging")
print(" AutoModelForQuestionAnswering → extractive QA")
print(" AutoModelForCausalLM → text generation (GPT-style)")
print(" AutoModelForSeq2SeqLM → translation, summarization (T5-style)")
print(" AutoModelForMaskedLM → masked prediction (BERT-style)")
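Every one of these heads pairs with the same tokenizer loading path. A minimal sketch of the tokenizer round trip; only the small tokenizer files for bert-base-uncased are fetched here, no model weights:

```python
from transformers import AutoTokenizer

# Tokenizer files are a few hundred KB; no model weights are downloaded.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

enc = tok("Transformers are parallelizable.")
print(enc["input_ids"])                          # vocabulary ids, wrapped in special tokens
print(tok.convert_ids_to_tokens(enc["input_ids"]))  # [CLS], subword pieces, [SEP]
print(tok.decode(enc["input_ids"]))              # round-trips to lowercased text
```

The same three calls work unchanged for any checkpoint name in the table above, which is the point of the Auto classes.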
The Pipeline API
device = 0 if torch.cuda.is_available() else -1
print("HuggingFace Pipelines: one line for each NLP task")
print()
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device
)
reviews = [
    "This product exceeded all my expectations!",
    "Terrible quality. Do not buy.",
    "It is okay, nothing special.",
    "Absolutely loved it. Will buy again.",
]
print("Sentiment Analysis:")
for text in reviews:
    result = sentiment(text)[0]
    print(f" {result['label']:<10} {result['score']:.3f} '{text}'")
ner = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
    device=device
)
text = "Satya Nadella leads Microsoft, headquartered in Redmond, Washington."
entities = ner(text)
print(f"\nNamed Entity Recognition:")
print(f"Text: '{text}'")
for ent in entities:
    print(f" {ent['entity_group']:<6} '{ent['word']}' ({ent['score']:.3f})")
qa = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",
    device=device
)
context = """
The transformer architecture was introduced in 2017 in the paper
'Attention Is All You Need' by Vaswani et al. at Google Brain.
It uses self-attention mechanisms instead of recurrence, making it
highly parallelizable and enabling training on much larger datasets.
BERT was released in 2018 and GPT-3 in 2020.
"""
questions = [
    "When was the transformer introduced?",
    "Who wrote the Attention Is All You Need paper?",
    "What does the transformer use instead of recurrence?",
]
print(f"\nQuestion Answering:")
for question in questions:
    answer = qa(question=question, context=context)
    print(f" Q: {question}")
    print(f" A: '{answer['answer']}' (score={answer['score']:.3f})")
print()
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=device
)
long_text = """
Machine learning is a branch of artificial intelligence that enables systems to learn and
improve from experience without being explicitly programmed. It focuses on developing
computer programs that can access data and use it to learn for themselves. The process
begins with observations or data, such as examples, direct experience, or instruction,
so that computers can look for patterns in data and make better decisions in the future.
The primary aim is to allow the computers to learn automatically without human intervention
and adjust actions accordingly. Machine learning is closely related to deep learning,
which uses neural networks with many layers to learn representations of data.
"""
summary = summarizer(long_text, max_length=80, min_length=30, do_sample=False)
print(f"Summarization:")
print(f"Original ({len(long_text.split())} words):")
print(f" {long_text.strip()[:100]}...")
print(f"\nSummary ({len(summary[0]['summary_text'].split())} words):")
print(f" {summary[0]['summary_text']}")
Loading Datasets
print("\nHuggingFace Datasets: one line to any benchmark")
print()
imdb = load_dataset("imdb", split="train[:1000]")
print(f"IMDB dataset (subset):")
print(f" Size: {len(imdb)}")
print(f" Features: {imdb.features}")
print(f" Sample: {imdb[0]['text'][:80]}...")
print(f" Label: {imdb[0]['label']} (0=negative, 1=positive)")
print()
popular_datasets = {
    "imdb": "Movie reviews, sentiment (25K)",
    "sst2": "Stanford sentiment, single sentences",
    "squad": "Stanford QA, reading comprehension",
    "glue": "9 NLP benchmark tasks bundle",
    "wikitext": "Wikipedia text for language modeling",
    "conll2003": "Named entity recognition benchmark",
    "xsum": "BBC news summarization",
    "opus_books": "Book translation pairs, 16 languages",
    "common_voice": "Mozilla speech dataset, 100+ languages",
    "imagenet-1k": "1M images, 1K classes",
}
print("Popular HuggingFace datasets:")
for name, desc in popular_datasets.items():
    print(f" {name:<25}: {desc}")
Fine-Tuning With the Trainer API
from transformers import TrainingArguments, Trainer
import evaluate
print("\nFine-tuning with HuggingFace Trainer API:")
print()
model_name = "distilbert-base-uncased"
tokenizer_ft = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("imdb", split={"train": "train[:2000]", "test": "test[:500]"})
def tokenize(batch):
    return tokenizer_ft(batch["text"], truncation=True,
                        padding="max_length", max_length=128)
tokenized = dataset.map(tokenize, batched=True)
tokenized = tokenized.rename_column("label", "labels")
tokenized.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
model_ft = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)
accuracy_metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
training_args = TrainingArguments(
    output_dir="./bert_imdb",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    warmup_steps=100,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=50,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="none",
)
trainer = Trainer(
    model=model_ft,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    compute_metrics=compute_metrics,
)
print("Trainer configured. To run:")
print(" trainer.train()")
print(" trainer.evaluate()")
print()
print("The Trainer handles:")
print(" Batching, gradient accumulation, mixed precision")
print(" Logging, checkpointing, best model selection")
print(" Multi-GPU training with a single flag")
print()
print("Training arguments that matter most:")
print(" learning_rate: 2e-5 is a solid starting point for fine-tuning (Trainer's default is 5e-5)")
print(" num_train_epochs: 3 is usually enough for classification")
print(" per_device_train_batch_size: as large as GPU memory allows")
print(" warmup_ratio: 0.1 is standard (10% of steps)")
print(" weight_decay: 0.01 prevents overfitting")
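The Trainer's default schedule (lr_scheduler_type="linear") ramps the learning rate up linearly over the warmup steps, then decays it linearly to zero. A plain-Python sketch of that shape; linear_schedule here is an illustrative helper, not a transformers function:

```python
def linear_schedule(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

total, warmup = 1000, 100  # warmup_ratio = 0.1
lrs = [linear_schedule(s, total, warmup) for s in range(total + 1)]
print(lrs[0], lrs[warmup], lrs[total])  # zero at start, peak right after warmup, zero at the end
```

Warming up matters because the randomly initialized classification head produces large, noisy gradients at first; starting small protects the pretrained weights underneath.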
Sentence Embeddings for Semantic Search
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
print("\nSentence Embeddings for Semantic Search:")
print(" pip install sentence-transformers")
print()
print("Loading sentence-transformers/all-MiniLM-L6-v2")
print(" 22M parameters, 384-dim embeddings, very fast")
print()
sentences = [
    "How do neural networks learn?",
    "What is backpropagation?",
    "What is the capital of France?",
    "Paris is a beautiful city.",
    "Gradient descent optimizes model parameters.",
    "The Eiffel Tower is in Paris.",
]
query = "How does gradient descent work?"
print(f"Query: '{query}'")
print()
print("Usage:")
print(" model = SentenceTransformer('all-MiniLM-L6-v2')")
print(" embeddings = model.encode(sentences)")
print(" query_emb = model.encode([query])")
print(" sims = cosine_similarity(query_emb, embeddings)")
print()
print("Expected ranking (most similar to query about gradient descent):")
expected_ranking = [
    ("Gradient descent optimizes model parameters.", 0.87),
    ("How do neural networks learn?", 0.72),
    ("What is backpropagation?", 0.68),
    ("What is the capital of France?", 0.23),
    ("Paris is a beautiful city.", 0.19),
    ("The Eiffel Tower is in Paris.", 0.15),
]
for sent, expected_sim in expected_ranking:
    print(f" {expected_sim:.2f} '{sent}'")
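Cosine similarity itself is only a few lines of NumPy. A toy sketch with made-up 3-dimensional vectors standing in for real sentence embeddings; the numbers are illustrative, not model output:

```python
import numpy as np

# Rows are toy "embeddings"; a real MiniLM model outputs 384-dim vectors.
embeddings = np.array([
    [0.9, 0.1, 0.0],   # "Gradient descent optimizes model parameters."
    [0.7, 0.6, 0.1],   # "How do neural networks learn?"
    [0.1, 0.2, 0.9],   # "Paris is a beautiful city."
])
query = np.array([1.0, 0.2, 0.0])  # "How does gradient descent work?"

def normalize(x):
    # Cosine similarity is the dot product of L2-normalized vectors
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

sims = normalize(embeddings) @ normalize(query)
ranking = np.argsort(-sims)  # indices sorted from most to least similar
print(sims.round(3), ranking)
```

Swapping the toy matrix for model.encode(sentences) gives you the real ranking with the same two lines of math.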
Choosing the Right Model
task_guide = {
    "Text classification": {
        "recommended": ["distilbert-base-uncased", "bert-base-uncased", "roberta-base"],
        "notes": "DistilBERT for speed, RoBERTa for accuracy"
    },
    "Named entity recognition": {
        "recommended": ["dbmdz/bert-large-cased-finetuned-conll03-english",
                        "dslim/bert-base-NER"],
        "notes": "Cased models work better for NER"
    },
    "Text generation": {
        "recommended": ["gpt2", "facebook/opt-125m", "mistralai/Mistral-7B-Instruct-v0.1"],
        "notes": "GPT-2 for learning, OPT for open research, Mistral for production"
    },
    "Summarization": {
        "recommended": ["facebook/bart-large-cnn", "google/pegasus-xsum"],
        "notes": "BART for news, Pegasus for extreme summarization"
    },
    "Translation": {
        "recommended": ["Helsinki-NLP/opus-mt-en-fr", "facebook/mbart-large-50"],
        "notes": "Helsinki models are fast, mBART handles 50 languages"
    },
    "Question answering": {
        "recommended": ["deepset/roberta-base-squad2",
                        "deepset/bert-large-uncased-whole-word-masking-squad2"],
        "notes": "RoBERTa base is fast, BERT large is more accurate"
    },
    "Sentence embeddings": {
        "recommended": ["sentence-transformers/all-MiniLM-L6-v2",
                        "sentence-transformers/all-mpnet-base-v2"],
        "notes": "MiniLM for speed, mpnet for accuracy"
    },
    "Zero-shot classification": {
        "recommended": ["facebook/bart-large-mnli", "typeform/distilbart-mnli-12-3"],
        "notes": "No fine-tuning needed, classify any categories"
    },
}
print("Task → Model Guide:")
print()
for task, info in task_guide.items():
    print(f" {task}:")
    for model in info["recommended"][:2]:
        print(f" • {model}")
    print(f" Note: {info['notes']}")
    print()
Sharing Your Fine-Tuned Model
print("Sharing your fine-tuned model on HuggingFace Hub:")
print()
print(" pip install huggingface_hub")
print()
print(" # Login")
print(" from huggingface_hub import login")
print(" login(token='your_token_from_huggingface.co/settings/tokens')")
print()
print(" # Push model after fine-tuning")
print(" trainer.push_to_hub('your_username/your_model_name')")
print()
print(" # Or push manually")
print(" model.push_to_hub('your_username/bert-finetuned-imdb')")
print(" tokenizer.push_to_hub('your_username/bert-finetuned-imdb')")
print()
print(" # Others can now use it immediately")
print(" model = AutoModel.from_pretrained('your_username/bert-finetuned-imdb')")
print()
print("Your model joins 500,000+ models on the Hub.")
print("Good models with READMEs get used by thousands of practitioners.")
A Resource Worth Reading
The HuggingFace course at huggingface.co/learn/nlp-course is the definitive free resource for learning the HuggingFace ecosystem. Eight chapters covering tokenizers, datasets, fine-tuning, sharing models, and advanced topics. Completely free, includes exercises, runs in Google Colab. One of the best-organized ML courses available anywhere. Search "HuggingFace NLP course."
Lewis Tunstall, Leandro von Werra, and Thomas Wolf (HuggingFace employees) wrote "Natural Language Processing with Transformers" (O'Reilly), which covers the full ecosystem with real-world examples and production considerations. The GitHub repository at github.com/nlp-with-transformers has all the notebooks free to run.
Try This
Create huggingface_practice.py.
Part 1: pipelines. Use five different HuggingFace pipelines on your own text: sentiment, NER, question answering, summarization, and zero-shot classification. For zero-shot, classify 5 sentences into categories you define yourself (no fine-tuning required).
Part 2: semantic search. Load sentence-transformers/all-MiniLM-L6-v2. Take 20 sentences from any domain. Given a query, rank all 20 by similarity. Plot the cosine similarity scores as a bar chart. Do the top results make intuitive sense?
Part 3: fine-tuning. Fine-tune distilbert-base-uncased on any binary classification dataset from HuggingFace Datasets. Use the Trainer API. Train for 2 epochs. Evaluate with accuracy and F1.
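For Part 3's metrics, one option (an assumption on my part; evaluate.load("f1") works equally well) is to compute both scores with scikit-learn inside compute_metrics:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),  # binary F1; use average="macro" for multi-class
    }

# Sanity check with fake logits for 4 examples before any training run
fake_logits = np.array([[2.0, 0.1], [0.2, 1.5], [1.0, 0.0], [0.1, 0.9]])
fake_labels = np.array([0, 1, 1, 1])
print(compute_metrics((fake_logits, fake_labels)))
```

Passing this function as compute_metrics to the Trainer makes both numbers appear in every evaluation log.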
Part 4: push to Hub. Create a HuggingFace account (free). Fine-tune any model. Push it to the Hub with a proper model card explaining: what it does, what data it was trained on, example usage, performance metrics. Your first public contribution to open-source AI.
What's Next
You can load, fine-tune, and share models. But fine-tuning changes all parameters, which is expensive and requires significant GPU memory. The next post is about fine-tuning efficiently: LoRA adds tiny trainable adapters while keeping the base model frozen. You get 90% of full fine-tuning performance with 1% of the trainable parameters.