I needed to generate 500 product descriptions for a home goods client by end of week. Their catalog: a spreadsheet with product names, dimensions, materials, and bullet-point features. My deliverable: unique, SEO-optimized copy for each one, ready to paste into Shopify.
The manual approach would take 40+ hours. I had two days.
Here's the Python pipeline I built, what broke along the way, and the exact code that processed all 500 without blowing the API budget or producing garbage output.
What you'll actually build
By the end of this tutorial, you'll have a script that:
- Reads a CSV of product data (name, specs, features)
- Generates one SEO-optimized description per product via GPT-4o
- Handles OpenAI rate limits with automatic exponential backoff
- Validates output quality before writing to disk
- Saves progress as it runs — so a crash at item 387 doesn't mean starting over
- Outputs a clean CSV ready for Shopify, WooCommerce, or any CMS
Here's what the output looks like for a product called "Mango Wood Serving Board":
The Mango Wood Serving Board brings warmth and texture to any table setting.
Hand-finished from sustainably sourced mango wood, each board carries a
naturally unique grain pattern — no two are identical. At 14x10 inches, it's
sized for charcuterie spreads, cheese boards, or everyday serving. The juice
groove keeps counters clean. The integrated handle makes it easy to carry
from kitchen to table without a second trip.
Dimensions: 14" x 10" x 0.75" | Material: Mango wood | Care: Hand wash only
Real, specific, scannable. Not: "This beautiful board is perfect for all your serving needs!"
Why this is harder than it looks
GPT-4o can write a product description in 3 seconds. That's not the hard part.
The hard part is:
- 500 API calls = you will hit rate limits. OpenAI enforces per-minute request and token limits that vary by usage tier — on lower tiers you can hit them within the first minute of a sequential 500-item run.
- GPT-4o is nondeterministic — 5% of outputs will be weirdly short, repeat the product name 4 times, or come back in a different language (yes, this happened).
- A crash at item 200 means you need to know exactly where you stopped.
- Prompt drift — a prompt that works on product 1 may produce subtly different formatting on product 400. You need output validation.
None of the "generate product descriptions with AI" tutorials I found covered any of this. So here's the version that does.
Prerequisites
- Python 3.10+
- An OpenAI API key (from platform.openai.com)
- The openai, pandas, and tenacity libraries:
pip install openai pandas tenacity
Your input CSV should have these columns (adjust the prompt if yours differ):
product_name, material, dimensions, features, category
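Before burning any API calls, it's worth failing fast if the CSV is missing one of those columns. A minimal header check might look like this (stdlib only, so it runs before pandas even loads the file):

```python
import csv
import io

REQUIRED_COLUMNS = {"product_name", "material", "dimensions", "features", "category"}

def missing_columns(csv_text: str) -> list[str]:
    """Return required columns absent from the CSV header, sorted."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return sorted(REQUIRED_COLUMNS - {h.strip() for h in header})

# Demo: a header missing the 'category' column
sample = "product_name,material,dimensions,features\nBoard,Wood,14x10,Handle\n"
print(missing_columns(sample))  # ['category']
```

Run this against the first line of your file and abort with a clear message if the list is non-empty — far cheaper than discovering a KeyError at item 300.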
The pipeline: overview
Before we get into the code, here's what we're building:
CSV Input
↓
Load & validate rows
↓
Check progress file (skip already-processed items)
↓
For each product:
→ Build prompt
→ Call GPT-4o (with retry logic)
→ Validate output
→ Write to progress file
↓
Merge all outputs → final CSV
Five stages. Let's build each one.
Step 1: The prompt (this is 70% of your output quality)
The single biggest variable in this pipeline isn't your retry logic or your rate limiter — it's the prompt. Most tutorials hand you a two-liner and move on. That two-liner will produce two-liner quality output.
Here's the prompt I landed on after about 15 test iterations:
def build_prompt(row: dict) -> str:
    return f"""You are a product copywriter for a premium e-commerce brand.
Write a product description for the item below. Follow these rules exactly:
1. Start with a single sentence that leads with the product's primary use or benefit — not its name.
2. Write 3-4 sentences of body copy. Focus on what the customer will *experience*, not just what the product *is*.
3. End with a one-line spec summary: Dimensions, Material, and one care/compatibility note.
4. Tone: warm, specific, confidence-inspiring. No filler phrases like "perfect for" or "great for any occasion."
5. Total length: 80-120 words. Not shorter. Not longer.
Product data:
- Name: {row['product_name']}
- Material: {row['material']}
- Dimensions: {row['dimensions']}
- Features: {row['features']}
- Category: {row['category']}
Output the description only. No intro, no labels, no quotes around the text."""
A few things I learned the hard way about this prompt:
"Output the description only" is non-negotiable. Without it, GPT-4o would regularly respond with: "Here's a product description for the Mango Wood Serving Board: ..." — and that prefix would end up in your CSV.
The word count constraint is what keeps outputs consistent enough to validate. Without it, you'll get 40-word descriptions and 200-word descriptions and no easy way to catch which is which.
"Not shorter. Not longer." — that redundancy is intentional. Models respond better to reinforced constraints.
Step 2: Rate-limited, retrying API calls
This is where most tutorials hand you a time.sleep(1) and call it done.
Here's what actually handles OpenAI rate limits correctly:
import openai
from tenacity import (
    retry,
    stop_after_attempt,
    wait_exponential,
    retry_if_exception_type,
)

# Better: load the key from an environment variable instead of hardcoding it
client = openai.OpenAI(api_key="your-api-key-here")

@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(6),
)
def generate_description(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=300,
        temperature=0.7,  # higher = more varied output; lower = more consistent
    )
    return response.choices[0].message.content.strip()
The tenacity library handles exponential backoff automatically. If OpenAI returns a 429 (rate limit), it waits 2 seconds, retries. If it hits again, waits 4 seconds, retries. Up to 60 seconds, up to 6 attempts.
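To see the full schedule those parameters produce, you can compute it directly. To my reading of tenacity's docs, wait_exponential waits multiplier * 2**attempt_number seconds after each failed attempt, clamped to [min, max] — here's a pure-Python simulation (not tenacity itself):

```python
def backoff_schedule(multiplier: float, min_wait: float, max_wait: float, attempts: int) -> list[float]:
    """Waits between attempts, mirroring tenacity's wait_exponential formula."""
    waits = []
    for attempt in range(1, attempts):  # no wait after the final attempt
        raw = multiplier * (2 ** attempt)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits

# Same parameters as the @retry decorator above
print(backoff_schedule(multiplier=1, min_wait=2, max_wait=60, attempts=6))
# [2, 4, 8, 16, 32]
```

So a request that fails all six attempts ties up about a minute total — worth knowing when you estimate the run time for 500 items.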
Why temperature=0.7? In my testing, 0.5 produced noticeably templated-feeling output — like every description started to rhyme with every other. 0.9 produced creative copy but with too much variance in structure. 0.7 gave me readable, distinct copy that still hit the formatting constraints.
Step 3: Output validation
This is the section nobody writes. And it's the reason pipelines produce garbage at scale.
def validate_description(text: str, product_name: str) -> tuple[bool, str]:
    word_count = len(text.split())

    # Too short — model probably got confused or hit an error mid-generation
    if word_count < 60:
        return False, f"Too short: {word_count} words"

    # Too long — prompt constraint wasn't followed
    if word_count > 150:
        return False, f"Too long: {word_count} words"

    # Deliberately no strict product-name check: GPT-4o often refers to
    # "the board" rather than "the Mango Wood Serving Board", so an
    # exact-match relevance test produced false positives in testing.

    # Catch GPT-4o's habit of adding its own preamble
    bad_prefixes = ["here's", "here is", "certainly", "of course", "sure,"]
    first_words = text[:30].lower()
    for prefix in bad_prefixes:
        if first_words.startswith(prefix):
            return False, f"Model preamble detected: starts with '{prefix}'"

    return True, "ok"
In my run of 500 products, this validation caught 23 bad outputs — about 4.6%. Most were too short (model returned a partial response) or had the "Here's a description..." preamble. All 23 were re-queued and re-generated successfully on the second pass.
Without this step, those 23 bad descriptions would have shipped to the client's CMS.
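To sanity-check the validator before a full run, feed it hand-written good and bad samples. The function is repeated here in compressed form so the snippet runs standalone:

```python
def validate_description(text: str, product_name: str) -> tuple[bool, str]:
    # Compressed copy of the validator above, for standalone testing
    word_count = len(text.split())
    if word_count < 60:
        return False, f"Too short: {word_count} words"
    if word_count > 150:
        return False, f"Too long: {word_count} words"
    bad_prefixes = ["here's", "here is", "certainly", "of course", "sure,"]
    first_words = text[:30].lower()
    for prefix in bad_prefixes:
        if first_words.startswith(prefix):
            return False, f"Model preamble detected: starts with '{prefix}'"
    return True, "ok"

good = " ".join(["word"] * 90)  # 90 words, no preamble
short = "A lovely board."
preamble = "Here's a product description: " + " ".join(["word"] * 90)

print(validate_description(good, "Mango Wood Serving Board"))      # (True, 'ok')
print(validate_description(short, "Mango Wood Serving Board"))     # (False, 'Too short: 3 words')
print(validate_description(preamble, "Mango Wood Serving Board"))  # preamble caught
```

Five minutes of this caught a gap in my own prefix list ("Of course!" with an exclamation mark still matches, since startswith only checks the leading characters).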
Step 4: Progress tracking (so crashes don't cost you)
This is the one I wish I'd built from the start instead of 200 products in.
import json
import os

PROGRESS_FILE = "progress.json"

def load_progress() -> dict:
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE, "r") as f:
            return json.load(f)
    return {}

def save_progress(progress: dict):
    with open(PROGRESS_FILE, "w") as f:
        json.dump(progress, f, indent=2)
Usage in the main loop:
progress = load_progress()

for _, row in df.iterrows():
    product_id = str(row['product_name'])  # or use a SKU if you have one

    # Skip if already processed
    if product_id in progress:
        print(f"Skipping {product_id} — already done")
        continue

    prompt = build_prompt(row.to_dict())
    try:
        description = generate_description(prompt)
        is_valid, reason = validate_description(description, row['product_name'])
        if not is_valid:
            print(f"FAILED validation for {product_id}: {reason}")
            # Still save as failed so we can review separately
            progress[product_id] = {"status": "failed", "reason": reason, "text": description}
        else:
            progress[product_id] = {"status": "ok", "text": description}
        save_progress(progress)  # write after every item
    except Exception as e:
        print(f"ERROR on {product_id}: {e}")
        # Don't save — we'll retry this one next run
The key pattern: save after every successful item, not at the end. If you batch-save at the end and crash at item 387, you start over. If you save after each item, you restart at item 388.
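One refinement worth considering: if the process dies in the middle of a write, progress.json itself can be left half-written and unparseable, which loses everything. Writing to a temp file and renaming makes each save atomic. This is a sketch, not part of the original script; it relies on os.replace, which is atomic on POSIX and supported on Windows:

```python
import json
import os
import tempfile

def save_progress_atomic(progress: dict, path: str = "progress.json"):
    """Write progress to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(progress, f, indent=2)
        os.replace(tmp_path, path)  # either the old file or the new one, never a torn write
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```

Drop this in as a replacement for save_progress and a crash mid-save costs you at most the current item, never the whole file.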
Step 5: Putting it all together
import pandas as pd

def run_pipeline(input_csv: str, output_csv: str):
    df = pd.read_csv(input_csv)
    progress = load_progress()

    total = len(df)
    processed = 0
    failed = 0

    print(f"Starting pipeline: {total} products, {len(progress)} already processed")

    for _, row in df.iterrows():
        product_id = str(row['product_name'])

        if product_id in progress and progress[product_id]['status'] == 'ok':
            processed += 1
            continue

        prompt = build_prompt(row.to_dict())
        try:
            description = generate_description(prompt)
            is_valid, reason = validate_description(description, row['product_name'])
            status = "ok" if is_valid else "failed"
            progress[product_id] = {
                "status": status,
                "text": description,
                "reason": reason if not is_valid else None
            }
            save_progress(progress)

            if is_valid:
                processed += 1
                print(f"✓ {product_id} ({processed}/{total})")
            else:
                failed += 1
                print(f"✗ {product_id} — {reason}")
        except Exception as e:
            failed += 1
            print(f"ERROR: {product_id} — {e}")

    # Build output CSV from progress file
    results = []
    for _, row in df.iterrows():
        product_id = str(row['product_name'])
        entry = progress.get(product_id, {})
        results.append({
            **row.to_dict(),
            "description": entry.get("text", ""),
            "status": entry.get("status", "missing")
        })

    output_df = pd.DataFrame(results)
    output_df.to_csv(output_csv, index=False)

    print(f"\nDone. {processed} succeeded, {failed} failed.")
    print(f"Output: {output_csv}")

# Run it
run_pipeline("products.csv", "products_with_descriptions.csv")
What can go wrong (and how to handle it)
Rate limit errors that tenacity can't catch
OpenAI occasionally returns 429s with a Retry-After header that's longer than tenacity's max wait. If you see RetryError in your logs, check the error message — if it says Retry-After: 120, add a time.sleep(120) before starting the next batch.
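If you'd rather handle that case programmatically than hardcode the sleep, you can pull the hint out of the error text. A hedged sketch — the regex assumes the Retry-After value appears somewhere in the stringified error, which is not guaranteed across SDK versions:

```python
import re
import time

def retry_after_seconds(error_message: str, default: int = 60) -> int:
    """Extract a Retry-After hint (in seconds) from an error message, if present."""
    match = re.search(r"[Rr]etry-[Aa]fter:?\s*(\d+)", error_message)
    return int(match.group(1)) if match else default

msg = "Rate limit reached. Retry-After: 120"
wait = retry_after_seconds(msg)
print(wait)  # 120
# time.sleep(wait)  # pause before starting the next batch
```

Falling back to a 60-second default when no header is found keeps the pipeline moving without hammering the API.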
The model starts producing descriptions in another language
This happened to me on item 214 — a product called "Zen Garden Kit" triggered a response in Japanese. Not sure why. The word count validation caught it (splitting the Japanese text on whitespace yielded only 11 "words", well under the 60-word floor). Fix: add an explicit instruction to your prompt: "Write in English only." I added this after the incident and never saw it again.
Encoding errors in your CSV
If your product CSV was exported from Excel, there's a decent chance it has Windows-1252 encoding, not UTF-8. Products with special characters (é, ñ, trademark symbols) will throw a UnicodeDecodeError. Fix:
df = pd.read_csv(input_csv, encoding='utf-8-sig')  # UTF-8 with Excel's BOM
# or
df = pd.read_csv(input_csv, encoding='cp1252')  # Windows-1252; 'latin-1' also decodes but can mis-map symbols
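If you're not sure which encoding you have, a small fallback loop beats guessing. This sketch tries encodings in a plausible order (cp1252 is the usual culprit for Excel exports; latin-1 last, since it never raises but can mis-map symbols):

```python
def read_text_with_fallback(raw: bytes, encodings=("utf-8-sig", "cp1252", "latin-1")) -> tuple[str, str]:
    """Try each encoding in order; return (decoded_text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode with any of {encodings}")

# Demo: bytes that are valid Windows-1252 but not valid UTF-8
raw = "Café™ board".encode("cp1252")
text, enc = read_text_with_fallback(raw)
print(text, "|", enc)  # Café™ board | cp1252
```

Read the file once as bytes, detect the encoding this way, then pass the winner to pd.read_csv's encoding parameter.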
GPT-4o "phones it in" on obvious product categories
Generic categories like "Storage Box" or "Throw Pillow" produced noticeably weaker descriptions — the model seems to default to filler language for low-specificity inputs. Fix: add an explicit avoid-phrases instruction to your prompt:
"Avoid phrases like: 'perfect for', 'great for any occasion', 'ideal for the whole family', 'make a statement', 'elevate your space'."
This alone cleaned up about 30 descriptions in my run.
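A banned-phrase instruction in the prompt isn't a guarantee, so a post-hoc scan of the output CSV can flag survivors for manual review. A sketch, using the same phrase list as the prompt instruction above:

```python
BANNED_PHRASES = [
    "perfect for",
    "great for any occasion",
    "ideal for the whole family",
    "make a statement",
    "elevate your space",
]

def flag_filler(description: str) -> list[str]:
    """Return any banned filler phrases found in a description."""
    lowered = description.lower()
    return [p for p in BANNED_PHRASES if p in lowered]

print(flag_filler("This board is perfect for entertaining and will elevate your space."))
# ['perfect for', 'elevate your space']
print(flag_filler("Hand-finished mango wood with a juice groove."))
# []
```

You could also wire this into validate_description so filler descriptions get re-queued automatically instead of reviewed by hand.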
Results
The full pipeline processed 500 products in about 2 hours and 20 minutes — most of that was intentional rate limiting. API cost at gpt-4o pricing: approximately $4.80 for the run. 23 products failed validation on the first pass; 21 of those passed on re-queue. 2 were manually rewritten (one product with almost no spec data, one with what appeared to be corrupted input data).
The client uploaded the CSV to Shopify on Friday. No manual editing on any description that passed validation.
The complete script
The full working script — with all five components integrated — is on GitHub: github.com/your-handle/bulk-product-description-generator
The repo includes:
- pipeline.py — the main script
- sample_products.csv — 10 example products to test with
- prompts/ — the base prompt plus two variations I tested
- README.md — setup instructions and config options
Have you built something similar? I'm particularly curious what prompt structures other people have landed on for e-commerce copy — the one I'm using works well for home goods but I suspect it'd need significant changes for, say, technical products or apparel. Drop your approach in the comments.