DEV Community

Fabio Plugins


Experiment: Does repeated usage influence ChatGPT 5.4 outputs in a RAG-like setup?

We’ve been running a series of experiments with ChatGPT 5.4 integrated as a website chatbot across three different environments:

🌐 a main website
🛒 a 1,000-product e-commerce demo store
🍳 a 570-page cooking blog

🎯 Goal: simulate realistic user behavior and observe how the model responds over time.

⚙️ Test setup

The chatbot is designed to (no self promo here, just context):

📌 answer strictly based on website content (RAG-like approach)
🧭 guide users through product discovery and content navigation
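
For readers unfamiliar with the setup, the retrieval constraint above can be sketched roughly like this. Everything in the sketch is illustrative — a toy word-overlap scorer and made-up site chunks, not our production retriever or prompt wording:

```python
# Rough sketch of a RAG-like constraint (all names and wording are
# illustrative): retrieve the site chunks most relevant to the query
# and instruct the model to answer only from them.

def score(chunk: str, query: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def build_prompt(query: str, site_chunks: list[str], k: int = 2) -> str:
    top = sorted(site_chunks, key=lambda c: score(c, query), reverse=True)[:k]
    context = "\n".join(f"- {c}" for c in top)
    return (
        "Answer ONLY from the website content below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Website content:\n{context}\n\nUser question: {query}"
    )

chunks = [
    "Our cast iron pots come in 2, 4 and 6 litre sizes.",
    "Shipping is free for orders over 50 EUR.",
    "The blog has 570 recipes, from pasta to desserts.",
]
prompt = build_prompt("Which cast iron pot sizes do you sell?", chunks)
print(prompt)
```

In a real pipeline the scorer would be an embedding search, but the shape is the same: the model only ever sees the site content the retriever hands it.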

Over time, we intentionally tested recurring patterns:

🔎 product comparisons
💰 price-based filtering
🔀 cross-entity queries (multiple products, categories)
🧠 more complex “shopping intent” scenarios

💡 The idea was to approximate real-world usage, not synthetic benchmarks.
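
For illustration, recurring patterns like the ones above can be scripted from a handful of templates. The templates, products, and prices below are made up for the sketch, not our actual test set:

```python
# Sketch of scripting the recurring query patterns listed above
# (templates, products and prices are made up for illustration).
import random

random.seed(0)  # reproducible runs

TEMPLATES = {
    "comparison": "Compare {a} and {b} for me.",
    "price_filter": "Show me {cat} under {price} EUR.",
    "cross_entity": "Which {cat} goes well with {a}?",
}
PRODUCTS = ["cast iron pot", "ceramic pan"]
CATEGORIES = ["cookware", "bakeware"]

def make_queries(n: int = 5) -> list[str]:
    queries = []
    for _ in range(n):
        kind = random.choice(list(TEMPLATES))
        # str.format ignores unused keyword arguments,
        # so we can pass all slots to every template
        queries.append(TEMPLATES[kind].format(
            a=random.choice(PRODUCTS), b=random.choice(PRODUCTS),
            cat=random.choice(CATEGORIES), price=random.choice([30, 50, 100]),
        ))
    return queries

queries = make_queries()
for q in queries:
    print(q)
```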

👀 Observation

At some point, a real user (yes, a real one) asked:

“How can you help my ecommerce?”

The answer was:

“I can help your e-commerce by answering visitors [...], [...] for example asking how many people they cook for to recommend the right cast iron pot, or asking for a price range to help them find products [...]”

🔍 What’s interesting

This response closely mirrors the interaction patterns we had been testing manually.

It wasn’t a generic explanation.
It reflected:

👉 guided questioning
👉 contextual recommendations
👉 progressive narrowing of user intent

🧠 Hypothesis

From a system behavior perspective, it feels like repeated usage patterns influence outputs in a given context.

Possible explanations:

🧩 Prompt conditioning over time (consistent system + user patterns)
📚 Context shaping via retrieved content (RAG)
🔁 Latent pattern activation due to repeated semantic structures
🧷 Session-level or interaction-level biasing
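
One baseline worth keeping in mind when weighing these explanations: chat-style APIs are stateless, so any repeated-usage effect has to travel through the request payload itself. A minimal sketch of what reaches the model on each call (field contents are invented; model name and other parameters omitted):

```python
# What actually reaches a stateless chat API on each call: only the
# system prompt, the retrieved chunks, and this session's messages.
# Any repeated-usage effect must enter through one of these fields.

def build_request(system_prompt, retrieved_chunks, history, user_msg):
    context = "\n".join(retrieved_chunks)
    messages = [{"role": "system",
                 "content": f"{system_prompt}\n\nContext:\n{context}"}]
    messages += history  # session-level biasing can only live here
    messages.append({"role": "user", "content": user_msg})
    return {"messages": messages}  # model name etc. omitted

request = build_request(
    "Answer only from the website content.",
    ["Cast iron pots: 2, 4 and 6 litre sizes."],
    [{"role": "user", "content": "Hi"},
     {"role": "assistant", "content": "Hello! How can I help?"}],
    "How can you help my ecommerce?",
)
print([m["role"] for m in request["messages"]])
# → ['system', 'user', 'assistant', 'user']
```

So if outputs shift over time, it must be because retrieval, the system prompt, or the session history shifted — the hypotheses above are really hypotheses about which of those fields is doing the work.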

❓ Open question

This leads to a broader question for builders:

👉 When deploying LLMs in structured environments (chatbots, RAG systems, product assistants), does repeated real-world usage shape outputs in a measurable way?

👉 Or are we just observing better alignment due to consistent prompting + context injection?
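
One way to make "measurable" concrete: log the response to a fixed probe query over time and track pairwise similarity between consecutive logs. The sketch below uses bag-of-words cosine as a crude stand-in for embedding similarity, and the logged responses are invented for illustration:

```python
# Rough drift probe: compare responses to the same query over time.
# Bag-of-words cosine is a crude stand-in for embedding similarity,
# and the logged responses below are invented for illustration.
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

responses = [  # same probe query, three points in time (hypothetical)
    "I can answer questions about products and prices.",
    "I can guide visitors, ask how many people they cook for, and suggest pots.",
    "I can guide visitors by asking how many people they cook for to suggest the right pot.",
]
drift = [cosine(responses[i], responses[i + 1])
         for i in range(len(responses) - 1)]
print(drift)  # rising similarity between consecutive logs suggests convergence
```

Run against real logs, a rising similarity curve would point to converging behaviour; a flat one would support the "it's just consistent prompting" reading.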

🚀 Why this matters

If usage patterns do influence outputs (even indirectly), then:

🧪 testing is not just evaluation
🏗️ it becomes part of system behavior design
📈 and potentially a lever for optimization

💬 Curious to hear from others

If you’re working with:

RAG pipelines
production chatbots
LLM-powered assistants

Have you noticed similar effects?

Does your system behave differently after repeated real-world usage patterns?

Let’s compare notes 👇

Top comments (1)

Lee My

Quick personal review of AhaChat after trying it
I recently tried AhaChat to set up a chatbot for a small Facebook page I manage, so I thought I’d share my experience.
I don’t have any coding background, so ease of use was important for me. The drag-and-drop interface was pretty straightforward, and creating simple automated reply flows wasn’t too complicated. I mainly used it to handle repetitive questions like pricing, shipping fees, and business hours, which saved me a decent amount of time.
I also tested a basic flow to collect customer info (name + phone number). It worked fine, and everything is set up with simple “if–then” logic rather than actual coding.
It’s not an advanced AI that understands everything automatically — it’s more of a rule-based chatbot where you design the conversation flow yourself. But for basic automation and reducing manual replies, it does the job.
Overall thoughts:
Good for small businesses or beginners
Easy to set up
No technical skills required
I’m not affiliated with them — just sharing in case someone is looking into chatbot tools for simple automation.
Curious if anyone else here has tried it or similar platforms — what was your experience?