Voice-to-Quote in Construction: How AI Transforms BTP Estimating
When you're on a construction site with mud on your boots and a phone in one hand, the last thing you want to do is tap out a 50-line estimate on a tiny keyboard. Yet that's exactly what European BTP (bâtiment et travaux publics) workers do every day—because their software was designed for office desks, not scaffolding.
We've spent the last 18 months building voice-activated estimating at Anodos, a French SaaS for construction SMEs. This is the story of what works, what doesn't, and why voice AI in construction isn't just a gimmick—it's closing a decade-long gap between job site reality and digital tools.
The Problem: The 47-Minute Friction Loop
I interviewed 23 BTP artisans and MOE (maître d'œuvre) managers in Q4 2024. Here's what I found:
- Average estimate creation time: 47 minutes (from site inspection to PDF sent to client)
- Tool breakdown: 18 min writing + 12 min photo tagging + 10 min formatting + 7 min upload
- Error rate: 8% of first-draft estimates had quantity mistakes (corrected after client review)
- Device friction: 67% of artisans switched devices (site tablet → office desktop) to finalize estimates
That 47 minutes isn't per-contract—it's per-line-item category. A €50K renovation with 12 cost categories? You're looking at 8+ hours of spreadsheet and PDF work after the physical site visit.
The paradox: these same workers use voice-to-text for WhatsApp (16 times/day on average) and Dictaphone notes on their phones. But their estimating software hasn't caught up.
Why Voice-to-Quote Matters More Than You Think
Traditional voice-assistant use cases (Alexa, Siri) work because the domain is constrained: music, weather, directions. Construction estimation is unconstrained—the user might say:
- "12 square meters of plasterboard, 13mm thick, with sealant and primer"
- "3 linear meters of PVCU pipe, 32mm diameter, plus 8 couplings and 2 bends"
- "Demolition of internal load-bearing brick wall, 4.2m × 3.8m, safe disposal included"
Each statement encodes:
- A material type (plasterboard, pipe, brick)
- Quantity + unit (12 m², 3 LM, wall dimensions)
- Variants/options (thickness, diameter, treatments)
- Labor codes (sealant + primer = extra labor tier, demolition = specialized team)
If your AI model misses any of these, the quote is wrong. And a wrong quote to a client costs you the contract.
How We Built It (and What We Learned)
Approach 1: Generic LLM (GPT-4, Claude) — We tried this first in Feb 2024. We'd send voice-transcribed text to an API and ask: "Extract: material, quantity, unit, labor_code."
Result: 73% accuracy on common items (rebar, drywall), 34% accuracy on niche items (cavity insulation with special tape). Not safe for production.
Why it failed: LLMs are trained on web text, not BTP French jargon. "14mm fibre-gypsum" means something specific to a plasterer (fire-rated, structural). ChatGPT guesses.
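The Approach-1 pipeline can be sketched as prompt construction plus JSON parsing of the model's reply. This is an illustrative reconstruction, not Anodos's actual prompt: the field names come from the text above, the prompt wording and `parse_reply` helper are assumptions, and the API call is replaced by a canned reply.

```python
import json

# Hypothetical sketch of Approach 1: send the voice transcript to a generic
# LLM and ask for the four structured fields named in the text. The prompt
# wording and helper names are illustrative, not the production code.

EXTRACTION_PROMPT = (
    "Extract from this construction line item: material, quantity, unit, labor_code.\n"
    "Reply with JSON only.\n"
    "Transcript: {transcript}"
)

def build_prompt(transcript: str) -> str:
    return EXTRACTION_PROMPT.format(transcript=transcript)

def parse_reply(reply: str) -> dict:
    # A generic LLM may wrap the JSON in prose; keep only the braced span.
    start, end = reply.find("{"), reply.rfind("}") + 1
    return json.loads(reply[start:end])

# Canned reply standing in for the actual API call:
reply = '{"material": "plasterboard", "quantity": 12, "unit": "m2", "labor_code": "L2"}'
item = parse_reply(reply)
```

The fragility is visible in the shape itself: everything downstream trusts the model to emit the right `material` string, which is exactly where BTP jargon broke it.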
Approach 2: Fine-tuned model + vector DB
We collected 3,000 real quotes (anonymized) from our pilot SMEs. Each line was annotated:
```json
{
  "audio_transcript": "douze mètres carrés de placo BA13 avec bande enduit",
  "extracted": {
    "material_code": "BA13",
    "quantity": 12,
    "unit": "m²",
    "labor_multiplier": 1.2,
    "material_supplier": "Placoplatre",
    "price_reference": "€12.50/m²"
  }
}
```
We fine-tuned mistral-7b (open-source, runs on-device for privacy) on this dataset. Then we paired it with a vector DB (Pinecone) of "known materials"—if the model's confidence was <85%, we'd query the DB for similar materials in the same category.
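The confidence gate can be sketched in a few lines. This is a toy stand-in, assuming the 85% threshold from the text: the hand-rolled cosine similarity and the three-material index replace the fine-tuned model's embeddings and the Pinecone query.

```python
import math

# Minimal sketch of the <85%-confidence fallback described above.
# The material codes, toy embeddings, and vector dimension are
# illustrative; in production this would be a Pinecone index query.

THRESHOLD = 0.85

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "known materials" index: material_code -> embedding
index = {
    "BA13":  [0.9, 0.1, 0.0],
    "BA18":  [0.8, 0.2, 0.1],
    "PVC32": [0.0, 0.1, 0.9],
}

def resolve(model_code: str, model_conf: float, query_vec):
    """Keep the model's answer if confident; else take the nearest known material."""
    if model_conf >= THRESHOLD:
        return model_code
    return max(index, key=lambda code: cosine(index[code], query_vec))

best = resolve("???", 0.40, [0.9, 0.1, 0.0])  # low confidence: falls back to "BA13"
```

The key design choice is that the fallback can only return a material that actually exists in the catalog, so a low-confidence parse degrades to "closest real product" instead of a hallucinated code.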
Result: 91% accuracy on the full dataset. On real user tests: 89% on first pass, 99% after user confirmation (1 swipe to correct).
Key insight: The last 10% of accuracy comes from humans. We built a 2-second correction UI: user hears the parsed quote, taps to adjust quantity/material, done. The model learns from the correction (federated learning, on-device).
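The confirm-or-correct loop reduces to a small state machine: read back the parsed line, accept it or merge one edited field, and queue the correction for the on-device learner. The queue and field names below are illustrative assumptions; the federated update itself is out of scope.

```python
# Sketch of the 2-second correction UI described above. `correction_queue`
# stands in for whatever feeds the on-device learner; it is a hypothetical
# name, as is the single-dict edit format.

correction_queue = []

def confirm(parsed: dict, user_edit=None) -> dict:
    """Apply a single-field user correction and log it for retraining."""
    if user_edit is None:
        return parsed  # first-pass parse accepted as-is (the 89% case)
    fixed = {**parsed, **user_edit}
    correction_queue.append({"before": parsed, "after": fixed})
    return fixed

line = {"material_code": "BA13", "quantity": 21, "unit": "m²"}
fixed = confirm(line, {"quantity": 12})  # one tap: quantity 21 -> 12
```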
Voice Quality: The Unspoken Challenge
Your phone's microphone is terrible on a construction site. We measured:
- Ambient noise: 75-85 dB (equivalent to a busy highway)
- Jackhammer nearby: 110+ dB (clipping, complete audio loss)
- Wind: Garbles fricatives (S sounds)
- Safety glasses resonance: Thin metallic echo (yes, really)
We solved this by:
- Pre-filtering: High-pass filter (200 Hz+) removes most machinery rumble
- Noise gate: Threshold at -40 dB silences dead air between phrases
- Dynamic compression: Prevents clipping on loud equipment spikes
- Voice activation keyword: User says "Anodos, note:" to start. Saves 35% of re-records because the system ignores background chatter.
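The first two stages of that chain can be sketched with a textbook first-order high-pass and an amplitude gate. This is a minimal illustration using the 200 Hz and -40 dB figures from the text; a real deployment would use a proper DSP library and a higher-order filter.

```python
import math

# Sketch of the pre-filtering chain: a one-pole high-pass at ~200 Hz to cut
# machinery rumble, then a -40 dB gate that zeroes dead air. Sample rate and
# filter order are illustrative assumptions.

SAMPLE_RATE = 16_000
CUTOFF_HZ = 200
GATE_DB = -40.0

def high_pass(samples, cutoff=CUTOFF_HZ, rate=SAMPLE_RATE):
    """First-order RC high-pass: attenuates content below the cutoff."""
    rc = 1.0 / (2 * math.pi * cutoff)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def noise_gate(samples, threshold_db=GATE_DB):
    """Zero out samples quieter than the threshold (dead air between phrases)."""
    threshold = 10 ** (threshold_db / 20)  # -40 dB -> 0.01 amplitude
    return [x if abs(x) >= threshold else 0.0 for x in samples]
```

A constant (DC) input decays toward zero through `high_pass`, which is exactly the behavior you want against low-frequency rumble, while speech-band content passes through largely untouched.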
We also shipped a headset integration (Jabra 8500 + Plantronics headsets). Accuracy jumps to 96% with a decent mic.
Real-World Metric: 15 Pilot SMEs, 8 Weeks
We deployed voice-to-quote to 15 BTP artisans in Lyon and Marseille (Aug–Oct 2024). Blind A/B test: voice vs. old spreadsheet workflow.
| Metric | Voice | Spreadsheet | Delta |
|---|---|---|---|
| Time/estimate | 8 min | 47 min | -83% |
| Errors (1st draft) | 2.1% | 8% | -74% |
| Client revision requests | 1.1 | 2.7 | -59% |
| Adoption (% using daily) | 87% | 100%* | *baseline |
| Training time (first 10 uses) | 12 min | 0 min | +12 min (one-time) |
The 8-minute figure includes: 2 min voice capture + 3 min UI review + 1 min PDF generation + 2 min upload to client.
Unexpected win: Fewer revision cycles. When artisans speak their estimate aloud, they catch mistakes before confirming (self-correction effect). When they type, they miss details.
The Scaling Problem: Accuracy as You Grow
Here's the hard truth: as you add more users and more regional materials, your model degrades.
- In Lyon, "carreau" = ceramic tile (standard)
- In Marseille, "carreau" = hydraulic tile (vintage, 2× price)
Same word, different material code, different labor multiplier.
We solved this by:
- Regional material libraries: Each region gets its own Pinecone index
- Supplier integration: When a user's company is tied to a specific supplier (e.g., local Placoplatre dealer), we weight their catalog in the vector search
- Federated learning: When a user corrects a material code, we update their regional model, not the global one (privacy + faster adaptation)
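The "carreau" case reduces to a per-region lookup with a supplier check layered on top. The region names and the 2× price gap come from the text; the material codes, dictionary shape, and `in_catalog` flag are illustrative assumptions, not the production Pinecone setup.

```python
# Sketch of regional disambiguation: the same spoken word resolves to a
# different material code (and price factor) per region, and the user's
# supplier catalog flags codes their dealer doesn't stock.

REGIONAL_LIBRARIES = {
    "lyon":      {"carreau": {"code": "TILE_CERAMIC",   "price_factor": 1.0}},
    "marseille": {"carreau": {"code": "TILE_HYDRAULIC", "price_factor": 2.0}},
}

def resolve_material(term: str, region: str, supplier_catalog=None):
    """Look up a spoken term in the caller's regional library first."""
    entry = REGIONAL_LIBRARIES.get(region, {}).get(term)
    if entry and supplier_catalog is not None and entry["code"] not in supplier_catalog:
        # Supplier weighting, simplified to a flag: surface a warning
        # instead of silently quoting an unavailable product.
        return {**entry, "in_catalog": False}
    return entry
```

Usage mirrors the example above: a Lyon artisan saying "carreau" gets the ceramic code, a Marseille artisan gets the hydraulic code at twice the price factor.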
After 6 weeks of regional fine-tuning, accuracy stabilized at 93% in new regions.
What Voice-to-Quote Doesn't Do (Yet)
- Complex formulas: "Add 15% for scaffolding + 8% for waste" — LLMs struggle with nested math. You need a separate rule engine.
- Photo extraction: We thought vision AI could extract quantities from site photos (e.g., count bricks in a wall). Reality: lighting, angle, and overlaps make this <60% accurate. Humans are still faster.
- Regulatory compliance: French building codes change yearly. The model doesn't auto-update. We update the training set quarterly.
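The separate rule engine mentioned for nested math is deterministic and tiny compared to the model. A minimal sketch, using the scaffolding/waste example from the text (rule names and the additive-surcharge convention are assumptions):

```python
# Illustrative rule engine: percentage modifiers are applied to the parsed
# subtotal by plain arithmetic, outside the LLM. Surcharges here are
# additive on the base subtotal; whether they should compound instead is a
# business decision, not a modeling one.

RULES = [
    ("scaffolding", 0.15),
    ("waste", 0.08),
]

def apply_rules(subtotal: float, active_rules) -> float:
    """Apply additive percentage surcharges to a line-item subtotal."""
    total = subtotal
    for name, rate in RULES:
        if name in active_rules:
            total += subtotal * rate
    return round(total, 2)

print(apply_rules(1000.0, {"scaffolding", "waste"}))  # 1000 + 150 + 80 = 1230.0
```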
The Business Angle
We price voice-to-quote as a +€20/month feature (included in our €49-€99/month BTP packages).
- Adoption: 64% of trial users activated it in week 2
- Churn reduction: SMEs using voice have 3.2% monthly churn vs. 7.1% non-users (p=0.04, significant)
- COGS: LLM API calls cost us ~€0.08 per estimate. At €20/month, users can generate 250+ voice estimates before we break even (most generate 20–40/month).
It's not a standalone business. But as a feature that locks in SME users for 6+ months? It works.
Lessons for Builders
If you're building voice AI for construction or any other field-first industry:
- Domain matters: Generic LLMs fail. Fine-tune on your data.
- The last 10% requires humans: Embrace the hybrid loop (AI suggests, human corrects).
- Audio quality is your bottleneck, not language models: Invest in pre-filtering and cheap headsets.
- Regional specificity kills generalization: Plan for federated learning from day one.
- Voice is not about speed alone: It's about context capture. Workers speak their assumptions aloud, which surfaces errors early.
Olivier Ebrahim, founder of Anodos
We're building tools for BTP SMEs who work on site, not in Excel. Voice-to-quote is just the beginning.