🚗 From Chaos to Structure: Extracting Car Listings with AI
Ever struggled with parsing messy car listings from different sources? Imagine turning this:
"Check out this stylish Honda City 2018 model for sale, clocked only 30,000 km! Single owner, showroom condition, insurance valid. Yours for just ₹6.5 lakh."
Into this:
{
  "Make": "Honda",
  "Model": "City",
  "Year": 2018,
  "Mileage": 30000,
  "Price": 6.5,
  "AvailabilityType": "Sale",
  "Features": ["Single owner", "Showroom condition", "Insurance valid"],
  "OwnerCount": 1
}
Let me show you how to build this in under 100 lines of C# using the GitHub Models API! 🚀
🎯 The Problem
Car listings come in all shapes and sizes. Whether you're building a price comparison site, marketplace aggregator, or inventory management system, you need to:
- Extract key details (make, model, year, mileage, price)
- Handle different formats (sale, lease, rent)
- Deal with missing information gracefully
- Process data at scale
Manually parsing this is tedious. Let AI do the heavy lifting! 💪
🛠️ The Solution
We'll use:
- GitHub Models - Free access to powerful AI models
- Microsoft.Extensions.AI - Unified AI abstraction for .NET
- .NET 10 - Latest and greatest
📦 Quick Setup
First, grab your free GitHub token from GitHub Models (no credit card needed!).
Create a new console app:
dotnet new console -n TextExtraction
cd TextExtraction
dotnet add package Microsoft.Extensions.AI.OpenAI
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
Store your token securely:
dotnet user-secrets init
dotnet user-secrets set "GitHubModels:Token" "your-github-token"
🎨 Building the Model
Create a CarDetails.cs class to define what we want to extract:
using System.Text.Json.Serialization;
[JsonConverter(typeof(JsonStringEnumConverter))]
public enum AvailabilityType
{
    Sale,
    Lease,
    Rent
}
public class CarDetails
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public AvailabilityType? AvailabilityType { get; set; }
    public double? PricePerMonth { get; set; }
    public double? PricePerDay { get; set; }
    public string[]? Features { get; set; }
    public string? Location { get; set; }
    public string ShortSummary { get; set; } = string.Empty;
    public int? OwnerCount { get; set; }
}
Notice the nullable types? That's how we handle missing data elegantly! ✨
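To see why the nullable types matter, here is a minimal sketch using plain System.Text.Json, with no AI call involved. ListingSketch is a hypothetical, trimmed-down stand-in for CarDetails, and the JSON string is a made-up example of what a lease listing might come back as:

```csharp
using System;
using System.Text.Json;

// A lease listing has no sale price or mileage, so those keys are simply absent.
var json = """{"Make":"Hyundai","Model":"Creta SX","Year":2020,"PricePerMonth":22000}""";

var car = JsonSerializer.Deserialize<ListingSketch>(json)!;

// Absent keys leave nullable properties as null instead of a misleading 0.
Console.WriteLine(car.Price is null ? "Price: not listed" : $"Price: {car.Price} lakh");
Console.WriteLine(car.Mileage is null ? "Mileage: not listed" : $"Mileage: {car.Mileage} km");
Console.WriteLine($"Monthly: {car.PricePerMonth}");

// Hypothetical, trimmed-down version of CarDetails used only for this sketch.
public class ListingSketch
{
    public string Make { get; set; } = string.Empty;
    public string Model { get; set; } = string.Empty;
    public int? Year { get; set; }
    public double? Mileage { get; set; }
    public double? Price { get; set; }
    public double? PricePerMonth { get; set; }
}
```

With a non-nullable double, a missing Price would silently become 0, which is indistinguishable from a free car. Null keeps "not mentioned" and "zero" apart.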
🧠 The Magic: AI-Powered Extraction
Here's the core extraction logic in Program.cs:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Configuration;
using OpenAI;
using System.ClientModel;
using System.Text.Json;
// Configure the client: read the token from user secrets
var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();
var credential = new ApiKeyCredential(
    configuration["GitHubModels:Token"] ??
    throw new InvalidOperationException("Token not found")
);
// Point the OpenAI client at the GitHub Models endpoint and adapt it to IChatClient
IChatClient chatClient = new OpenAIClient(credential, new OpenAIClientOptions
{
    Endpoint = new Uri("https://models.inference.ai.azure.com")
}).GetChatClient("gpt-4o-mini")
  .AsIChatClient();
// Define extraction schema
var prompt = @"Extract the following details from the car listing and return ONLY a valid JSON object:
{
""Make"": ""string - car manufacturer/brand"",
""Model"": ""string - car model name"",
""Year"": number - manufacturing year,
""Mileage"": number - kilometers driven,
""Price"": number - price in lakhs,
""AvailabilityType"": ""string - one of: Sale, Lease, Rent"",
""Features"": ""array of strings - notable features"",
""ShortSummary"": ""string - brief summary in 10-15 words"",
""OwnerCount"": number - previous owners (null if not mentioned)
}
Return only the JSON object, no additional text.";
// Sample car listings
var carListings = new List<string>
{
"Honda City 2018 for sale, only 30,000 km! Single owner, showroom condition. ₹6.5 lakh.",
"Hyundai Creta SX 2020 — premium SUV with sunroof. Monthly lease at ₹22,000.",
"Toyota Innova Crysta 2019 — spacious 7-seater, 40,000 km, rent at ₹2,500/day."
};
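// Process each listing
Console.WriteLine("Processing car listings...");
foreach (var listing in carListings)
{
    // GetResponseAsync<T> asks the model for JSON that matches CarDetails
    var response = await chatClient.GetResponseAsync<CarDetails>(
        $"{prompt}\n\nCar Listing:\n{listing}"
    );
    if (response.TryGetResult(out CarDetails? carDetails) && carDetails != null)
    {
        Console.WriteLine($"✅ Extracted: {carDetails.Make} {carDetails.Model}");
        Console.WriteLine(JsonSerializer.Serialize(carDetails,
            new JsonSerializerOptions { WriteIndented = true }));
    }
    else
    {
        // The reply could not be deserialized into CarDetails
        Console.WriteLine($"⚠️ Could not parse a result for: {listing}");
    }
}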
🎬 Run It!
dotnet run
Output (abridged):
Processing car listings...
✅ Extracted: Honda City
{
"Make": "Honda",
"Model": "City",
"Year": 2018,
"Mileage": 30000,
"Price": 6.5,
"AvailabilityType": "Sale",
"Features": ["Single owner", "Showroom condition"],
"OwnerCount": 1
}
✅ Extracted: Hyundai Creta
{
"Make": "Hyundai",
"Model": "Creta SX",
"Year": 2020,
"AvailabilityType": "Lease",
"PricePerMonth": 22000,
"Features": ["Premium SUV", "Sunroof"]
}
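One thing to watch in results like these: the prompt asks for Price in lakhs, while the lease amount comes back in plain rupees (22000). If you want to compare or store prices in a single unit, a small conversion helps. This is a minimal sketch; LakhsToRupees and RupeesPerLakh are hypothetical names, not part of the original project:

```csharp
using System;

// One lakh is 100,000, so a Price of 6.5 (lakhs) is 650,000 rupees.
const decimal RupeesPerLakh = 100_000m;

// Returns null when the listing had no price, mirroring the nullable model.
decimal? LakhsToRupees(double? lakhs) =>
    lakhs.HasValue ? (decimal?)((decimal)lakhs.Value * RupeesPerLakh) : null;

decimal? hondaPrice = LakhsToRupees(6.5);   // sale price from the first listing
decimal? leasePrice = LakhsToRupees(null);  // lease listing has no sale price

Console.WriteLine(hondaPrice.HasValue ? $"Sale price in rupees: {hondaPrice}" : "No sale price");
Console.WriteLine(leasePrice.HasValue ? $"Sale price in rupees: {leasePrice}" : "No sale price");
```

Using decimal for the converted amount avoids floating-point rounding surprises once you start summing or comparing money values.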
🚀 Level Up: Customization Ideas
1. Extract Different Fields
Add fuel type, transmission, color:
public string? FuelType { get; set; } // Petrol/Diesel/Electric
public string? Transmission { get; set; } // Manual/Automatic
public string? Color { get; set; }
2. Process Real-Time Data
Connect to web scraping APIs or RSS feeds:
// FetchListingsFromApi is a helper you would write yourself; the URL is illustrative
var listings = await FetchListingsFromApi("https://api.carmarket.com/listings");
3. Add Validation
if (carDetails.Year < 1900 || carDetails.Year > DateTime.Now.Year)
{
Console.WriteLine("⚠️ Invalid year detected");
}
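If you want to go a bit further, here is a sketch of a validator that checks several fields and collects every problem instead of stopping at the first. Validate is a hypothetical helper, and the ranges (years from 1900 up to next year's models, non-negative mileage, positive price) are assumptions you should tune for your own data:

```csharp
using System;
using System.Collections.Generic;

// Null means "not mentioned in the listing", which is allowed.
// Only values that are actually present get range-checked.
List<string> Validate(int? year, double? mileage, double? price)
{
    var problems = new List<string>();

    // Assumption: allow one year ahead, since next year's models get listed early.
    if (year is int y && (y < 1900 || y > DateTime.Now.Year + 1))
        problems.Add($"Year {y} is out of range");

    if (mileage is double m && m < 0)
        problems.Add($"Mileage {m} cannot be negative");

    if (price is double p && p <= 0)
        problems.Add($"Price {p} must be positive");

    return problems;
}

// A clean listing produces no problems; a bad one reports each issue.
foreach (var problem in Validate(year: 1800, mileage: -5, price: null))
{
    Console.WriteLine($"⚠️ {problem}");
}
```

In the extraction loop you would call it as Validate(carDetails.Year, carDetails.Mileage, carDetails.Price) and skip or flag any listing where the returned list is not empty.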
4. Export to Database
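// Assumes dbContext is an EF Core DbContext with a DbSet<CarDetails> named CarListings
dbContext.CarListings.Add(carDetails); // AddAsync is only needed for special value generators
await dbContext.SaveChangesAsync();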
5. Use a Better Model
For higher accuracy, switch to GPT-4o:
.GetChatClient("gpt-4o") // More capable, slightly slower
💡 Pro Tips
- Keep temperature low for consistent extraction (for example, set ChatOptions.Temperature to 0; the service default is usually higher)
- Be specific in prompts - define exact format you want
- Use nullable types - not all listings have all fields
- Batch process - handle multiple listings efficiently
- Monitor token usage - track costs with response.Usage
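For the batch processing tip, here is one possible sketch: run several extractions at once, but cap how many are in flight so you stay within rate limits. ExtractAsync below is a hypothetical stand-in that fakes the model call with a delay. In the real app its body would call chatClient.GetResponseAsync<CarDetails>(...). The limit of 2 is an arbitrary assumption:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real model call.
async Task<string> ExtractAsync(string listing)
{
    await Task.Delay(50);          // simulates network latency
    return listing.Split(' ')[0];  // pretend the first word is the extracted make
}

// Runs all extractions, allowing at most maxParallel of them at the same time.
async Task<string[]> ExtractAllAsync(IEnumerable<string> listings, int maxParallel)
{
    using var gate = new SemaphoreSlim(maxParallel);
    var tasks = listings.Select(async listing =>
    {
        await gate.WaitAsync();    // wait for a free slot
        try { return await ExtractAsync(listing); }
        finally { gate.Release(); }
    }).ToList();

    // Task.WhenAll returns results in the same order as the input.
    return await Task.WhenAll(tasks);
}

var results = await ExtractAllAsync(
    new[] { "Honda City 2018", "Hyundai Creta SX 2020", "Toyota Innova Crysta 2019" },
    maxParallel: 2);

Console.WriteLine(string.Join(", ", results)); // Honda, Hyundai, Toyota
```

The sequential foreach loop is fine for a handful of listings. A cap like this starts to pay off once you have hundreds, where unbounded parallelism would likely hit the API's rate limits.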
🎯 Real-World Applications
- 🏪 Marketplace Aggregation: Consolidate listings from multiple sources
- 💰 Price Intelligence: Track pricing trends across markets
- 📊 Analytics Dashboards: Build insights from unstructured data
- 🤖 Chatbots: Power car recommendation bots
- 📱 Mobile Apps: Parse user-submitted listings
🔗 Get the Full Code
Grab the complete working example from GitHub:
👉 genai-dotnet-basic_llm_tasks/TextExtraction
The repo includes:
- ✅ Full source code with comments
- ✅ 9 example car listings
- ✅ Configuration setup guide
- ✅ Detailed README
🎓 What You Learned
- Using GitHub Models API in .NET
- Strongly-typed AI responses with GetResponseAsync<T>
- Schema-based extraction with AI
- Handling unstructured data gracefully
- Building production-ready text extraction
🤔 What's Next?
Try extracting:
- 📄 Resume data (name, skills, experience)
- 🧾 Invoices (vendor, amounts, dates)
- 📧 Emails (sender, subject, key points)
- 🏠 Real estate listings
- 🍕 Restaurant menus (dishes, prices, ingredients)
The same pattern works for ANY text extraction task!
💬 Let's Connect!
What will you build with text extraction? Drop a comment below! 👇
Found this helpful? Give it a ❤️ and follow for more .NET + AI content!
Tags: #dotnet #ai #machinelearning #csharp #github #opensource #textextraction #nlp #automation
GitHub Repo: https://github.com/Rahul1994jh/genai-dotnet-basic_llm_tasks