In daily data processing workflows, dealing with duplicate rows in Excel is one of the most common challenges for developers and data analysts. Whether you need data cleaning, automated report generation, or bulk data analysis, removing duplicate Excel rows is an indispensable routine task.
This comprehensive guide will walk you through how to delete duplicate rows in Excel programmatically with C# using Free Spire.XLS for .NET, a free Excel library that requires no Microsoft Office installation.
Environment Setup & NuGet Installation
Before writing C# code to remove Excel duplicates, install the free Excel processing library via NuGet:
Install-Package FreeSpire.XLS
You can also search and install FreeSpire.XLS directly in Visual Studio’s Manage NuGet Packages panel.
The free edition covers all essential features and fully supports duplicate row removal for small to medium Excel datasets.
Step-by-Step C# Code to Remove Excel Duplicate Rows
1. Load Excel Workbook and Target Worksheet
using Spire.Xls;
using System.Linq;
// Initialize Excel workbook instance
Workbook workbook = new Workbook();
// Load local Excel file
workbook.LoadFromFile("Test.xlsx");
// Access the first worksheet
Worksheet sheet = workbook.Worksheets[0];
2. Set Valid Excel Data Range
Target only rows with actual data by fetching the last used row via the LastRow property. We set the range from A1 to the last row of Column A, and you can adjust the starting row to skip header rows as needed.
// Use Column A as the duplicate checking column, data starts at Row 1
var range = sheet.Range["A1:A" + sheet.LastRow];
3. Locate Duplicate Row Numbers with LINQ
We use LINQ grouping to filter duplicate rows. The DisplayedText property is more reliable than Value or Text, as it captures formatted cell content, formula results, and visual display text in Excel.
var duplicatedRows = range.Rows
.GroupBy(row => row.Columns[0].DisplayedText) // Group rows by Column A displayed text
.Where(group => group.Count() > 1) // Keep groups with multiple duplicate entries
.SelectMany(group => group.Skip(1)) // Retain first row, select remaining duplicates
.Select(row => row.Columns[0].Row) // Extract row numbers of duplicates
.ToList();
-
Range.Rows: Gets all row objects within the selected data range -
GroupBy: Creates groups based on cell text in the key column -
Skip(1): Preserves the first unique row in each group - Outputs a row number list for batch deletion
4. Delete Duplicate Excel Rows Without Index Error
Direct row deletion causes upward row shifting and invalid row indexes. We fix this by dynamically adjusting row numbers with duplicatedRows[i] - i during loop deletion.
for (int i = 0; i < duplicatedRows.Count; i++)
{
sheet.DeleteRow(duplicatedRows[i] - i);
}
5. Save Modified Excel File
workbook.SaveToFile("RemoveDuplicateRows.xlsx", ExcelVersion.Version2016);
Supports saving as .xlsx, .xls and other mainstream Excel file versions.
Key Tips & Best Practices
1. Check Duplicates by Multiple Columns
If you need to identify duplicates based on combined values of Column A and Column B, modify the GroupBy logic:
.GroupBy(row => new {
Col1 = row.Columns[0].DisplayedText,
Col2 = row.Columns[1].DisplayedText
})
2. Skip Excel Header Rows
Exclude the first header row from duplicate checking by adjusting the data range:
var range = sheet.Range["A2:A" + sheet.LastRow];
3. DisplayedText vs Value in Excel Cells
| Property | Description |
|---|---|
DisplayedText |
Matches Excel visual display, includes number/date formatting and formula results – ideal for duplicate validation |
Value |
Returns raw cell data and ignores formatting, prone to duplicate misjudgment |
4. Safe Deletion (Alternative)
Another reliable way is sorting row numbers in descending order before deletion:
foreach (int rowNum in duplicatedRows.OrderByDescending(r => r))
{
sheet.DeleteRow(rowNum);
}
5. Release Excel Resource Properly
Always dispose the Workbook object or use the using statement to release memory and avoid resource leaks after file processing.
Advanced: Custom Duplicate Matching Logic
Trim blank spaces or ignore empty rows before duplicate comparison with custom preprocessing:
var processedRows = range.Rows
.Select(row => new {
RowObject = row,
KeyText = row.Columns[0].DisplayedText?.Trim() ?? string.Empty
});
// Group and filter based on processed KeyText
Conclusion
This tutorial covers the end-to-end process to remove duplicate rows in Excel with C#, including file loading, duplicate detection, batch row deletion, and result saving. The solution supports single-column and multi-column duplicate checking, fits automated data cleaning systems perfectly, and works without installing Microsoft Office on the server or local device.
Top comments (0)