DEV Community

Jeremy K.
Jeremy K.

Posted on

How to Remove Duplicate Excel Rows with C# (No Office Required)

In daily data processing workflows, dealing with duplicate rows in Excel is one of the most common challenges for developers and data analysts. Whether you need data cleaning, automated report generation, or bulk data analysis, removing duplicate Excel rows is an indispensable routine task.
This comprehensive guide will walk you through how to delete duplicate rows in Excel programmatically with C# using Free Spire.XLS for .NET, a free Excel library that requires no Microsoft Office installation.


Environment Setup & NuGet Installation

Before writing C# code to remove Excel duplicates, install the free Excel processing library via NuGet:

Install-Package FreeSpire.XLS
Enter fullscreen mode Exit fullscreen mode

You can also search and install FreeSpire.XLS directly in Visual Studio’s Manage NuGet Packages panel.

The free edition covers all essential features and fully supports duplicate row removal for small to medium Excel datasets.


Step-by-Step C# Code to Remove Excel Duplicate Rows

1. Load Excel Workbook and Target Worksheet

using Spire.Xls;
using System.Linq;

// Initialize Excel workbook instance
Workbook workbook = new Workbook();

// Load local Excel file
workbook.LoadFromFile("Test.xlsx");

// Access the first worksheet
Worksheet sheet = workbook.Worksheets[0];
Enter fullscreen mode Exit fullscreen mode

2. Set Valid Excel Data Range

Target only rows with actual data by fetching the last used row via the LastRow property. We set the range from A1 to the last row of Column A, and you can adjust the starting row to skip header rows as needed.

// Use Column A as the duplicate checking column, data starts at Row 1
var range = sheet.Range["A1:A" + sheet.LastRow];
Enter fullscreen mode Exit fullscreen mode

3. Locate Duplicate Row Numbers with LINQ

We use LINQ grouping to filter duplicate rows. The DisplayedText property is more reliable than Value or Text, as it captures formatted cell content, formula results, and visual display text in Excel.

var duplicatedRows = range.Rows
    .GroupBy(row => row.Columns[0].DisplayedText)   // Group rows by Column A displayed text
    .Where(group => group.Count() > 1)             // Keep groups with multiple duplicate entries
    .SelectMany(group => group.Skip(1))            // Retain first row, select remaining duplicates
    .Select(row => row.Columns[0].Row)             // Extract row numbers of duplicates
    .ToList();
Enter fullscreen mode Exit fullscreen mode
  • Range.Rows: Gets all row objects within the selected data range
  • GroupBy: Creates groups based on cell text in the key column
  • Skip(1): Preserves the first unique row in each group
  • Outputs a row number list for batch deletion

4. Delete Duplicate Excel Rows Without Index Error

Direct row deletion causes upward row shifting and invalid row indexes. We fix this by dynamically adjusting row numbers with duplicatedRows[i] - i during loop deletion.

for (int i = 0; i < duplicatedRows.Count; i++)
{
    sheet.DeleteRow(duplicatedRows[i] - i);
}
Enter fullscreen mode Exit fullscreen mode

5. Save Modified Excel File

workbook.SaveToFile("RemoveDuplicateRows.xlsx", ExcelVersion.Version2016);
Enter fullscreen mode Exit fullscreen mode

Supports saving as .xlsx, .xls and other mainstream Excel file versions.


Key Tips & Best Practices

1. Check Duplicates by Multiple Columns

If you need to identify duplicates based on combined values of Column A and Column B, modify the GroupBy logic:

.GroupBy(row => new { 
    Col1 = row.Columns[0].DisplayedText, 
    Col2 = row.Columns[1].DisplayedText 
})
Enter fullscreen mode Exit fullscreen mode

2. Skip Excel Header Rows

Exclude the first header row from duplicate checking by adjusting the data range:

var range = sheet.Range["A2:A" + sheet.LastRow];
Enter fullscreen mode Exit fullscreen mode

3. DisplayedText vs Value in Excel Cells

Property Description
DisplayedText Matches Excel visual display, includes number/date formatting and formula results – ideal for duplicate validation
Value Returns raw cell data and ignores formatting, prone to duplicate misjudgment

4. Safe Deletion (Alternative)

Another reliable way is sorting row numbers in descending order before deletion:

foreach (int rowNum in duplicatedRows.OrderByDescending(r => r))
{
    sheet.DeleteRow(rowNum);
}
Enter fullscreen mode Exit fullscreen mode

5. Release Excel Resource Properly

Always dispose the Workbook object or use the using statement to release memory and avoid resource leaks after file processing.

Advanced: Custom Duplicate Matching Logic

Trim blank spaces or ignore empty rows before duplicate comparison with custom preprocessing:

var processedRows = range.Rows
    .Select(row => new {
        RowObject = row,
        KeyText = row.Columns[0].DisplayedText?.Trim() ?? string.Empty
    });
// Group and filter based on processed KeyText
Enter fullscreen mode Exit fullscreen mode

Conclusion

This tutorial covers the end-to-end process to remove duplicate rows in Excel with C#, including file loading, duplicate detection, batch row deletion, and result saving. The solution supports single-column and multi-column duplicate checking, fits automated data cleaning systems perfectly, and works without installing Microsoft Office on the server or local device.

Top comments (0)