Finding duplicate rows in a table can be done very simply by using Common Table Expressions knows as CTE’s along with the SQL Server built-in function ROW_Number() with the <PARTITION BY> and <ORDER BY> arguments found in SQL Server 2005 or later. In the sample below I’m looking for products that exist in the Products table more than once where the ProductName, Quantity, and Price are the same.
WITH DUPLICATES_CTE AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY ProductName, Quantity, Price ORDER BY (SELECT 0)) AS DuplicateRowNumber FROM Products ) SELECT * from DUPLICATES_CTE --OR DELETE FROM DUPLICATES_CTE WHERE DuplicateRowNumber > 1
Next I’ll break the SQL statement into pieces to see what’s happening:
- The “WITH DUPLICATES AS” creates the CTE to be used later in the query like a temporary table but it only exists for the duration of the query.
- First is the ROW_NUMBER() built-in function returns the number of a row within a partition of a result set, starting at 1 for the first row in each partition.
- Second is the the <partition_by_clause> argument which segments the results of the FROM clause into partitions –this is where ROW_NUMBER() is applied.
- Finally, the <order_by_clause> argument checks the order of how the ROW_Number value is doled out to the rows in the partition. When executed you can access the result of this query by referencing the CTE named DUPLICATES_CTE with a simple select query.
You can find more on CTE’s and ROW_Number() here http://msdn.microsoft.com/en-us/library/ms190766.aspx and here http://msdn.microsoft.com/en-us/library/ms186734.aspx