
T-SQL: Removing Duplicate Non-Identical Records

Introduction

I have found that some of us repeat the same processes over and over again, while other people might never need to perform a task I would consider basic, so I want to share how I approach removing duplicate records from my data. If you are lucky enough that the entire row is a complete duplicate, you can simply slap a DISTINCT on your query and your problem should be gone. Just be mindful of the limitations: DISTINCT does not support long text columns (text, ntext and image), as per the Microsoft Docs.
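
As a quick sketch of the DISTINCT approach (the table and column names here are purely illustrative):

SELECT  DISTINCT
        CustomerID          -- illustrative columns; every selected column
        , OrderDate         -- takes part in the DISTINCT comparison
        , Amount
FROM    [mySalesTable]      -- hypothetical table name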

I prefer to decide for myself what makes my records distinct and act accordingly.

Remove Duplicate Records for Reporting

Description

When removing duplicates, you first have to decide what makes a record unique: is it a merchant ID, or perhaps a social security number? I like to go with the CTE and ROW_NUMBER approach when removing duplicates from my datasets. The only drawback is that if you do not want the extra row-number column in the output, you will have to list the required columns explicitly when you select from the CTE.

Code

WITH CTE
AS
(
    SELECT  *
            , ROW_NUMBER() OVER(PARTITION BY [<WHAT COLUMN/S MAKE MY RECORD UNIQUE>] ORDER BY [<YOUR PREFERRED ORDER>]) RN
    FROM    [myTable]
)
SELECT  *
FROM    CTE
WHERE   RN = 1
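
To make the template concrete, here is a sketch assuming a hypothetical [Transactions] table where [MerchantID] defines uniqueness and the most recent [LoadDate] is the row that should survive:

WITH CTE
AS
(
    SELECT  *
            -- Number the rows within each MerchantID, newest LoadDate first
            , ROW_NUMBER() OVER(PARTITION BY [MerchantID] ORDER BY [LoadDate] DESC) RN
    FROM    [Transactions]  -- hypothetical table name
)
SELECT  *
FROM    CTE
WHERE   RN = 1              -- keep only the latest row per MerchantID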

Remove Duplicate Records From a Table

Description

Sometimes you are not just removing duplicates for reporting purposes; you actually want to delete them from the underlying table. SQL Server allows a neat trick where you write your logic in a CTE and then DELETE records through the CTE, as in the example below. Here I delete every duplicate occurrence after the first, keeping only one row per unique key.

Code

WITH CTE
AS
(
    SELECT  *
            , ROW_NUMBER() OVER(PARTITION BY [<WHAT COLUMN/S MAKE MY RECORD UNIQUE>] ORDER BY [<YOUR PREFERRED ORDER>]) RN
    FROM    [myTable]
)
DELETE FROM CTE
WHERE   RN != 1
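
Using the same hypothetical [Transactions] table as before, the delete variant could look like this, keeping the most recent row per [MerchantID] and removing the rest:

WITH CTE
AS
(
    SELECT  *
            -- Number the rows within each MerchantID, newest LoadDate first
            , ROW_NUMBER() OVER(PARTITION BY [MerchantID] ORDER BY [LoadDate] DESC) RN
    FROM    [Transactions]  -- hypothetical table name
)
DELETE FROM CTE
WHERE   RN != 1             -- remove everything except the latest row per MerchantID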

Disclaimer: Content is accurate at the time of publication; however, updates and new additions happen frequently, which could change its accuracy or relevance. Please keep this in mind when using my content as a guideline. Always test in a testing or development environment first; I do not accept any liability for damages caused by this content.

If you liked this post, you can share it with your followers or follow me on Twitter!