
SAS Viya: Remove Duplicates in SAS Studio Flow

Started 01-19-2023, modified 01-19-2023

The November 2022 stable release (2022.11) adds a Remove Duplicates step to SAS Studio Flow. The step removes duplicate rows from an input table and writes the unique rows to an output table. Duplicates can be identified across all columns or on one or more specified columns.
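As a rough illustration of the difference between the two modes, here is a minimal Python sketch. It is not the code the step generates. The function name `remove_duplicates`, the sample rows, and the choice to keep the first row seen for each key are all assumptions made for the example.

```python
def remove_duplicates(rows, key_columns=None):
    """Drop duplicate rows, keeping the first row seen for each key.

    rows        -- list of dicts, one dict per table row
    key_columns -- columns that define a duplicate; None means all columns
    """
    seen = set()
    unique = []
    for row in rows:
        # With no key columns given, compare every column in the row.
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample data for illustration only.
customers = [
    {"Name": "Ann Lee", "Zip": "27513"},
    {"Name": "Ann Lee", "Zip": "27513"},  # identical in every column
    {"Name": "Ann Lee", "Zip": "10001"},  # same Name, different Zip
]

all_columns = remove_duplicates(customers)
by_name = remove_duplicates(customers, key_columns=["Name"])
```

Comparing all columns drops only the exact copy, while comparing on `Name` alone also drops the row with a different Zip. The narrower the key, the more rows count as duplicates.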

 

I want to remove duplicate records from my customer data set.

 

1_RemoveDuplicates-1024x707.png


 

I use the DQ – Match Code step from the public custom step repository to generate match codes for the Name, Address, and Zip fields. The match codes enable fuzzy matching on those fields when the duplicate records are removed.
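To see why an encoded key helps with fuzzy matching, consider the toy Python sketch below. It is only an analogy: real SAS match codes are produced by SAS data quality functionality and follow different, more sophisticated rules. The function `toy_match_key` is hypothetical and simply uppercases a value and strips everything except letters and digits.

```python
import re


def toy_match_key(value: str) -> str:
    """Hypothetical stand-in for a match code (NOT the SAS algorithm).

    Uppercases the value and removes every character that is not a
    letter or digit, so spacing and punctuation differences disappear.
    """
    return re.sub(r"[^A-Z0-9]", "", value.upper())


# Two spellings of the same address reduce to the same key.
key_a = toy_match_key("123 Main St.")
key_b = toy_match_key("123  main st")
```

Because both spellings yield the same key, a later deduplication on the key column treats them as the same address even though the raw text differs.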

 

2_RemoveDuplicates-1024x698.png

  

Next, I add the Remove Duplicates step from the Transform section to the flow.

 

3_RemoveDuplicates-1024x579.png

  

I clear the Remove duplicates across all columns option and add a condition so that rows are treated as duplicates when their Name_MC, Address_MC, and Zip_MC columns contain the same values.

 

4_RemoveDuplicates.png

  

The Output tab has an option to Replace existing output table with same name. If the output table is a CAS table, you can also promote the table, save it, or both. If the output table is in a PATH, DNFS, ADLS, or S3 CAS library, you can specify the output format.

 

5_RemoveDuplicates.png

  

On the Debug tab, you can select the Debug SAS macros option. I select it for my flow.

 

6_RemoveDuplicates.png

  

I save and run the flow and now my duplicate customer records have been removed.

 

7_RemoveDuplicates-1024x705.png

 

  I review the Log and confirm the number of duplicate rows removed from my customer list.
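The count of removed rows can be sanity-checked independently of the log. The Python sketch below is an assumption-laden illustration, not output from the step: the helper `count_removed` is hypothetical, the match code values (`N1`, `A1`, and so on) are made-up placeholders rather than real match codes, and it assumes one row is kept per distinct combination of the three match code columns.

```python
from collections import Counter

# Columns used as the duplicate condition in the flow above.
KEY_COLUMNS = ("Name_MC", "Address_MC", "Zip_MC")


def count_removed(rows, key_columns=KEY_COLUMNS):
    """Rows that would be dropped if one row is kept per distinct key."""
    group_sizes = Counter(tuple(row[c] for c in key_columns) for row in rows)
    # Each group of n matching rows loses n - 1 of them.
    return sum(size - 1 for size in group_sizes.values())


# Placeholder data: the first two rows share all three match codes.
rows = [
    {"Name": "Ann Lee", "Name_MC": "N1", "Address_MC": "A1", "Zip_MC": "Z1"},
    {"Name": "Anne Lee", "Name_MC": "N1", "Address_MC": "A1", "Zip_MC": "Z1"},
    {"Name": "Bob Ray", "Name_MC": "N2", "Address_MC": "A2", "Zip_MC": "Z2"},
]

removed = count_removed(rows)
remaining = len(rows) - removed
```

The number of input rows minus the number of output rows should equal the number of removed rows that the log reports.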

 

8_RemoveDuplicates-1024x588.png

    

Summary

The Remove Duplicates step is now available in SAS Studio Flow. 

 

For more information, see the documentation: SAS Help Center: Removing Duplicates.

Find more articles from SAS Global Enablement and Learning here.

Last update: 01-19-2023 10:07 AM