
SAS Viya: Remove Duplicates in SAS Studio Flow

Started 01-19-2023
Modified 01-19-2023

The November 2022 stable release (2022.11) adds a Remove Duplicates step to SAS Studio flows. The step removes duplicate rows from an input table and writes the unique rows to an output table. Duplicates can be identified by comparing all columns or only the columns that you specify.
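To make the difference between the two modes concrete, here is a minimal Python sketch of the idea. It is not the code that the step generates. The function name `remove_duplicates`, the sample rows, and the choice to keep the first row of each duplicate group are all assumptions made for illustration.

```python
def remove_duplicates(rows, key_columns=None):
    """Return rows with duplicates dropped, keeping the first occurrence.

    rows: list of dicts that share the same column names.
    key_columns: columns that define a duplicate; None compares all columns.
    Keeping the first occurrence is an assumption of this sketch.
    """
    seen = set()
    unique = []
    for row in rows:
        # Compare every column unless specific key columns were given.
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Ann Lee", "zip": "27513"},
    {"id": 2, "name": "Ann Lee", "zip": "27513"},
    {"id": 2, "name": "Ann Lee", "zip": "27513"},
    {"id": 3, "name": "Bo Chan", "zip": "10001"},
]

all_columns = remove_duplicates(customers)                   # only exact copies are dropped
by_name_zip = remove_duplicates(customers, ["name", "zip"])  # rows that match on name and zip are dropped

print(len(all_columns), len(by_name_zip))
```

Comparing all columns removes only the exact copy of the row with id 2, while comparing on name and zip also treats ids 1 and 2 as the same customer.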

 

I want to remove duplicate records from my customer data set.

 

1_RemoveDuplicates-1024x707.png


 

First, I use the DQ – Match Code step from the public custom step repository to generate match codes for the Name, Address, and Zip fields. The match codes enable fuzzy matching on those fields when the duplicate records are removed.
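The reason match codes help can be shown with a deliberately simplified Python sketch. The `toy_match_code` function below is a hypothetical stand-in and is not the algorithm used by the DQ – Match Code step. It only shows that values which differ in case, punctuation, or word order can be reduced to the same code, so that a later exact comparison treats them as duplicates.

```python
import re


def toy_match_code(value):
    """Reduce a string to a crude comparison key (illustration only).

    Uppercases the value, treats every run of non-alphanumeric characters
    as a separator, and sorts the remaining tokens so word order is ignored.
    """
    tokens = re.sub(r"[^A-Z0-9]+", " ", value.upper()).split()
    return "".join(sorted(tokens))


# Hypothetical name values that a person would recognize as the same customer.
code_a = toy_match_code("Lee, Ann")
code_b = toy_match_code("ann  lee")
code_c = toy_match_code("Bo Chan")

print(code_a, code_b, code_c)
```

The first two values produce the same code even though the raw strings differ, which is the property that makes match-code columns useful as duplicate keys.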

 

2_RemoveDuplicates-1024x698.png

  

Next, I add the Remove Duplicates step from the Transform section to the flow.

 

3_RemoveDuplicates-1024x579.png

  

I clear the Remove duplicates across all columns option and add a condition so that rows are treated as duplicates when their Name_MC, Address_MC, and Zip_MC columns contain the same values.

 

4_RemoveDuplicates.png

  

The Output tab has the option Replace existing output table with same name. If the output table is a CAS table, you can also promote the table, save it, or both. If the output table is in a PATH, DNFS, ADLS, or S3 CAS library, you can specify the output format.

 

5_RemoveDuplicates.png

  

On the Debug tab, you can select the Debug SAS macros option. I select this option for my flow.

 

6_RemoveDuplicates.png

  

I save and run the flow, and the duplicate customer records are removed.

 

7_RemoveDuplicates-1024x705.png

 

I review the log to confirm the number of duplicate rows that were removed from my customer list.
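As a cross-check on the number reported in the log, the expected count can be worked out independently. The Python sketch below assumes that one row is kept for each distinct combination of Name_MC, Address_MC, and Zip_MC. The `count_removed` helper, the sample rows, and the placeholder match code values (`N1`, `A1`, and so on) are hypothetical and are not real match codes.

```python
from collections import Counter

# Columns used as the duplicate key in the flow.
KEY_COLUMNS = ("Name_MC", "Address_MC", "Zip_MC")


def count_removed(rows, key_columns=KEY_COLUMNS):
    """Return (rows kept, rows removed) when one row per key group is kept."""
    group_sizes = Counter(tuple(row[c] for c in key_columns) for row in rows)
    kept = len(group_sizes)      # one surviving row per distinct key
    removed = len(rows) - kept   # every other row in a group is a duplicate
    return kept, removed


# Hypothetical customer rows with placeholder match code values.
rows = [
    {"Name": "Ann Lee", "Name_MC": "N1", "Address_MC": "A1", "Zip_MC": "Z1"},
    {"Name": "Lee, Ann", "Name_MC": "N1", "Address_MC": "A1", "Zip_MC": "Z1"},
    {"Name": "Bo Chan", "Name_MC": "N2", "Address_MC": "A2", "Zip_MC": "Z2"},
]

kept, removed = count_removed(rows)
print(f"kept={kept} removed={removed}")
```

If the figure computed this way differs from the one in the log, the key columns or the input table are worth rechecking.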

 

8_RemoveDuplicates-1024x588.png

    

Summary

The Remove Duplicates step is available in SAS Studio flows as of the 2022.11 stable release.

 

For more information, see the documentation: SAS Help Center: Removing Duplicates.

Find more articles from SAS Global Enablement and Learning here.
