
SAS Viya: Remove Duplicates in SAS Studio Flow

Started 01-19-2023 by
Modified 01-19-2023 by

Starting with the November 2022 stable release (2022.11), SAS Studio Flow includes a Remove Duplicates step. The step removes duplicate rows from an input table and writes the unique rows to an output table. Rows can be treated as duplicates based on all columns or on one or more columns that you specify.
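To make the difference between "all columns" and "specified columns" concrete, here is a small language-neutral sketch in Python. It is only an illustration of the row-selection rule, not the code that the Remove Duplicates step generates. The function name `remove_duplicates`, the sample columns, and the choice to keep the first occurrence of each key are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional, Sequence

Row = Dict[str, object]


def remove_duplicates(rows: Iterable[Row],
                      key_columns: Optional[Sequence[str]] = None) -> List[Row]:
    """Keep the first row seen for each key (hypothetical helper, not SAS code).

    If key_columns is None, every column is part of the key, which mirrors
    the "remove duplicates across all columns" behavior.
    """
    seen = set()
    unique: List[Row] = []
    for row in rows:
        # Use all columns (in a stable order) or only the specified ones.
        columns = sorted(row) if key_columns is None else key_columns
        key = tuple((col, row[col]) for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data.
customers = [
    {"Name": "Ann Lee", "City": "Cary", "Zip": "27513"},
    {"Name": "Ann Lee", "City": "Cary", "Zip": "27513"},  # exact duplicate
    {"Name": "Ann Lee", "City": "Apex", "Zip": "27513"},  # differs only in City
]

print(len(remove_duplicates(customers)))                   # all columns: 2 rows kept
print(len(remove_duplicates(customers, ["Name", "Zip"])))  # Name and Zip only: 1 row kept
```

Narrowing the key to fewer columns can only merge more rows, never fewer, which is why the specified-columns run keeps a single row here.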

 

I want to remove duplicate records from my customer data set.

 

1_RemoveDuplicates-1024x707.png


 

Before removing duplicates, I use the DQ – Match Code step from the public custom step repository to generate match codes for the Name, Address, and Zip fields. The match codes enable fuzzy matching on those fields, so records whose values differ only slightly can still be identified as duplicates.
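The algorithm behind the DQ – Match Code step is not described in this article. As a rough intuition for why match codes help, the Python sketch below builds a much cruder normalized key: uppercase, punctuation removed, whitespace collapsed, and a few street-type abbreviations standardized. The function `simple_match_code` and its abbreviation table are hypothetical and are not how SAS computes match codes.

```python
import re

# Hypothetical abbreviation table; real data quality tools use far richer rules.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def simple_match_code(value: str) -> str:
    """Return a crude normalized key so near-identical strings compare equal."""
    # Uppercase and split on any run of whitespace.
    tokens = value.upper().split()
    # Strip punctuation inside each token and drop tokens that become empty.
    tokens = [re.sub(r"[^A-Z0-9]", "", tok) for tok in tokens]
    tokens = [tok for tok in tokens if tok]
    # Standardize a few common street-type words.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


print(simple_match_code("123 Main St."))      # 123 MAIN ST
print(simple_match_code("123  main street"))  # 123 MAIN ST
```

The two addresses are not equal as raw strings, but they produce the same key, so a duplicate check on the key treats them as the same customer address.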

 

2_RemoveDuplicates-1024x698.png

  

Next, I add the Remove Duplicates step from the Transform section to the flow.

 

3_RemoveDuplicates-1024x579.png

  

I clear the Remove duplicates across all columns option and instead add a condition so that rows are treated as duplicates when the Name_MC, Address_MC, and Zip_MC columns all contain the same values.
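The condition above compares only the three match-code columns while every original column is carried through to the output. The Python sketch below illustrates that idea and also returns how many rows were dropped. The helper `dedupe_on`, the sample rows, and their `*_MC` values are hypothetical stand-ins for real match-code output, and keeping the first occurrence is an assumption of this example rather than documented step behavior.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def dedupe_on(rows: List[Row], key_columns: Sequence[str]) -> Tuple[List[Row], int]:
    """Keep the first row per key_columns combination; return (rows_kept, rows_removed)."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)


# Hypothetical rows; the *_MC values stand in for codes from a match-code step.
customers = [
    {"Name": "Ann Lee", "Address": "123 Main St.", "Zip": "27513",
     "Name_MC": "ANN LEE", "Address_MC": "123 MAIN ST", "Zip_MC": "27513"},
    {"Name": "ann  lee", "Address": "123 main street", "Zip": "27513",
     "Name_MC": "ANN LEE", "Address_MC": "123 MAIN ST", "Zip_MC": "27513"},
    {"Name": "Bo Chen", "Address": "9 Oak Ave", "Zip": "27511",
     "Name_MC": "BO CHEN", "Address_MC": "9 OAK AVE", "Zip_MC": "27511"},
]

unique, removed = dedupe_on(customers, ["Name_MC", "Address_MC", "Zip_MC"])
print(f"{removed} duplicate row(s) removed, {len(unique)} row(s) kept")
# 1 duplicate row(s) removed, 2 row(s) kept
```

A check on the raw Name, Address, and Zip columns would have kept the second row, because its spelling differs; keying on the match codes catches it. The removed count is simply input rows minus output rows, which is a quick way to sanity-check the count reported in the log.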

 

4_RemoveDuplicates.png

  

The Output tab includes an option to replace an existing output table with the same name. If the output table is a CAS table, you can also promote and/or save the table. If the output table is in a PATH, DNFS, ADLS, or S3 CAS library, you can specify the output format.

 

5_RemoveDuplicates.png

  

The Debug tab has an option to Debug SAS macros. I select this option for my flow.

 

6_RemoveDuplicates.png

  

I save and run the flow, and the duplicate customer records are removed.

 

7_RemoveDuplicates-1024x705.png

 

I review the log to confirm the number of duplicate rows that were removed from my customer list.

 

8_RemoveDuplicates-1024x588.png

    

Summary

The Remove Duplicates step is now available in SAS Studio Flow. 

 

For more information, see the documentation: SAS Help Center: Removing Duplicates.

Find more articles from SAS Global Enablement and Learning here.

Last update: 01-19-2023 10:07 AM