Will data step automatically remove duplicates

Frequent Contributor
Posts: 82
When I write:

data work.temp;
   set work.temp1
       work.temp2
       work.temp3;
   keep col1 col2 col3 col4 col5;
run;

Will it only keep unique rows in the final output, or will it keep duplicates? Also, how is the above code different from proc append?


Accepted Solutions
Solution
‎05-13-2014 10:49 AM
Super User
Posts: 7,720

Re: Will data step automatically remove duplicates

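No, it will not. The data step writes out every row it reads from the datasets in the SET statement, duplicates included. Use PROC SORT with NODUPKEY, or SELECT DISTINCT in PROC SQL, to get rid of duplicates.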
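
As a rough sketch of the two options, assuming the stacked dataset work.temp from the question already exists and that a "duplicate" means a row that matches on all five kept columns. The output names work.temp_unique and work.temp_unique2 are made up for illustration.

```sas
/* Option 1: PROC SORT with NODUPKEY.                              */
/* BY _ALL_ uses every variable as the key, so only rows that are  */
/* identical on all columns are dropped. List specific variables   */
/* in the BY statement to de-duplicate on a subset instead.        */
proc sort data=work.temp out=work.temp_unique nodupkey;
   by _all_;
run;

/* Option 2: PROC SQL with SELECT DISTINCT on the same columns.    */
proc sql;
   create table work.temp_unique2 as
   select distinct col1, col2, col3, col4, col5
   from work.temp;
quit;
```

Both steps write to a new dataset and leave work.temp untouched, so the row counts in the log can be compared before and after.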

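Your code creates a new dataset, reads every row of each dataset named in the SET statement, and writes them all out to the new dataset, so it is not very read/write efficient. PROC APPEND does not read the base dataset; it only reads the rows of the DATA= dataset and adds them to the end of the BASE= dataset, so there is less I/O. PROC APPEND will issue warnings or errors if the variable attributes of the two datasets do not match exactly or if the dataset being appended has extra variables. The data step will warn about some things, but it will expand the output table with any new columns without warning (the KEEP statement in your code limits the output to the five listed columns). Best check the documentation for a complete rundown, as there are plus and minus points to each.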
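
For comparison, here is a minimal sketch of the PROC APPEND route, assuming all three input datasets contain col1 through col5 with matching types and lengths. PROC APPEND takes one DATA= dataset per call, so stacking three datasets takes a starting copy plus two appends.

```sas
/* Start work.temp as a copy of the first dataset */
data work.temp;
   set work.temp1 (keep=col1 col2 col3 col4 col5);
run;

/* Add the rows of the second dataset to the end of work.temp.   */
/* Only work.temp2 is read; the existing rows are not rewritten. */
proc append base=work.temp
            data=work.temp2 (keep=col1 col2 col3 col4 col5);
run;

/* Same again for the third dataset */
proc append base=work.temp
            data=work.temp3 (keep=col1 col2 col3 col4 col5);
run;
```

As with the data step, duplicates are kept. If the variable attributes differ between the datasets, the FORCE option on PROC APPEND may be needed; check the log and the documentation for what it drops or truncates before relying on it.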

