topic Will data step automatically remove duplicates in SAS Enterprise Guide

Will data step automatically remove duplicates

eagles_dare13 — Tue, 13 May 2014 14:44:05 GMT

When I write:

data work.temp;
    set work.temp1
        work.temp2
        work.temp3;
    keep col1 col2 col3 col4 col5;
run;

Will it keep only unique rows in the final output, or will it keep duplicates? Also, how is the above code different from proc append?

Re: Will data step automatically remove duplicates

RW9 — Tue, 13 May 2014 14:49:08 GMT

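No, it will not: the data step writes out every row from every dataset in the set statement, duplicates included. Use proc sort with nodupkey (with all the columns in the by statement if you want fully unique rows), or SQL distinct, to get rid of duplicates.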
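As a rough sketch of both options, assuming work.temp is the stacked dataset from the question (the output names work.temp_nodup and work.temp_distinct are made up for illustration):

```sas
/* Sketch only: work.temp comes from the question;                  */
/* work.temp_nodup and work.temp_distinct are hypothetical names.   */

/* Option 1: proc sort. "by _all_" makes every variable part of the */
/* key, so nodupkey drops rows that are identical in all columns.   */
proc sort data=work.temp out=work.temp_nodup nodupkey;
    by _all_;
run;

/* Option 2: proc sql. "select distinct" keeps one copy of each     */
/* unique combination of the listed columns.                        */
proc sql;
    create table work.temp_distinct as
    select distinct col1, col2, col3, col4, col5
    from work.temp;
quit;
```

If you list only some columns in the by statement, nodupkey keeps the first row for each combination of those columns and drops the rest, even when the other columns differ.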

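Your code creates a new dataset: it reads every observation from each dataset in the set statement and writes them all out to the new file, so it is not very read/write optimized. Proc append does not read the base dataset; it only reads the dataset being added and writes its observations onto the end of the base, so there is less I/O. Proc append will complain if the variable attributes do not match exactly or if the dataset being appended has extra variables (you need the force option to append anyway, and the extra variables are then dropped). The data step will warn on some things, but will expand the output table with any new columns without warning. Best check the documentation for a complete rundown, as there are plus and minus points to each.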
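For comparison, a minimal sketch of the proc append route, assuming the same three datasets and that col1 to col5 exist with matching attributes in each. Seeding work.temp from work.temp1 first is my own choice here, so that work.temp1 itself is not modified:

```sas
/* Sketch only: seed the output with a copy of the first dataset,   */
/* keeping just the five columns from the question.                 */
data work.temp;
    set work.temp1(keep=col1 col2 col3 col4 col5);
run;

/* base= is the dataset that grows; data= is the dataset being      */
/* added. Only the data= dataset is read row by row.                */
proc append base=work.temp
            data=work.temp2(keep=col1 col2 col3 col4 col5);
run;

proc append base=work.temp
            data=work.temp3(keep=col1 col2 col3 col4 col5);
run;
```

Like the data step, this keeps duplicates, so you would still need a dedup step afterwards if you want unique rows.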