Re: Help with NODUPKEY

deleted_user · Posted 09-01-2010 02:21 PM

Question: in SAS, when we run PROC SORT with NODUPKEY, the data set is sorted by the BY variables and the log might report something like 308 duplicates. How do you capture those observations before SAS deletes them?
Can I just write them to an output data set? I tried DUPOUT=, but SAS said that it wasn't specified, and it didn't work.

Please help, thanks.

sbb · Posted 09-01-2010 02:53 PM

When you use DUPOUT=, the duplicate observations (whether you use NODUPS or NODUPKEY) are written to the SAS data set named on the DUPOUT= option, as shown below. If you are having difficulty, I suggest sharing the SAS log so you can get an accurate response. Also, bear in mind that your BY variable list must be granular enough for duplicate observations to end up adjacent in the sorted file; otherwise duplicates may not be removed as you might expect.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:

proc sort dupout site:sas.com

_________________________

1 proc sort nodupkey data=sashelp.class out=class dupout=class_dup;
2 by sex;
3 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 2 observations and 5 variables.
NOTE: The data set WORK.CLASS_DUP has 17 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.31 seconds
      cpu time            0.04 seconds
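
To illustrate how the BY list decides what counts as a duplicate key, here is a minimal sketch using the same SASHELP.CLASS data. The output names class_by_sex_age and class_dup_sex_age are made up for this example. With AGE added to the BY list, only observations that match on both SEX and AGE are treated as duplicates, so fewer rows are sent to DUPOUT= than in the BY SEX run above.

```sas
/* Sketch only: CLASS_BY_SEX_AGE and CLASS_DUP_SEX_AGE are made-up names */
proc sort nodupkey data=sashelp.class
          out=class_by_sex_age dupout=class_dup_sex_age;
  by sex age;   /* a key is now the SEX and AGE combination */
run;

/* List the observations that NODUPKEY removed */
proc print data=class_dup_sex_age;
run;
```

Checking the DUPOUT= data set like this, before relying on the de-duplicated output, is one way to confirm that the BY list is removing the rows you intended.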

Peter_C · Posted 09-02-2010 10:13 AM

Sometimes I want to know more than DUPOUT= reveals. Examining the observation of each pair that is not deleted (and so NOT sent to DUPOUT=) can be informative. In that case I use something like this (I also want to know which rows of the input held those dups):
[pre]
data _n_ / view=_n_ ; * identify input rows with new variable __N_ ;
  __n_ = _n_ ;
  label __n_ = '_n_' ;
  set &_original_data ;
run ;
* no DATA= option, so the sort reads the view just created ;
proc sort out= _data_ ; * accepting the default name - only a problem if the file is too large for the work library area ;
  by &_by_list ;
run ;
data &_my_dup_pairs &_output_data_set( drop= __n_ ) ;
  set &syslast ;
  by &_by_list ;
  * keys that occur more than once - keep every observation in the BY group ;
  if not ( first.%scan( &_by_list, -1 )
       and last.%scan( &_by_list, -1 ) ) then output &_my_dup_pairs ;
  * de-duplicated output - keeps the last observation for each key ;
  if last.%scan( &_by_list, -1 ) then output &_output_data_set ;
run ;
[/pre]
This provides the required de-duplicated data (named in &_output_data_set) as well as a data set (named in &_my_dup_pairs) containing all the observations for which there are duplicate keys.

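Prerequisites (macro variables that must be defined before running the code):
&_original_data : the input data set
&_my_dup_pairs : the data set that receives the observations with duplicate keys
&_output_data_set : the de-duplicated output data set
&_by_list : the BY variable list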
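
As a usage sketch, the four macro variables could be set like this before submitting the steps above. The values are assumptions chosen for illustration (they reuse the SASHELP.CLASS example from earlier in the thread), and the two output data set names are made up.

```sas
/* Hypothetical values for illustration only */
%let _original_data   = sashelp.class ;          /* input data set                  */
%let _by_list         = sex ;                    /* BY variables, space separated   */
%let _my_dup_pairs    = work.class_dup_pairs ;   /* all rows sharing a key          */
%let _output_data_set = work.class_nodup ;       /* one row per key                 */
```

Because the code tests first. and last. on the final name in &_by_list, the list should contain plain variable names with the most detailed key variable last.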

peterC
