Help using Base SAS procedures

Help with NODUPKEY

Question: in SAS, when we run PROC SORT with NODUPKEY, the data set is sorted by the BY variables and the log might report something like 308 duplicates being deleted. How do you capture those observations before SAS deletes them?
Can I just send them to an output data set? I tried DUPOUT, but the log said it wasn't specified, and it didn't work.

Please help, thanks.

Message was edited by: SASDUMMY
Super Contributor
Posts: 3,174

Re: Help with NODUPKEY

Posted in reply to deleted_user
When you use DUPOUT=, the duplicate observations (whether you use NODUPS or NODUPKEY) are written to the SAS data set named in the DUPOUT= option, as shown below. If you are having difficulty, I suggest sharing the SAS log so you can get an accurate response. Also, keep in mind that your BY variable list must be granular enough that duplicate observations end up adjacent in the sorted file; otherwise duplicates may not be removed as you expect. This matters mainly for NODUPS, which compares each observation only with the previous one written to the output.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:

proc sort dupout site:sas.com

_________________________

proc sort nodupkey data=sashelp.class out=class dupout=class_dup;
  by sex;
run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 2 observations and 5 variables.
NOTE: The data set WORK.CLASS_DUP has 17 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.31 seconds
cpu time 0.04 seconds
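
The point about a granular BY list can be sketched for the NODUPS (NODUPRECS) case. This is only an illustration under stated assumptions: WORK.HAVE, WORK.NODUP and WORK.DUPS are hypothetical names made up for the example, and the input is a tiny data set with one fully repeated row. Sorting BY _ALL_ places identical observations next to each other, so the repeated row should be removed from the output and written to the DUPOUT= data set.

```sas
/* Hypothetical input (illustration only): the row "1 10" appears twice */
data work.have;
  input id value;
  datalines;
1 10
2 20
1 10
3 30
;
run;

/* BY _ALL_ sorts on every variable, so identical rows become adjacent */
/* and NODUPRECS can compare each row with the one written before it   */
proc sort data=work.have out=work.nodup noduprecs dupout=work.dups;
  by _all_;
run;
```

With a shorter BY list (for example BY ID only), identical rows can still end up separated within a BY group, which is the situation the advice above warns about.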
Valued Guide
Posts: 2,177

Re: Help with NODUPKEY

Posted in reply to deleted_user
Sometimes I want to know more than DUPOUT= reveals. Examining the observation of each duplicate pair that is not deleted (and so NOT sent to DUPOUT=) can be informative, and I also want to know which rows of the input held those dups. Then something like this:
[pre]
data _n_ / view=_n_ ; * identify input rows with new variable __N_ ;
  __n_ = _n_ ;
  label __n_ = '_n_' ;
  set &_original_data ;
run ;
proc sort out= _data_ ; * accepting the default name - only a problem if the file is too large for the work library area ;
  by &_by_list ;
run ;
data &_my_dup_pairs &_output_data_set( drop= __n_ ) ;
  set &syslast ; * the sorted data set created by the PROC SORT step ;
  by &_by_list ;
  * any observation that is not alone in its BY group goes to the dup-pairs data set ;
  if not ( first.%scan( &_by_list, -1 )
       and last.%scan( &_by_list, -1 ) ) then output &_my_dup_pairs ;
  * the last observation of each BY group goes to the de-duplicated data set ;
  if last.%scan( &_by_list, -1 ) then output &_output_data_set ;
run ;
[/pre]
This provides the required de-duplicated data (named in &_output_data_set) as well as a data set (named in &_my_dup_pairs) containing every observation whose key occurs more than once, including the one that is kept.

Prerequisites (macro variables that must be defined before running the code):
&_original_data
&_my_dup_pairs
&_output_data_set
&_by_list
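
As a usage sketch, the four macro variables could be set like this before running the steps above. The values are purely illustrative assumptions: SASHELP.CLASS and the key SEX are borrowed from the earlier reply, and the two output names are hypothetical.

```sas
/* Illustrative values only; the output data set names are hypothetical */
%let _original_data   = sashelp.class;         /* input to de-duplicate        */
%let _by_list         = sex;                   /* key variable(s), space separated */
%let _my_dup_pairs    = work.class_dup_pairs;  /* all rows sharing a key       */
%let _output_data_set = work.class_nodup;      /* one row per key value        */
```

With these settings every row of SASHELP.CLASS shares its SEX value with another row, so all 19 observations should land in the dup-pairs data set, while the de-duplicated data set gets one observation per value of SEX.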


peterC