Question: In SAS, when we do a PROC SORT with NODUPKEY, the data set is sorted by the BY variables and the log might say something like "308 observations with duplicate key values were deleted." How do you capture those observations before SAS deletes them?
Can I just use an OUTPUT statement? I tried DUPOUT=, but the log said it wasn't specified, and it didn't work.
Please help, thanks.
When you use DUPOUT=, the duplicate observations (whether you use NODUPRECS or NODUPKEY) are written to the data set named in the DUPOUT= option, as the log below shows. If you are still having difficulty, please share your SAS log so we can give an accurate answer. Also, consider that your BY variable list must be granular enough: NODUPRECS removes an identical record only when it ends up adjacent to its duplicate in the sorted output, and with NODUPKEY a BY list that is too coarse treats more observations as duplicates than you may intend.
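A minimal step consistent with the log below might look like this. The BY variable SEX is an assumption: it is the only SASHELP.CLASS variable with exactly two distinct values, matching the 2 observations kept. Note that DUPOUT= goes on the PROC SORT statement together with NODUPKEY.

```sas
/* Assumed key: SEX. NODUPKEY keeps the first row per key value, */
/* and every other row with the same key is written to DUPOUT=.  */
proc sort data=sashelp.class
          out=work.class
          dupout=work.class_dup
          nodupkey;
  by sex;
run;
```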
[pre]NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 2 observations and 5 variables.
NOTE: The data set WORK.CLASS_DUP has 17 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
      real time           0.31 seconds
      cpu time            0.04 seconds[/pre]
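The point about a granular BY list matters most for NODUPRECS, which compares each observation only with the one immediately before it in sorted order. In this small made-up example (the data set names are hypothetical), the two identical "A 1" records are not adjacent after sorting BY ID alone, because the default EQUALS behavior keeps input order within a BY group, so nothing is removed. Adding VAL to the BY list brings them together, and one of them goes to DUPOUT=.

```sas
/* Hypothetical data: the two "A 1" records are exact duplicates */
data work.demo;
  input id $ val;
  datalines;
A 1
A 2
A 1
;
run;

/* BY ID only: sorted order stays A 1, A 2, A 1, so the duplicates are */
/* not adjacent and NODUPRECS writes nothing to COARSE_DUP             */
proc sort data=work.demo out=work.coarse dupout=work.coarse_dup noduprecs;
  by id;
run;

/* BY ID VAL: the duplicates become adjacent, so one goes to FINE_DUP */
proc sort data=work.demo out=work.fine dupout=work.fine_dup noduprecs;
  by id val;
run;
```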
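Sometimes I want to know more than DUPOUT= reveals. Examining the observation that is kept (and so is NOT sent to DUPOUT=) alongside its duplicates can be informative, and I also want to know which rows of the input held those duplicates. Then something like this:
[pre]/* Set these macro variables first: &_original_data (input), &_by_list (key variables), */
/* &_output_data_set (de-duplicated result), &_my_dup_pairs (rows with duplicate keys)    */
data _n_ / view=_n_ ;   * identify input rows with new variable __N_ ;
  __n_ = _n_ ;          * _N_ counts iterations, so this is the input row number ;
  label __n_ = '_n_' ;
  set &_original_data ;
run ;

proc sort out= _data_ ; * sorts the view above, accepting the default output name ;
  by &_by_list ;        * only a problem if the file is too large for the WORK library ;
run ;

data &_my_dup_pairs &_output_data_set( drop= __n_ ) ;
  set &syslast ;        * the sorted data set just created by PROC SORT ;
  by &_by_list ;
  /* not both first and last in its key group means the key occurs more than once */
  if not ( first.%scan( &_by_list, -1 )
       and last.%scan( &_by_list, -1 ) ) then output &_my_dup_pairs ;
  /* keeps the LAST row per key; NODUPKEY keeps the first, so use FIRST. to match it */
  if last.%scan( &_by_list, -1 ) then output &_output_data_set ;
run ;[/pre]
This provides the de-duplicated data (named in &_output_data_set) as well as a data set (named in &_my_dup_pairs) containing every observation whose key is duplicated, including the one that is kept, with __N_ holding its row number in the original input.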
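If the input row numbers are not needed, a similar look at each kept observation can be sketched with DUPOUT= alone: the keys in the DUPOUT= data set identify which groups had duplicates, and merging those keys back onto the NODUPKEY output returns the observation that was kept for each of them. This again assumes SASHELP.CLASS BY SEX, and the WORK data set names are hypothetical.

```sas
/* NODUPKEY output plus the deleted duplicates (assumed key: SEX) */
proc sort data=sashelp.class out=work.class_nodup
          dupout=work.class_dup nodupkey;
  by sex;
run;

/* One row per key value that had at least one duplicate */
proc sort data=work.class_dup(keep=sex) out=work.dup_keys nodupkey;
  by sex;
run;

/* The observation NODUPKEY kept for each duplicated key */
data work.kept_partners;
  merge work.class_nodup(in=in_kept) work.dup_keys(in=in_dup);
  by sex;
  if in_kept and in_dup;
run;
```

Stacking WORK.KEPT_PARTNERS and WORK.CLASS_DUP with a SET statement then gives every member of each duplicated group, without the input row numbers.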