deleted_user
Not applicable
Question: in SAS, when we run PROC SORT with NODUPKEY, the data set is sorted by the BY variables and the log might report that, say, 308 duplicates were deleted. How do you capture those observations before SAS deletes them?
Can I just write them to an output data set? I tried DUPOUT, but SAS said that it wasn't specified, and it didn't work.

Please help, thanks.

Message was edited by: SASDUMMY
2 REPLIES
sbb
Lapis Lazuli | Level 10
When you use DUPOUT=, the duplicate observations (whether you use NODUPS or NODUPKEY) are sent to the SAS member specified as the DUPOUT= parameter value, as shown below. If you are having difficulty, I suggest sharing the SAS log so that you can get an accurate response. Also, consider that your BY variable list must be granular enough that duplicate observations end up adjacent in the sorted file; otherwise duplicates may not be removed as you might expect.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic/post:

proc sort dupout site:sas.com

_________________________

1 proc sort nodupkey data=sashelp.class out=class dupout=class_dup;
2 by sex;
3 run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: 17 observations with duplicate key values were deleted.
NOTE: The data set WORK.CLASS has 2 observations and 5 variables.
NOTE: The data set WORK.CLASS_DUP has 17 observations and 5 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.31 seconds
cpu time 0.04 seconds
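
The point above about the BY list needing to be granular matters most with NODUPS (NODUPRECS), which compares whole observations. A minimal sketch, assuming you want to remove only fully identical rows: sorting BY _ALL_ puts identical observations next to each other so they can be detected. The data set names work.class2, work.class_nodup and work.class_exact_dup are made up for illustration.

```sas
/* Build a test data set: SASHELP.CLASS plus exact copies of its first 3 rows */
/* (work.class2 is a hypothetical name used only for this illustration)       */
data work.class2;
    set sashelp.class sashelp.class(obs=3);
run;

/* NODUPRECS removes observations that match the previous one on every variable. */
/* BY _ALL_ sorts on every variable, so identical rows become adjacent.          */
proc sort data=work.class2 noduprecs
          out=work.class_nodup            /* de-duplicated rows       */
          dupout=work.class_exact_dup;    /* the rows that were removed */
    by _all_;
run;
```

With this setup the three appended copies should end up in work.class_exact_dup, while work.class_nodup keeps one copy of each distinct row.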
Peter_C
Rhodochrosite | Level 12
Sometimes I want to know more than DUPOUT= reveals. Examining the observation that is kept (and NOT sent to DUPOUT=) alongside its duplicates can be informative. Something like this (I also want to know which rows of the input held those dups)
[pre]data _n_ /view=_n_ ; * identify input rows with new variable __N_ ;
   __n_ = _n_ ;
   label __n_ = '_n_' ;
   set &_original_data ;
run ;
proc sort out= _data_ ; *accepting the default name - only problem if file too large for work library area;
   by &_by_list ;
run ;
data &_my_dup_pairs &_output_data_set( drop= __n_ ) ;
   set &syslast ;
   by &_by_list ;
   * a row that is not both first and last in its BY group shares its key with another row ;
   if not ( first.%scan( &_by_list, -1 )
        and last.%scan( &_by_list, -1 ) ) then output &_my_dup_pairs ;
   * keep one row per key for the de-duplicated output ;
   if last.%scan( &_by_list, -1 ) then output &_output_data_set ;
run ;[/pre]
provides the required de-duplicated data (named in &_output_data_set) as well as a data set (named in &_my_dup_pairs) containing all the observations for which there are duplicate keys.

Prerequisites (macro variables that must be defined before running the code):
&_original_data
&_my_dup_pairs
&_output_data_set
&_by_list
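
As an illustration only, the four macro variables could be set like this before running the steps above. The values are assumptions chosen for the example (SASHELP.CLASS as input, and made-up output names work.dup_pairs and work.class_dedup); substitute your own.

```sas
/* Example values only: replace with your own data set names and BY variables */
%let _original_data   = sashelp.class ;     /* input data set to check                     */
%let _by_list         = sex age ;           /* key variables, in sort order                */
%let _my_dup_pairs    = work.dup_pairs ;    /* hypothetical name: rows with repeated keys  */
%let _output_data_set = work.class_dedup ;  /* hypothetical name: one row per key          */
```

Because the code uses %scan( &_by_list, -1 ), the FIRST. and LAST. tests are made on the last variable in the list (age here), which changes whenever any earlier BY variable changes. Note also that the DATA step keeps the last observation of each key, whereas PROC SORT NODUPKEY keeps the first, so the two approaches can retain different rows.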


peterC
