I divided a data set into two sets based on the value of dup_flag, which was created by sorting on STUDENT. All records in "undup" are unique. All records in "alldup3" are duplicates.
However, in some cases the duplicate can be eliminated, depending on the value of VAR2 in each record. If VAR2 is the same, as in the case of "sam", I can eliminate the dup. The values of VAR3 are irrelevant to this task.
data I have:
STUDENT VAR2 VAR3
sam 1 trivial
sam 1 trivial
bob 1 trivial
bob 2 trivial
data I want:
STUDENT VAR2 VAR3
sam 1 trivial
bob 1 trivial
bob 2 trivial
data undup alldup3;
set alldup2;
if dup_flag='No' then output undup;
else output alldup3;
run;
Use PROC SORT with NODUPKEY, since you need your data sorted anyway. Do you need both data sets or just the one shown?
proc sort data=have nodupkey out=want1 ;
by descending student var2;
run;
I don't need to do anything with the undup data set -- just the set that contains the dups.
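In that case the sort can probably run directly on the duplicates data set. Here is a minimal sketch, assuming the duplicates are in alldup3 as in the question and that VAR3 really can be ignored. The DATA step only rebuilds the sample rows so the example runs on its own, and the output name "want" is a placeholder.

```sas
/* Rebuild the sample rows from the question so the example is self-contained. */
data alldup3;
  input STUDENT $ VAR2 VAR3 $;
  datalines;
sam 1 trivial
sam 1 trivial
bob 1 trivial
bob 2 trivial
;
run;

/* NODUPKEY keeps the first record of each STUDENT/VAR2 combination.  */
/* DESCENDING applies only to STUDENT, so sam sorts ahead of bob.     */
proc sort data=alldup3 out=want nodupkey;
  by descending STUDENT VAR2;
run;

/* Print the result to check which rows were kept. */
proc print data=want noobs;
run;
```

Because NODUPKEY compares only the BY variables, records that match on STUDENT and VAR2 but differ in VAR3 are also collapsed to one row, which fits the statement that VAR3 is irrelevant here.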