Accepted Solutions
Astounding
PROC Star

Here's a more complete version.  It assumes your variables are character.  If they are numeric, you would need to switch from SORTC to SORTN.

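data want (keep=id);
  set have;
  array hlth {20};              /* hlth1-hlth20 */
  call sortc (of hlth{*});      /* sort the values ascending; blanks come first */
  do _n_=1 to 19 until (flag=1);
    /* after sorting, equal non-missing neighbors mean a duplicate */
    if not missing (hlth{_n_}) and hlth{_n_} = hlth{_n_+1} then flag=1;
  end;
  if flag=1;                    /* keep only the IDs that have a duplicate */
run;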

This version keeps just the ID.  You would have to go back to the original data to check why these IDs are flagged.
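
If the variables are numeric, a variant along the following lines might work. This is only a sketch: it assumes numeric variables named hlth1-hlth20 and an ID variable named id, and the names flag, dup_value and _i are made up for illustration. Unlike the version above, it keeps every ID and records the first duplicated code, so you would not need to go back to the original data to see why an ID was flagged.

```sas
data want (keep=id flag dup_value);
  set have;
  array hlth {20};                 /* assumes numeric hlth1-hlth20 exist in HAVE */
  call sortn (of hlth{*});         /* sort ascending; missing values come first */
  flag = 0;
  do _i = 1 to dim(hlth) - 1 until (flag = 1);
    /* equal non-missing neighbors in the sorted list mean a duplicate */
    if not missing(hlth{_i}) and hlth{_i} = hlth{_i + 1} then do;
      flag = 1;
      dup_value = hlth{_i};        /* first duplicated code found */
    end;
  end;
run;
```

To get back to the original behavior, add a subsetting `if flag = 1;` before the RUN statement.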

5 REPLIES
mkeintz
PROC Star
  1. What do you want the output to look like?  Will you test for all duplicates within an ID, or just a dummy indicating duplicates have been found?  How do you want the test results presented?
  2. Do you want tested code?  If so, please provide a sample dataset in the form of a DATA step.  Help us help you.
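
For illustration, a sample dataset in DATA step form could look something like the sketch below. The values are entirely made up, and it assumes character codes stored in hlth1-hlth20 with an ID variable named id. Only the first five codes are read in, so hlth6-hlth20 stay blank.

```sas
/* Hypothetical sample data: three IDs with made-up health codes */
data have;
  length hlth1-hlth20 $4;          /* declare all 20 codes as character */
  input id hlth1-hlth5;            /* a lone period is read as a missing value */
  datalines;
1 A10 B20 C30 D40 E50
2 A10 B20 A10 . .
3 F60 . . . .
;
run;
```

In this made-up data, ID 2 is the only row that repeats a code (A10).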

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

For each observation, you could successively compare the largest HLTH code to the 2nd largest, then the 2nd largest to the 3rd largest until you either exhaust all the non-missing values (so flag would be 0), or you find a duplicate (flag=1).

 

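To do that you can use the LARGEST function and the LAG function, as in the code below. This assumes the HLTH variables are numeric, since LARGEST and N work only on numeric values. LAG is called on every pass of the loop, not inside the IF condition, so that its queue is always updated.

data want (drop=_:);
  set have;
  flag=0;
  do _L=1 to n(of hlth:) while (flag=0);
    _x=largest(_L,of hlth:);
    _prev=lag(_x);   /* value of _x from the previous pass of the loop */
    if _L>1 and _x=_prev then flag=1;
  end;
run;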

I'm not going to test this on your data.

Astounding
PROC Star

As long as you copy the data to another data set (because we're going to change data values), it should be easy enough.

 

Put the variables into an array, then use call sortc to change the order of the variables.  

 

Then move through the array and compare whether two consecutive values are identical (being careful not to take two consecutive missing values).

 

Let me know if you need help with this.
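
If you would rather not change the original values at all, one possible sketch is to sort a copy of the values instead. It assumes character variables hlth1-hlth20. The work variables _s1-_s20 and _i, and the length of 32, are assumptions made for illustration, so widen the length if your codes are longer.

```sas
data want (drop=_:);
  set have;
  array hlth {20};                 /* assumes character hlth1-hlth20 in HAVE */
  array _s {20} $ 32;              /* scratch copies _s1-_s20; length is a guess */
  do _i = 1 to dim(hlth);
    _s{_i} = hlth{_i};             /* copy so the originals keep their order */
  end;
  call sortc (of _s{*});           /* sort only the copies; blanks come first */
  flag = 0;
  do _i = 1 to dim(_s) - 1 until (flag = 1);
    /* skip blanks, then look for equal neighbors in the sorted copies */
    if not missing(_s{_i}) and _s{_i} = _s{_i + 1} then flag = 1;
  end;
run;
```

The output keeps every observation with its HLTH values untouched, plus a 0/1 flag. The scratch variables are dropped by the `drop=_:` option.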
