BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
r-lee-484
Fluorite | Level 6

Hi, I have this dataset where these are hospitalized events matched to each patient. Each event is given a unique number. For example, event1 is matched to event2. 

 

event1event2Subject_ID
1095A2097B74
1095A3003C74
1095A4006D74
2097B3003C12
2097B4006D12
3003C4006D116
9382J5813K32
8762L3250M66
4382E6782F45
4382E5489L45
6782F5489L378

 

I am trying to create a NEW_ID and remove subject_IDs 12, 116 since they are duplicate matches to subject_ID74.  I also need to remove subject_ID 378 since it's a duplicate match to subject_ID 45. I ultimately want to retain matches from subject_ID 74, 32, 66, 45  as they are unique matches and assign a new ID for each subject.

 

Desired output:

event1event2NEW_ID
1095A2097B1
1095A3003C1
1095A4006D1
9382J5813K2
8762L3250M3
4382E6782F4
4382E5489L4

 

Is it possible to use PROC SQL and what kind of rules or code should I write to do this? I have a dataset where there are thousands of matched events so this is a simplified version of what needs to be accomplished.

 

Thanks for your help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
/*Assuming I understood what you mean.*/
data have;
infile cards expandtabs;
input (event1 event2 Subject_ID) ($);
cards;
1095A 2097B 74
1095A 3003C 74
1095A 4006D 74
2097B 3003C 12
2097B 4006D 12
3003C 4006D 116
9382J 5813K 32
8762L 3250M 66
4382E 6782F 45
4382E 5489L 45
6782F 5489L 378
;

data temp;
 set have;
 key=event1;output;
 key=event2;output;
 keep key Subject_ID;
run;
proc sort data=temp out=key nodupkey;
by Subject_ID key;
run;
proc sort data=temp(keep=Subject_ID) out=Subject_ID nodupkey;
by Subject_ID;
run;



%macro check(key=);
%local n_key n ;
proc sql noprint;
select max(n_key) into :n_key separated by ' ' from 
(
select Subject_ID,count(distinct key) as n_key
 from key 
  where key in (select key from key where Subject_ID="&key.") and Subject_ID ne "&key."
   group by Subject_ID
);
select count(distinct key) into :n separated by ' ' 
 from key 
  where Subject_ID="&key.";
quit;
%put &=key. &=n_key. &=n. ;
%if &n_key.=&n. %then %do;
  data exclude_key;
   length key $ 100;
   key="&key.";
  run;
  proc append base=want data=exclude_key;run;
%end;
%mend;


option nodsnferr nosource;
proc delete data=want;run;
data _null_;
 set Subject_ID;
 call execute(cats('%nrstr(%check(key=',Subject_ID,'))')); 
run;
proc sql;
create table final_want as
select * from have where Subject_ID not in (select key from want);
quit;

View solution in original post

6 REPLIES 6
ballardw
Super User

You have to define "duplicate" in words of two syllables or less for those of us that cannot see how 378 is a "duplicate" of 45.

4382E 6782F 45
4382E 5489L 45
6782F 5489L 378
r-lee-484
Fluorite | Level 6
Looking at these events:
4382E 6782F 45
4382E 5489L 45
6782F 5489L 378

If I change these to dates, subject_ID 45 was hospitalized on 10/25/2000, 11/4/2006, and 6/23/2002. (a total of 3 times)

However, the events 11/4/2006 and 6/23/2002 were matched again to each other and assigned a different subject_ID. This is essentially the same as subject_ID 45 but is missing the match with date 10/25/2000. I don't want subject_ID 378's match

event1 event2 subject_ID
10/25/2000 11/4/2006 45
10/25/2000 6/23/2002 45
11/4/2006 6/23/2002 378

ballardw
Super User

Still having a hard time seeing how a value for Event1 is a match for Event2 for a different subject. They aren't "duplicates". Related perhaps but not duplicates.

 

But then I haven't understood anything in any of your posts because obviously there is a lot going on that relies only on your understanding of the process.

HB
Barite | Level 11 HB
Barite | Level 11

I'm with @ballardw in that I don't understand your rules either.

 

But your desired output is a simple list of DISTINCT event2's.  Pull the event1 that goes with the first occurrence of event2 (databases don't have order) and assign a new id at each change in event1.

 

Post a working data step and somebody might try some code. Help them to help you. 

Ksharp
Super User
/*Assuming I understood what you mean.*/
data have;
infile cards expandtabs;
input (event1 event2 Subject_ID) ($);
cards;
1095A 2097B 74
1095A 3003C 74
1095A 4006D 74
2097B 3003C 12
2097B 4006D 12
3003C 4006D 116
9382J 5813K 32
8762L 3250M 66
4382E 6782F 45
4382E 5489L 45
6782F 5489L 378
;

data temp;
 set have;
 key=event1;output;
 key=event2;output;
 keep key Subject_ID;
run;
proc sort data=temp out=key nodupkey;
by Subject_ID key;
run;
proc sort data=temp(keep=Subject_ID) out=Subject_ID nodupkey;
by Subject_ID;
run;



%macro check(key=);
%local n_key n ;
proc sql noprint;
select max(n_key) into :n_key separated by ' ' from 
(
select Subject_ID,count(distinct key) as n_key
 from key 
  where key in (select key from key where Subject_ID="&key.") and Subject_ID ne "&key."
   group by Subject_ID
);
select count(distinct key) into :n separated by ' ' 
 from key 
  where Subject_ID="&key.";
quit;
%put &=key. &=n_key. &=n. ;
%if &n_key.=&n. %then %do;
  data exclude_key;
   length key $ 100;
   key="&key.";
  run;
  proc append base=want data=exclude_key;run;
%end;
%mend;


option nodsnferr nosource;
proc delete data=want;run;
data _null_;
 set Subject_ID;
 call execute(cats('%nrstr(%check(key=',Subject_ID,'))')); 
run;
proc sql;
create table final_want as
select * from have where Subject_ID not in (select key from want);
quit;
r-lee-484
Fluorite | Level 6
This is exactly the solution I am looking for. Thank you!

I wanted distinct combinations and not all possible matches per subject_id.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 634 views
  • 0 likes
  • 4 in conversation