Okay, this is a fairly long one. I appreciate your help in advance. I usually have no problems with retain, but this is the first time I've tried it with more than one by-variable.
I have a dataset of incidents, and am trying to identify repeated incidents among individuals. I have several variables which are used to identify unique individuals, so I need to retain information across these variables (uniqueid, ptdob, sex) in order to calculate the variable diff (which is the time difference between incidents). I create a variable base which is the first incident, and casenum is the counter for all incidents for that person.
data new;
set old;
format base mmddyy8.;
by uniqueid ptdob sex admitdte;
retain base old_uniqueid old_ptdob old_sex casenum;
if (first.uniqueid=1 and first.ptdob=1 and first.admitdte=1 and first.sex=1) then do;
old_uniqueid=uniqueid;
old_ptdob=ptdob;
old_sex=sex;
base=admitdte;
casenum=0;
end;
if (uniqueid=old_uniqueid and ptdob=old_ptdob and sex=old_sex) then do;
diff=admitdte-base;
base=admitdte;
casenum=casenum+1;
end;
run;
proc print;
var uniqueid ptdob sex old_uniqueid ptdob old_sex base admitdte diff casenum;
run;
Some of the printed output:
Obs UNIQUEID PTDOB SEX old_uniqueid old_ptdob old_sex base ADMITDTE diff casenum
Diff and casenum okay here, where there are multiple incidents for one individual:
16 000CDC785C4353DB15AC 19531105 2 000CDC785C4353DB15AC 19531105 2 02/04/05 02/04/05 0 1
17 000CDC785C4353DB15AC 19531105 2 000CDC785C4353DB15AC 19531105 2 03/02/06 03/02/06 391 2
18 000CDC785C4353DB15AC 19531105 2 000CDC785C4353DB15AC 19531105 2 04/02/06 04/02/06 31 3
19 000CDC785C4353DB15AC 19531105 2 000CDC785C4353DB15AC 19531105 2 07/21/06 07/21/06 110 4
And here, where there is only one incident per person (3 different people here)
22 00122E6ADA905AB34ABC 19760807 2 00122E6ADA905AB34ABC 19760807 2 09/18/05 09/18/05 0 1
23 0013ADECDDB3712DB941 19660205 2 0013ADECDDB3712DB941 19660205 2 08/10/06 08/10/06 0 1
24 0013C9B92BCCEE96FED7 19301005 2 0013C9B92BCCEE96FED7 19301005 2 01/22/06 01/22/06 0 1
But not here, where individuals share one of the 4 variables (uniqueid) in the by-variable but not others...
74 0044634429982447D498 19750606 1 0044634429982447D498 19700128 1 09/25/06 05/13/05 . 1
75 0044634429982447D498 19970326 2 0044634429982447D498 19700128 1 09/25/06 02/22/05 . 1
76 0044634429982447D498 19990117 2 0044634429982447D498 19700128 1 09/25/06 11/18/05 . 1
77 0044634429982447D498 19990117 2 0044634429982447D498 19700128 1 09/25/06 12/18/06 . 1
78 0044634429982447D498 20010523 2 0044634429982447D498 19700128 1 09/25/06 09/05/06 . 1
79 0044634429982447D498 20020319 1 0044634429982447D498 19700128 1 09/25/06 02/14/05 . 1
I think I understand why diff would be . in the first iteration - since it is calculated from the retained value of base. But since they are different people, I don't get why the processing is ocurring as if they were the same person.
My problem is that I need to cover ALL of the following scenarios:
1) the first instance of the identifier (first.uniqueid, first.ptdob, first.admitdte, first.sex). Taken care of with the code above.
2) if first and last instance of all 4 identifying variables (i.e., only one incident per person). Also taken care of above, could add in some specific code if needed like:
if (first.uniqueid and first.ptdob and first.admitdte and first.sex and last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;
3) if first instance of only one of the identifying variables, but not the first instance of the others >>e.g., if first.uniqueid=1 and first.ptdob=0, etc. ???
Scenario 3 is my problem. I've tried all sorts of combinations of first. and last. processing with 1 and 0, and am still not getting the right values for diff and casenum.
Thanks!
RJ
if (first.uniqueid and first.ptdob and first.admitdte and first.sex and
last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;
Without sample data it is difficult to tell exactly what you are trying to do. However, the way you are using first, you will only meet all of the conditions on an individual's initial record. Without trying to parse your logic, I will guess that you only need first.admitdte.
Art
Given your BY statement:
by uniqueid ptdob sex admitdte;
1) the first instance of the identifier (first.uniqueid, first.ptdob, first.admitdte, first.sex). Taken care of with the code above.
Just use FIRST.ADMITDTE. You do not need to look at the other first. variables.
2) if first and last instance of all 4 identifying variables (i.e., only one incident per person). Also taken care of above, could add in some specific code if needed like:
if (first.uniqueid and first.ptdob and first.admitdte and first.sex and last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;
Only one record per id would be tested by: (first.admitdte and last.admitdte)
3) if first instance of only one of the identifying variables, but not the first instance of the others >>e.g., if first.uniqueid=1 and first.ptdob=0, etc. ???
Scenario 3 is my problem. I've tried all sorts of combinations of first. and last. processing with 1 and 0, and am still not getting the right values for diff and casenum.
If you want the first instance of the combination UNIQUEID,PTDOB and SEX then use FIRST.SEX.
Again you do not need to look at all of them.
Ah..it worked. It seems so obvious to me now. Thanks, Tom.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.