DATA Step, Macro, Functions and more

Retaining with multiple by variables?

Reply
New Contributor
Posts: 2

Retaining with multiple by variables?

Okay, this is a fairly long one. I appreciate your help in advance. I usually have no problems with retain, but this is the first time I've tried it with more than one by-variable. 

I have a dataset of incidents, and am trying to identify repeated incidents among individuals. I have several variables which are used to identify unique individuals, so I need to retain information across these variables (uniqueid, ptdob, sex) in order to calculate the variable diff (which is the time difference between incidents). I create a variable base which is the first incident, and casenum is the counter for all incidents for that person.

data new;

set old;

format base mmddyy8.;

by uniqueid ptdob sex admitdte;

retain base old_uniqueid old_ptdob old_sex casenum;

if (first.uniqueid=1 and first.ptdob=1 and first.admitdte=1 and first.sex=1) then do;

        old_uniqueid=uniqueid;

        old_ptdob=ptdob;

        old_sex=sex;

        base=admitdte;

        casenum=0;

    end;

if (uniqueid=old_uniqueid and ptdob=old_ptdob and sex=old_sex) then do;

        diff=admitdte-base;

        base=admitdte;

        casenum=casenum+1;

    end;

run;

proc print;

var uniqueid ptdob sex old_uniqueid ptdob old_sex base admitdte diff casenum;

run;

Some of the printed output:

   Obs  UNIQUEID              PTDOB             SEX      old_uniqueid                         old_ptdob    old_sex  base  ADMITDTE  diff  casenum

Diff and casenum okay here, where there are multiple incidents for one individual:

    16  000CDC785C4353DB15AC  19531105           2   000CDC785C4353DB15AC  19531105     2     02/04/05  02/04/05     0      1

    17  000CDC785C4353DB15AC  19531105           2   000CDC785C4353DB15AC  19531105     2     03/02/06  03/02/06   391     2

    18  000CDC785C4353DB15AC  19531105           2   000CDC785C4353DB15AC  19531105     2     04/02/06  04/02/06    31      3

    19  000CDC785C4353DB15AC  19531105           2   000CDC785C4353DB15AC  19531105     2     07/21/06  07/21/06   110     4

And here, where there is only one incident per person (3 different people here)

    22  00122E6ADA905AB34ABC  19760807           2   00122E6ADA905AB34ABC  19760807     2     09/18/05  09/18/05     0     1

    23  0013ADECDDB3712DB941  19660205           2   0013ADECDDB3712DB941  19660205     2     08/10/06  08/10/06     0     1

    24  0013C9B92BCCEE96FED7  19301005           2   0013C9B92BCCEE96FED7  19301005     2     01/22/06  01/22/06     0     1

But not here, where individuals share one of the 4 variables (uniqueid) in the by-variable but not others...

    74  0044634429982447D498  19750606           1   0044634429982447D498  19700128     1     09/25/06  05/13/05     .     1

    75  0044634429982447D498  19970326           2   0044634429982447D498  19700128     1     09/25/06  02/22/05     .     1

    76  0044634429982447D498  19990117           2   0044634429982447D498  19700128     1     09/25/06  11/18/05     .     1

    77  0044634429982447D498  19990117           2   0044634429982447D498  19700128     1     09/25/06  12/18/06     .     1

    78  0044634429982447D498  20010523           2   0044634429982447D498  19700128     1     09/25/06  09/05/06     .     1

    79  0044634429982447D498  20020319           1   0044634429982447D498  19700128     1     09/25/06  02/14/05     .     1


I think I understand why diff would be . in the first iteration - since it is calculated from the retained value of base. But since they are different people, I don't get why the processing is ocurring as if they were the same person.

My problem is that I need to cover ALL of the following scenarios:

1) the first instance of the identifier (first.uniqueid, first.ptdob, first.admitdte, first.sex).  Taken care of with the code above.

2) if first and last instance of all 4 identifying variables (i.e., only one incident per person). Also taken care of above, could add in some specific code if needed like:

     if (first.uniqueid and first.ptdob and first.admitdte and first.sex and last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;

3) if first instance of only one of the identifying variables, but not the first instance of the others >>e.g., if first.uniqueid=1 and first.ptdob=0, etc.  ???

Scenario 3 is my problem. I've tried all sorts of combinations of first. and last. processing with 1 and 0, and am still not getting the right values for diff and casenum.

Thanks!

RJ

     if (first.uniqueid and first.ptdob and first.admitdte and first.sex and

         last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;

PROC Star
Posts: 7,467

Retaining with multiple by variables?

Posted in reply to runbean009

Without sample data it is difficult to tell exactly what you are trying to do.  However, the way you are using first, you will only meet all of the conditions on an individual's initial record.  Without trying to parse your logic, I will guess that you only need first.admitdte.

Art

Super User
Super User
Posts: 7,038

Retaining with multiple by variables?

Posted in reply to runbean009

Given your BY statement:

by uniqueid ptdob sex admitdte;

1) the first instance of the identifier (first.uniqueid, first.ptdob, first.admitdte, first.sex).  Taken care of with the code above.

Just use FIRST.ADMITDTE. You do not need to look at the other first. variables.

2) if first and last instance of all 4 identifying variables (i.e., only one incident per person). Also taken care of above, could add in some specific code if needed like:

     if (first.uniqueid and first.ptdob and first.admitdte and first.sex and last.uniqueid and last.ptdob and last.admitdte and last.sex) then do;

Only one record per id would be tested by: (first.admitdte and last.admitdte)

3) if first instance of only one of the identifying variables, but not the first instance of the others >>e.g., if first.uniqueid=1 and first.ptdob=0, etc.  ???

Scenario 3 is my problem. I've tried all sorts of combinations of first. and last. processing with 1 and 0, and am still not getting the right values for diff and casenum.

If you want the first instance of the combination UNIQUEID,PTDOB and SEX then use FIRST.SEX.

Again you do not need to look at all of them.

New Contributor
Posts: 2

Retaining with multiple by variables?

Ah..it worked. It seems so obvious to me now. Thanks, Tom.

Ask a Question
Discussion stats
  • 3 replies
  • 168 views
  • 0 likes
  • 3 in conversation