Solved: Modify variables of a data set based on an Index variable

NewUsrStat

Hi guys,

suppose to have the following dataset:

data DB;
  input ID :$20. Admission :date09. Discharge :date09. Index age age_class;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  1  59  6 
0001  22FEB2018 03MAR2018  0  60  6
0001  30JAN2019 04MAR2019  0  61  7
0002  01DEC2016 14DEC2016  0  67  7
0002  01DEC2016 14DEC2016  0  67  7
0002  25DEC2017 02JAN2018  1  68  7
0002  25FEB2018 27FEB2018  0  69  7
0003  09JAN2016 25JAN2016  0  34  4
0003  29JAN2018 12FEB2018  1  36  4
0004  02FEB2014 12MAR2014  1  76  8
0004  04APR2018 11APR2019  0  80  8
0004  02DEC2018 11SEP2019  0  80  8
;run;

Is there a way to get the following?

data DB1;
  input ID :$20. Admission :date09. Discharge :date09. Index age age_class;
  format Admission date9. Discharge date9.;
cards;
0001  13JAN2017 25JAN2017  1  59  6 
0001  22FEB2018 03MAR2018  0  59  6
0001  30JAN2019 04MAR2019  0  59  6
0002  01DEC2016 14DEC2016  0  0   0
0002  01DEC2016 14DEC2016  0  0   0
0002  25DEC2017 02JAN2018  1  68  7
0002  25FEB2018 27FEB2018  0  68  7
0003  09JAN2016 25JAN2016  0  0   0
0003  29JAN2018 12FEB2018  1  36  4
0004  02FEB2014 12MAR2014  1  76  8
0004  02FEB2014 12MAR2014  0  0   0
0004  04APR2018 11APR2019  0  76  8
0004  02DEC2018 11SEP2019  0  76  8
;run;

Rules:

1) For all cases like 0001, format age and age_class variables based on what is reported at Index = 1.

2) For all cases like 0002 do the same as for 0001 with the following exception: set to 0 the two variables (age and age_class ) if dates < dates at Index = 1. This should be done in the case dates are identical like in this case (01DEC2016 14DEC2016) or not.

3) For all cases like 0003 do the same as for 0002.

4) For all cases like 0004: do the same as for 0001 but in this case dates are replicated but in one case Index = 1 and in the other Index = 0. When Index = 0 and dates are replicated, then set the two variables, age and age_class, to 0.

Can anyone help me please?

Tom

Or is the goal to just make a subject level variables that indicates the index date and index age?

data want;
  merge db db(keep=id admission index age age_class 
              rename=(admission=idxdate index=any_index age=idx_age age_class=idx_age_class)
              where=(any_index))
  ;
  by id;
run;

If for some reason you want to clear the variables for admissions that happened before the index date then add a second step (cannot use the same step because the nature of 1 to many merges).

data want;
  set want;
  if admission < idxdate then call missing(idx_age,idx_age_class);
run;

Result

View solution in original post

Tom

I am not sure what you mean by the verb "format" in your question. In SAS FORMAT has a very specific meaning. A FORMAT is instructions for how to convert values into TEXT so they can be displayed or printed. For example in your code you are attaching the format specification DATE9. to the two variables that have date values so they will print in a style that humans can recognize as dates.

I really don't think that is the meaning of "format" that you are asking about.

Perhaps you meant you want to RECODE your variables? Or CALCULATE new variables?

NewUsrStat

Hi Tom, I have just edited the title of the post

Tom

I do not follow your rules.

What do you mean by "all cases like 0001"? What is it about that ID that you are talking about? How is ID=0001 different from ID=0004?

NewUsrStat

All cases like 0001 in the sense that the date (admission-discharge) where index = 1 or viceversa is always the first date. Conversely pts like 0002 or 0003 have dates before the date where Index = 1

Tom

So you want to compare the admission date to the INDEX DATE?

You could just re-merge the index observation back with the dataset.

data want;
  merge db db(keep=id admission index rename=(admission=idxdate index=any_index)
              where=(any_index))
  ;
  by id;
  new_age = age;
  if admission < idxdate then new_age=0;
  else new_age = age - intck('year',idxdate,admission,'c') ;
run;

Result:

Obs     ID     Admission    Discharge    Index    age    class      idxdate    index    new_age

  1    0001    13JAN2017    25JAN2017      1       59      6      13JAN2017      1         59
  2    0001    22FEB2018    03MAR2018      0       60      6      13JAN2017      1         59
  3    0001    30JAN2019    04MAR2019      0       61      7      13JAN2017      1         59
  4    0002    01DEC2016    14DEC2016      0       67      7      25DEC2017      1          0
  5    0002    01DEC2016    14DEC2016      0       67      7      25DEC2017      1          0
  6    0002    25DEC2017    02JAN2018      1       68      7      25DEC2017      1         68
  7    0002    25FEB2018    27FEB2018      0       69      7      25DEC2017      1         69
  8    0003    09JAN2016    25JAN2016      0       34      4      29JAN2018      1          0
  9    0003    29JAN2018    12FEB2018      1       36      4      29JAN2018      1         36
 10    0004    02FEB2014    12MAR2014      1       76      8      02FEB2014      1         76
 11    0004    04APR2018    11APR2019      0       80      8      02FEB2014      1         76
 12    0004    02DEC2018    11SEP2019      0       80      8      02FEB2014      1         76

Do you really want the zeros? Shouldn't the new age be MISSING for the admissions before the index date?

Tom

Or is the goal to just make a subject level variables that indicates the index date and index age?

data want;
  merge db db(keep=id admission index age age_class 
              rename=(admission=idxdate index=any_index age=idx_age age_class=idx_age_class)
              where=(any_index))
  ;
  by id;
run;

If for some reason you want to clear the variables for admissions that happened before the index date then add a second step (cannot use the same step because the nature of 1 to many merges).

data want;
  set want;
  if admission < idxdate then call missing(idx_age,idx_age_class);
run;

Result

ballardw

If by "format" you really mean "carry the values of the age and age_class variables forward to the next observations for the same ID" This may be one way:

data want;
   set have;
   by id;  /* assumes data is sorted by ID*/
   retain t_age t_ageclass;
   if first.id then call missing(t_age,t_ageclass);
   if index=1 then do;
      t_age=age;
      t_ageclass=age_class;
   end;
   age= coalesce(t_age, age);
   age_class=coalesce(t_ageclass,age_class);
   drop t_age t_ageclass;
run;

Retain keeps values across the data step boundary for those variables on a Retain statement. So we create a couple variables to hold the values for later use when appropriate.

The BY id allows us to reset the retained values to missing so we don't get them from the previous id. Conditionally assign the retained variable values when Index is 1.

The Coalesce function returns the first non-missing value in the list. So if the t_age is missing it returns age, other wise the t_age variable.

If your data is not actually sorted by ID but is grouped you can add the NOTSORTED option to the BY statement so SAS will execute the code. Otherwise there would be an error created by not sorted data.

Note: Your age_class variable is likely not needed as you could use a an actual format based on the value of AGE to display those categories or group values for analysis or reporting.

Modify variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Format variables of a data set based on an Index variable

Re: Modify variables of a data set based on an Index variable

SAS Innovate 2025: Call for Content