Solved: Create variable that counts, accumulates and displays events at differ...

ANKH1 · Posted 12-27-2023 02:22 PM

Hi, we have this raw data which reports age at interventions:

data have;
input ID AGE;
datalines;
1	0
1	0
1	3
2	0
2	9
4	3
4	6
4	7
4	10
4	10
;
run;

The variable "AGE" is the age when each ID had an intervention. We need to create a dataset that:
1) counts and accumulates the number of interventions at each age per ID.
2) accounts for all ages from the minimum to the maximum age reported by any ID. In this dataset the minimum is 0 and the maximum age is 10.
3) each ID should have a record from the minimum age to the maximum age. Even when they did not reported any intervention.
4) if an ID is missing in the intervention dataset (the main dataset contains all IDs in the study) it means the subject didn't have any intervention for the study duration. All IDs need to be accounted for.

5) variable "INTERVENTION" 0=no intervention, 1= intervention

This dataset could be the "temp" dataset. Notice that for IDs 1 and 4 they had 2 interventions at the same age.

data temp;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1	0	1	2
1	0	1	.
1	1	0	2
1	2	0	2
1	3	1	3
1	4	0	3
1	5	0	3
1	6	0	3
1	7	0	3
1	8	0	3
1	9	0	3
1	10	0	3
2	0	1	1
2	1	0	1
2	2	0	1
2	3	0	1
2	4	0	1
2	5	0	1
2	6	0	1
2	7	0	1
2	8	0	1
2	9	1	2
2	10	0	2
3	1	0	0
3	2	0	0
3	3	0	0
3	4	0	0
3	5	0	0
3	6	0	0
3	7	0	0
3	8	0	0
3	9	0	0
3	10	0	0
4	0	0	0
4	1	0	0
4	2	0	0
4	3	1	1
4	4	0	1
4	5	0	1
4	6	1	2
4	7	1	3
4	8	0	3
4	9	0	3
4	10	1	5
4	10	1	.
;
run;

The final dataset should look this:

data want;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1	0	1	2
1	1	0	2
1	2	0	2
1	3	1	3
1	4	0	3
1	5	0	3
1	6	0	3
1	7	0	3
1	8	0	3
1	9	0	3
1	10	0	3
2	0	1	1
2	1	0	1
2	2	0	1
2	3	0	1
2	4	0	1
2	5	0	1
2	6	0	1
2	7	0	1
2	8	0	1
2	9	1	2
2	10	0	2
3	1	0	0
3	2	0	0
3	3	0	0
3	4	0	0
3	5	0	0
3	6	0	0
3	7	0	0
3	8	0	0
3	9	0	0
3	10	0	0
4	0	0	0
4	1	0	0
4	2	0	0
4	3	1	1
4	4	0	1
4	5	0	1
4	6	1	2
4	7	1	3
4	8	0	3
4	9	0	3
4	10	1	5
;
run;

Patrick · Posted 12-27-2023 09:34 PM

Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.

data have;
  input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;

%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
  do id=&id_range;
    do age=&start_age to &stop_age;
      output;
    end;
  end;
run;

data want;
  merge have(in=in_have) id_age_plan;
  by id age;

  if first.id then call missing(count_int);

  intervention_flg= in_have;
  count_int+in_have;

  if last.age then output;

run;

proc print data=want;
run;

View solution in original post

Patrick · Posted 12-27-2023 09:34 PM

Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.

data have;
  input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;

%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
  do id=&id_range;
    do age=&start_age to &stop_age;
      output;
    end;
  end;
run;

data want;
  merge have(in=in_have) id_age_plan;
  by id age;

  if first.id then call missing(count_int);

  intervention_flg= in_have;
  count_int+in_have;

  if last.age then output;

run;

proc print data=want;
run;

ANKH1 · Posted 12-28-2023 02:14 PM

Thank you!

mkeintz · Posted 12-27-2023 10:28 PM

I think a temporary array (COUNTS below) indexed by age values from 0 through 10 is a compact approach here. Some additional code is needed to fill in data for missing ID's, but you don't have to create a template dataset.

data have;
input ID AGE;
datalines;
1	0
1	0
1	3
2	0
2	9
4	3
4	6
4	7
4	10
4	10
;
run;

data want (drop=_:);
  array counts{0:10} _temporary_;
  set have  end=end_of_have;
  by id;

  counts{age}+1;

  if end_of_have=0 then set have (firstobs=2 keep=id rename=(id=_nxt_id));
  else _nxt_id=id+1;

  if last.id then do id=id to _nxt_id-1;
    do age=lbound(counts) to hbound(counts);
      intervention=max(counts{age},0);
      count_int=sum(count_int,intervention);
      output;
    end;
    call missing(of counts{*},intervention,count_int);
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ANKH1 · Posted 12-28-2023 02:14 PM

Thank you!

s_lassen · Posted 12-28-2023 05:00 AM

You also write that you have a MAIN table, containing all IDs, and that you want all the IDs to be accounted for (although this is not shown in your wanted output). If your MAIN table looks like this (sorted by ID, only one record for each ID):

data main;
  do id=1 to 7;
    output;
    end;
run;

you can get the table you want like this:

%let min_age=0;
%let max_age=10;

data want;
  array interv [&min_age:&max_age] 8 _temporary_;
  call missing(of interv[*]);
  do until(last.id);
    merge have(in=have) main(keep=id);
    by id;
    if have then interv[age]+1;
    end;
  count_int=0;
  do age=&min_age to &max_age;
    intervention=interv[age]>0;
    count_int+interv[age];
    output;
    end;
run;

Note: HAVE should also be sorted by ID (but not necessarily by AGE), and all AGEs should be between &min_age and &max_age, no missing values. This can of course also be ensured by using the dataset option "where=(age between &min_age and &max_age)" on HAVE in the code above - you may have some interventions outside the range of years that you want to study.

Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Re: Create variable that counts, accumulates and displays events at different ages

Registration is open

SAS Training: Just a Click Away