BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

Hi, we have this raw data which reports age at interventions:

data have;
input ID AGE;
datalines;
1	0
1	0
1	3
2	0
2	9
4	3
4	6
4	7
4	10
4	10
;
run;

The variable "AGE" is the age when each ID had an intervention. We need to create a dataset that:
1) counts and accumulates the number of interventions at each age per ID.
2) accounts for all ages from the minimum to the maximum age reported by any ID. In this dataset the minimum is 0 and the maximum age is 10.
3) each ID should have a record from the minimum age to the maximum age. Even when they did not reported any intervention.
4) if an ID is missing in the intervention dataset (the main dataset contains all IDs in the study) it means the subject didn't have any intervention for the study duration. All IDs need to be accounted for.

5) variable "INTERVENTION"  0=no intervention, 1= intervention

This dataset could be the "temp" dataset. Notice that for IDs 1 and 4 they had 2 interventions at the same age.

data temp;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1	0	1	2
1	0	1	.
1	1	0	2
1	2	0	2
1	3	1	3
1	4	0	3
1	5	0	3
1	6	0	3
1	7	0	3
1	8	0	3
1	9	0	3
1	10	0	3
2	0	1	1
2	1	0	1
2	2	0	1
2	3	0	1
2	4	0	1
2	5	0	1
2	6	0	1
2	7	0	1
2	8	0	1
2	9	1	2
2	10	0	2
3	1	0	0
3	2	0	0
3	3	0	0
3	4	0	0
3	5	0	0
3	6	0	0
3	7	0	0
3	8	0	0
3	9	0	0
3	10	0	0
4	0	0	0
4	1	0	0
4	2	0	0
4	3	1	1
4	4	0	1
4	5	0	1
4	6	1	2
4	7	1	3
4	8	0	3
4	9	0	3
4	10	1	5
4	10	1	.
;
run;

The final dataset should look this:

data want;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1	0	1	2
1	1	0	2
1	2	0	2
1	3	1	3
1	4	0	3
1	5	0	3
1	6	0	3
1	7	0	3
1	8	0	3
1	9	0	3
1	10	0	3
2	0	1	1
2	1	0	1
2	2	0	1
2	3	0	1
2	4	0	1
2	5	0	1
2	6	0	1
2	7	0	1
2	8	0	1
2	9	1	2
2	10	0	2
3	1	0	0
3	2	0	0
3	3	0	0
3	4	0	0
3	5	0	0
3	6	0	0
3	7	0	0
3	8	0	0
3	9	0	0
3	10	0	0
4	0	0	0
4	1	0	0
4	2	0	0
4	3	1	1
4	4	0	1
4	5	0	1
4	6	1	2
4	7	1	3
4	8	0	3
4	9	0	3
4	10	1	5
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.

data have;
  input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;

%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
  do id=&id_range;
    do age=&start_age to &stop_age;
      output;
    end;
  end;
run;

data want;
  merge have(in=in_have) id_age_plan;
  by id age;

  if first.id then call missing(count_int);

  intervention_flg= in_have;
  count_int+in_have;

  if last.age then output;

run;

proc print data=want;
run;

 

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.

data have;
  input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;

%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
  do id=&id_range;
    do age=&start_age to &stop_age;
      output;
    end;
  end;
run;

data want;
  merge have(in=in_have) id_age_plan;
  by id age;

  if first.id then call missing(count_int);

  intervention_flg= in_have;
  count_int+in_have;

  if last.age then output;

run;

proc print data=want;
run;

 

mkeintz
PROC Star

I think a temporary array (COUNTS below) indexed by age values from 0 through 10 is a compact approach here.  Some additional code is needed to fill in data for missing ID's, but you don't have to create a template dataset.  

 

 

data have;
input ID AGE;
datalines;
1	0
1	0
1	3
2	0
2	9
4	3
4	6
4	7
4	10
4	10
;
run;

data want (drop=_:);
  array counts{0:10} _temporary_;
  set have  end=end_of_have;
  by id;

  counts{age}+1;

  if end_of_have=0 then set have (firstobs=2 keep=id rename=(id=_nxt_id));
  else _nxt_id=id+1;

  if last.id then do id=id to _nxt_id-1;
    do age=lbound(counts) to hbound(counts);
      intervention=max(counts{age},0);
      count_int=sum(count_int,intervention);
      output;
    end;
    call missing(of counts{*},intervention,count_int);
  end;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

You also write that you have a MAIN table, containing all IDs, and that you want all the IDs to be accounted for (although this is not shown in your wanted output). If your MAIN table looks like this (sorted by ID, only one record for each ID):

data main;
  do id=1 to 7;
    output;
    end;
run;

you can get the table you want like this:

%let min_age=0;
%let max_age=10;

data want;
  array interv [&min_age:&max_age] 8 _temporary_;
  call missing(of interv[*]);
  do until(last.id);
    merge have(in=have) main(keep=id);
    by id;
    if have then interv[age]+1;
    end;
  count_int=0;
  do age=&min_age to &max_age;
    intervention=interv[age]>0;
    count_int+interv[age];
    output;
    end;
run;

Note: HAVE should also be sorted by ID (but not necessarily by AGE), and all AGEs should be between &min_age and &max_age, no missing values. This can of course also be ensured by using the dataset option "where=(age between &min_age and &max_age)" on HAVE in the code above - you may have some interventions outside the range of years that you want to study.

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1173 views
  • 1 like
  • 4 in conversation