Hi, we have this raw data which reports age at interventions:
data have;
input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
;
run;
The variable "AGE" is the age when each ID had an intervention. We need to create a dataset that:
1) counts and accumulates the number of interventions at each age per ID.
2) accounts for all ages from the minimum to the maximum age reported by any ID. In this dataset the minimum is 0 and the maximum age is 10.
3) each ID should have a record from the minimum age to the maximum age. Even when they did not reported any intervention.
4) if an ID is missing in the intervention dataset (the main dataset contains all IDs in the study) it means the subject didn't have any intervention for the study duration. All IDs need to be accounted for.
5) variable "INTERVENTION" 0=no intervention, 1= intervention
This dataset could be the "temp" dataset. Notice that for IDs 1 and 4 they had 2 interventions at the same age.
data temp;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1 0 1 2
1 0 1 .
1 1 0 2
1 2 0 2
1 3 1 3
1 4 0 3
1 5 0 3
1 6 0 3
1 7 0 3
1 8 0 3
1 9 0 3
1 10 0 3
2 0 1 1
2 1 0 1
2 2 0 1
2 3 0 1
2 4 0 1
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 2
2 10 0 2
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 0 0
3 8 0 0
3 9 0 0
3 10 0 0
4 0 0 0
4 1 0 0
4 2 0 0
4 3 1 1
4 4 0 1
4 5 0 1
4 6 1 2
4 7 1 3
4 8 0 3
4 9 0 3
4 10 1 5
4 10 1 .
;
run;
The final dataset should look this:
data want;
input ID AGE INTERVENTION COUNT_INT;
datalines;
1 0 1 2
1 1 0 2
1 2 0 2
1 3 1 3
1 4 0 3
1 5 0 3
1 6 0 3
1 7 0 3
1 8 0 3
1 9 0 3
1 10 0 3
2 0 1 1
2 1 0 1
2 2 0 1
2 3 0 1
2 4 0 1
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 2
2 10 0 2
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 0 0
3 8 0 0
3 9 0 0
3 10 0 0
4 0 0 0
4 1 0 0
4 2 0 0
4 3 1 1
4 4 0 1
4 5 0 1
4 6 1 2
4 7 1 3
4 8 0 3
4 9 0 3
4 10 1 5
;
run;
Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.
data have;
input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;
%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
do id=&id_range;
do age=&start_age to &stop_age;
output;
end;
end;
run;
data want;
merge have(in=in_have) id_age_plan;
by id age;
if first.id then call missing(count_int);
intervention_flg= in_have;
count_int+in_have;
if last.age then output;
run;
proc print data=want;
run;
Below should return what you're after.
For your real data: You should replace below data step that creates table id_age_plan with your data that contains all id's and then just create a row per age that you expect to have in your output.
data have;
input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
5 5
6 5
6 10
6 10
;
%let start_age=0;
%let stop_age =10;
%let id_range=1 to 10;
data id_age_plan;
do id=&id_range;
do age=&start_age to &stop_age;
output;
end;
end;
run;
data want;
merge have(in=in_have) id_age_plan;
by id age;
if first.id then call missing(count_int);
intervention_flg= in_have;
count_int+in_have;
if last.age then output;
run;
proc print data=want;
run;
Thank you!
I think a temporary array (COUNTS below) indexed by age values from 0 through 10 is a compact approach here. Some additional code is needed to fill in data for missing ID's, but you don't have to create a template dataset.
data have;
input ID AGE;
datalines;
1 0
1 0
1 3
2 0
2 9
4 3
4 6
4 7
4 10
4 10
;
run;
data want (drop=_:);
array counts{0:10} _temporary_;
set have end=end_of_have;
by id;
counts{age}+1;
if end_of_have=0 then set have (firstobs=2 keep=id rename=(id=_nxt_id));
else _nxt_id=id+1;
if last.id then do id=id to _nxt_id-1;
do age=lbound(counts) to hbound(counts);
intervention=max(counts{age},0);
count_int=sum(count_int,intervention);
output;
end;
call missing(of counts{*},intervention,count_int);
end;
run;
You also write that you have a MAIN table, containing all IDs, and that you want all the IDs to be accounted for (although this is not shown in your wanted output). If your MAIN table looks like this (sorted by ID, only one record for each ID):
data main;
do id=1 to 7;
output;
end;
run;
you can get the table you want like this:
%let min_age=0;
%let max_age=10;
data want;
array interv [&min_age:&max_age] 8 _temporary_;
call missing(of interv[*]);
do until(last.id);
merge have(in=have) main(keep=id);
by id;
if have then interv[age]+1;
end;
count_int=0;
do age=&min_age to &max_age;
intervention=interv[age]>0;
count_int+interv[age];
output;
end;
run;
Note: HAVE should also be sorted by ID (but not necessarily by AGE), and all AGEs should be between &min_age and &max_age, no missing values. This can of course also be ensured by using the dataset option "where=(age between &min_age and &max_age)" on HAVE in the code above - you may have some interventions outside the range of years that you want to study.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.