Hello.
Suppose I have a data set that looks something like
ID age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7...... age_18
1 1 1 1 1 0 0 0 0 ........0
2 0 0 0 1 1 1 0 0 ........0
3 1 1 0 1 1 1 1 0 ........0
.
.
.
14,000 0 0 0 0 0 0 1 1 ........1
Where age_0 through age_18 represents binary variables which record whether or not a person had a particular disease recur at each age.
(1) Suppose we want to know the age onset of the disease. So for ID #1 it would be at age_0, for ID#2 it would be age_3, for ID#3 it would be age_0, for ID #14,000 it would be age_6.
(2) Suppose we want to know the last age at which they had the disease. For ID #1 it would be age_3, for ID #2 it would be age_5 etc.
(3) Suppose we want to know the duration of the disease. This could have two meanings:
(a) It could be the # of groups of consecutive sequence of 1's (so for ID#1 it is 1, for ID #2 it is 1, for ID # 3 it is 2, for ID #14,000 it is 2).
(b) If we are referring to the # of years they had the disease, it would just be a sum of the number of 1's for each person (for ID # 1 it is 4, for ID#2 it is 3, for ID#3 it is6, for ID#14,000 it is 12).
What would be the most efficient way to code this in SAS? I do not know how to use macros.. there has to be an easier way to do this I can find the total number of years they had the illness by creating a summary score of the 1's. But I am not sure how to do the other 3 tasks. Help? I wanted to google this but I'm not even sure what I would type to look this up.
Thanks
The keywords for a Google search could be "SAS array processing". You're right: no macro coding is needed.
You need so many new field .
data x; input ID age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7 ; cards; 1 1 1 1 1 0 0 0 0 2 0 0 0 1 1 1 0 0 3 1 1 0 1 1 1 1 0 ; run; data want; set x; length onset last_age $ 40; num_of_groups=0; array _a{*} age_: ; do i=1 to dim(_a); if _a{i}=1 then last_age=vname(_a{i}); end; do i=dim(_a) to 1 by -1; if _a{i}=1 then onset=vname(_a{i}); end; num_of_groups=countc(strip(compbl(translate(cats('*',of _a{*},'*'),' ','1'))),' ') ; num_of_years=sum(of _a{*}); drop i; run;
Ksharp
Modifying @Ksharp's code slightly, you could calculate all the required variables within a single loop of the array.
data want;
set x;
length onset last_age $ 40;
num_of_groups=0;
num_of_years=0;
array _a{*} age_: ;
do i=1 to dim(_a);
num_of_years+_a{i};
if _a{i}=1 then do;
if num_of_years=1 then onset=vname(_a{i});
last_age=vname(_a{i});
if i=1 or _a{i-1}=0 then num_of_groups+1;
end;
end;
drop i;
run;
data have;
input ID:$ age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7;
cards;
1 1 1 1 1 0 0 0 0
2 0 0 0 1 1 1 0 0
3 1 1 0 1 1 1 1 0
;
data want;
set have;
array age age_:;
_si=dim(age);
_ei=1;
do _i=1 to dim(age);
if age(_i)=1 then do;
_year=vname(age(_i));
if _i<=_si then do;onset=_year;_si=min(_si,_i);end;
if _i>=_ei then do; last=_year;_ei=max(_ei,_i);end;
end;
end;
_cat=cats(of age(*));
put _cat=;
Duration_group=countw(translate(_cat,' ','0'));
Duration_year=countc(_cat,'1');
drop _:;
run;
proc print;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.