Hello.
Suppose I have a data set that looks something like
ID age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7...... age_18
1 1 1 1 1 0 0 0 0 ........0
2 0 0 0 1 1 1 0 0 ........0
3 1 1 0 1 1 1 1 0 ........0
.
.
.
14,000 0 0 0 0 0 0 1 1 ........1
Where age_0 through age_18 represents binary variables which record whether or not a person had a particular disease recur at each age.
(1) Suppose we want to know the age onset of the disease. So for ID #1 it would be at age_0, for ID#2 it would be age_3, for ID#3 it would be age_0, for ID #14,000 it would be age_6.
(2) Suppose we want to know the last age at which they had the disease. For ID #1 it would be age_3, for ID #2 it would be age_5 etc.
(3) Suppose we want to know the duration of the disease. This could have two meanings:
(a) It could be the # of groups of consecutive sequence of 1's (so for ID#1 it is 1, for ID #2 it is 1, for ID # 3 it is 2, for ID #14,000 it is 2).
(b) If we are referring to the # of years they had the disease, it would just be a sum of the number of 1's for each person (for ID # 1 it is 4, for ID#2 it is 3, for ID#3 it is6, for ID#14,000 it is 12).
What would be the most efficient way to code this in SAS? I do not know how to use macros.. there has to be an easier way to do this I can find the total number of years they had the illness by creating a summary score of the 1's. But I am not sure how to do the other 3 tasks. Help? I wanted to google this but I'm not even sure what I would type to look this up.
Thanks
The keywords for a Google search could be "SAS array processing". You're right: no macro coding is needed.
You need so many new field .
data x; input ID age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7 ; cards; 1 1 1 1 1 0 0 0 0 2 0 0 0 1 1 1 0 0 3 1 1 0 1 1 1 1 0 ; run; data want; set x; length onset last_age $ 40; num_of_groups=0; array _a{*} age_: ; do i=1 to dim(_a); if _a{i}=1 then last_age=vname(_a{i}); end; do i=dim(_a) to 1 by -1; if _a{i}=1 then onset=vname(_a{i}); end; num_of_groups=countc(strip(compbl(translate(cats('*',of _a{*},'*'),' ','1'))),' ') ; num_of_years=sum(of _a{*}); drop i; run;
Ksharp
Modifying @Ksharp's code slightly, you could calculate all the required variables within a single loop of the array.
data want;
set x;
length onset last_age $ 40;
num_of_groups=0;
num_of_years=0;
array _a{*} age_: ;
do i=1 to dim(_a);
num_of_years+_a{i};
if _a{i}=1 then do;
if num_of_years=1 then onset=vname(_a{i});
last_age=vname(_a{i});
if i=1 or _a{i-1}=0 then num_of_groups+1;
end;
end;
drop i;
run;
data have;
input ID:$ age_0 age_1 age_2 age_3 age_4 age_5 age_6 age_7;
cards;
1 1 1 1 1 0 0 0 0
2 0 0 0 1 1 1 0 0
3 1 1 0 1 1 1 1 0
;
data want;
set have;
array age age_:;
_si=dim(age);
_ei=1;
do _i=1 to dim(age);
if age(_i)=1 then do;
_year=vname(age(_i));
if _i<=_si then do;onset=_year;_si=min(_si,_i);end;
if _i>=_ei then do; last=_year;_ei=max(_ei,_i);end;
end;
end;
_cat=cats(of age(*));
put _cat=;
Duration_group=countw(translate(_cat,' ','0'));
Duration_year=countc(_cat,'1');
drop _:;
run;
proc print;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.