SanKH1
Quartz | Level 8

Hi, I have the following dataset, where each ageX column indicates whether or not there was an incident at that age.

data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

We would like to have the cumulative number of people who have had an incident by each age. The tricky part is that once a person has had an incident and has been counted, any later incidents for that person must not be counted again. Take ID 1: it has an incident at age1 and another at age4, but this person can be counted only once. The table below is the output we are looking for:

Age         age1  age2  age3  age4  age5  age6
Cumulative     2     3     4     5     5     5
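
One way to read this rule is that each ID contributes to the count from the age of its first incident onward. The step below is only a sketch of that reading: first_incident and first_age are hypothetical names, not part of the original post. It records the first age with an incident for each ID, and the cumulative frequency column printed by PROC FREQ then gives the running totals for the ages at which a first incident occurs.

```sas
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

/* first_age is a hypothetical helper variable (assumption).        */
data first_incident;
  set dsin;
  array age {*} age1-age6;
  first_age = .;                  /* stays missing if no incident   */
  do i = 1 to dim(age) while (first_age = .);
    if age{i} >= 1 then first_age = i;
  end;
  keep ID first_age;
run;

/* The default Cumulative Frequency column is the running total.    */
proc freq data=first_incident;
  tables first_age;
run;
```

For the sample data the first ages are 1, 2, 4, missing, 1 and 3 for IDs 1 to 6, so the running totals are 2, 3, 4 and 5. Ages 5 and 6 have no first incidents and do not appear as rows; the total simply stays at 5.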

8 REPLIES
Reeza
Super User

There should be a faster way than this, but I'm sleep deprived today.

 

data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

data _dsin / view=_dsin;
set dsin;
array age age1-age6;
flag=0;
do i=1 to dim(age);
if age(i) = 1 and flag=0 then flag=1;
else if age(i) = 1 and flag=1 then age(i) = 0;
end;
keep id age:;
run;

proc means data=_dsin noprint;
output out=agg1 sum=;
run;

data want;
set agg1;
array age age1-age6;

do i=2 to dim(age);
age(i) = age(i) + age(i-1);
end;
keep age:;
run;

SanKH1
Quartz | Level 8

Hi! Thank you so much for your response. It worked perfectly. However, I found that some of the datasets have more than 1 incident for the same ID in the same age variable. Is it possible to account for this in the code? That is, the ID should be counted only once, even if it reported 2 or more incidents at the same age.

Reeza
Super User
That would imply either a different structure than above, since you have only 1 variable per age, or that the values are not just 0/1?
SanKH1
Quartz | Level 8
Yes, the values 0, 1, 2 come from a frequency table.
Reeza
Super User (accepted solution)
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

data _dsin / view=_dsin;
set dsin;
array age age1-age6;
flag=0;
do i=1 to dim(age);
if age(i) >= 1 and flag=0 then do; age(i)=1; flag=1; end;
else if age(i) >= 1 and flag=1 then age(i) = 0;
end;
keep id age:;
run;

proc means data=_dsin noprint;
output out=agg1 sum=;
run;

data want;
set agg1;
array age age1-age6;

do i=2 to dim(age);
age(i) = age(i) + age(i-1);
end;
keep age:;
run;

That modification should work 🙂
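
A quick way to check the recoding step on counts above 1 is sketched below. The data set names dsin_counts and first_only are hypothetical, and the values in the datalines are invented purely for illustration; they are not from the original poster's data.

```sas
/* Hypothetical test data: same layout as dsin, with counts above 1 */
data dsin_counts;
  input ID age1-age6;
  datalines;
1 2 0 0 1 0 0
2 0 3 1 0 0 0
3 0 0 0 0 0 0
;
run;

/* Same recoding logic as the view step above */
data first_only;
  set dsin_counts;
  array age {*} age1-age6;
  flag = 0;                       /* 1 once the first incident is seen */
  do i = 1 to dim(age);
    if age{i} >= 1 and flag = 0 then do; age{i} = 1; flag = 1; end;
    else if age{i} >= 1 and flag = 1 then age{i} = 0;
  end;
  keep ID age:;
run;

proc print data=first_only noobs;
run;
```

Each ID should end up with at most one 1 across age1-age6: ID 1 keeps a 1 in age1, ID 2 keeps a 1 in age2, and ID 3 stays all zeros.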

Ksharp
Super User
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

proc sql;
create table want as
select 'Cumulative' as Age ,
 sum(age1) as age1,
 sum(max(age1,age2)) as age2,
 sum(max(age1,age2,age3)) as age3,
 sum(max(age1,age2,age3,age4)) as age4,
 sum(max(age1,age2,age3,age4,age5)) as age5,
 sum(max(age1,age2,age3,age4,age5,age6)) as age6
 from dsin ;
quit;
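
With values above 1, max() returns the count itself, so the sums above would exceed the number of people. A hedged variant, assuming the same wide layout, compares the running maximum with 0 so that each ID contributes at most 1. The table name want_counts is hypothetical.

```sas
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

/* A comparison evaluates to 1 (true) or 0 (false) in SAS, so     */
/* sum(... > 0) counts the IDs with any incident up to that age.  */
proc sql;
create table want_counts as
select 'Cumulative' as Age ,
 sum(age1 > 0) as age1,
 sum(max(age1,age2) > 0) as age2,
 sum(max(age1,age2,age3) > 0) as age3,
 sum(max(age1,age2,age3,age4) > 0) as age4,
 sum(max(age1,age2,age3,age4,age5) > 0) as age5,
 sum(max(age1,age2,age3,age4,age5,age6) > 0) as age6
 from dsin ;
quit;
```

On the 0/1 sample data this gives the same 2, 3, 4, 5, 5, 5 as the query above.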
mkeintz
Jade | Level 19
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
run;

data want (keep=cumcount:);
  set dsin end=end_of_data;
  array age {6};
  array cumcount {6} (6*0);

  if whichn(1,of age{*})>0 then do _n_=whichn(1,of age{*}) to dim(age);
    cumcount{_n_}+1;
  end;

  if end_of_data;
run;
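
WHICHN(1, ...) looks for a value of exactly 1, so an ID whose first non-zero value is 2 or more would be counted from a later age or not at all. A minimal sketch for count data, assuming the same dsin layout, replaces the WHICHN call with a short search loop. The names want_counts and first are hypothetical.

```sas
data dsin;
input ID age1 age2 age3 age4 age5 age6;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;
run;

data want_counts (keep=cumcount:);
  set dsin end=end_of_data;
  array age {6};
  array cumcount {6} (6*0);       /* initial values imply RETAIN */

  /* index of the first age with any incident; 0 if there is none */
  first = 0;
  do i = 1 to dim(age) until (first > 0);
    if age{i} >= 1 then first = i;
  end;

  /* count this ID once at that age and at every later age */
  if first > 0 then do i = first to dim(age);
    cumcount{i} + 1;
  end;

  if end_of_data;                 /* write only the final totals */
run;
```

For the sample data, cumcount1 to cumcount6 come out as 2, 3, 4, 5, 5, 5.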
Tom
Super User

It probably would be simpler not to have that wide structure to start with.

data have;
  input ID @;
  do age=1 to 6;
     input incident  @;
     Cumulative=max(incident ,Cumulative);
     output;
  end;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;

Then getting the sum is easy.

proc means nway sum;
  class age;
  var cumulative;
run;
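
If the one-row layout from the original question is still needed, the long summary can be turned back into a wide row. This is only a sketch: long_sum and wide are hypothetical data set names, and the first step repeats the long-format read from above so that the example runs on its own.

```sas
data have;
  input ID @;
  do age=1 to 6;
     input incident @;
     Cumulative=max(incident, Cumulative);
     output;
  end;
datalines;
1 1 0 0 1 0 0
2 0 1 1 0 0 0
3 0 0 0 1 0 0
4 0 0 0 0 0 0
5 1 0 0 0 1 0
6 0 0 1 0 0 0
;

/* Write the per-age sums to a data set instead of printing them */
proc means data=have nway noprint;
  class age;
  var cumulative;
  output out=long_sum (drop=_type_ _freq_) sum=Cumulative;
run;

/* ID age with PREFIX=age names the new columns age1-age6 */
proc transpose data=long_sum out=wide (drop=_name_) prefix=age;
  id age;
  var Cumulative;
run;
```

The wide data set then holds a single row with age1 to age6 equal to 2, 3, 4, 5, 5, 5 for the sample data.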
