id date arf
1 01/01/2005 1
1 01/01/2005 1
2 01/01/2005 1
2 02/02/2005 0
2 03/02/2005 1
Hi I am trying to create a variable conditional on the following:
if arf=1 in two different date then arf2= 1
if arf=1 in the same date arf2=0
for example for the above example, here is the output:
id arf2
1 0
2 1
data have; input id date : mmddyy10. arf ; format date mmddyy10.; cards; 1 01/01/2005 1 1 01/01/2005 1 2 01/01/2005 1 2 02/02/2005 0 2 03/02/2005 1 ; run; proc sql; select id,case when(count(distinct date) ne 1) then 1 else 0 end as arf2 from have where arf=1 group by id; quit;
You may try next code:
proc sort data=have; by id date; run; /* if data is already sorted skip this line */
data want;
set have;
by id date (where=(arf=1));
retain counter;
if first.id then counter =0;
if first.date then counter +1;
if last.id and counter GE 1 then arf2=1;
else arf2=0;
keep id arf2;
run;
My proposed solution is a two step process. The first part requires creating a temporary data set, work.tempdsn, containing a flag variable that sums the arf column by each id/date combination.
proc sql;
create table work.tempdsn as
select distinct id
, date
, sum(arf) as flag
from work.dsn
group by id, date;
The idea being that any distinct id/date row containing a value for flag >=2 has arf=1 for the same date, so arf2 needs to be set equal to 0 for that id. Below are the results from work.tempdsn:
id date flag
1 01/01/2005 2
2 01/01/2005 1
2 02/02/2005 0
2 03/02/2005 1
Because the flag column equals 2 for the id 1 and date 01/01/2005 combination, arf=1 in the same date twice, meaning id #1 will receive an arf2 value of 0.
It gets a bit tricky with id #2, and all other id numbers with maximum flag amounts less than 2, because according to your stated requirements, in order for the arf2 column to be equal to 1 for a particular id, the original arf column must be equal to 1 for two different dates within that id. The HAVING statement in PROC SQL can accomodate this.
create table work.outdsn as
select distinct id
, case when max(flag)>=2 then 0
else 1 end as arf2
from work.tempdsn
group by id
having sum(flag)>=2;
drop table work.tempdsn;
quit;
If the maximum flag amount for an id is >=2, then arf2=0 for that id. If not, arf2=1. To ensure there are at least two different dates with arf=1, the having statement sums flag longitudinally across each id, keeping only the id numbers that have arf=1 for at least two dates within that id.
Since work.tempdsn is no longer needed, it is dropped from the WORK library, and the new data set, work.outdsn, is created containing the desired output below:
id arf2
1 0
2 1
The only part I do not understand from your post is how to treat id variables were arf=1 in only one date and equal to 0 in all other dates. Since arf2 is binary, should these id numbers be set equal to 1? 0? Dropped from the final data set?
In the code above, those id numbers are dropped from the final data set. For example, the following data set will yield the same results in work.outdsn because id #3 doesn't meet either requirement of the arf2 variable.
id date arf
1 01/01/2005 1
1 01/01/2005 1
2 01/01/2005 1
2 02/02/2005 0
2 03/02/2005 1
3 01/01/2005 1
3 02/02/2005 0
3 03/02/2005 0
data have; input id date : mmddyy10. arf ; format date mmddyy10.; cards; 1 01/01/2005 1 1 01/01/2005 1 2 01/01/2005 1 2 02/02/2005 0 2 03/02/2005 1 ; run; proc sql; select id,case when(count(distinct date) ne 1) then 1 else 0 end as arf2 from have where arf=1 group by id; quit;
Thank you all for the reply!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.