Hi guys,
suppose to have the following dataset:
data DB;
input ID :$20. Flag :$20. Observation;
cards;
0001 Y .
0001 N 1
0001 N .
0001 N 1
0002 N 1
0002 N 1
0002 N 1
0002 N .
...;
Is there a way to sum all "Observation(s)" =1 but only for IDs having at least one Flag = "Y"?
Desired output: sum = 1
Thank you in advance
Can you please show the output you want?
If you want the same dataset, with a sum added, you could use a double-DOW loop, e.g.
data want ;
do until(last.id) ;
set db ;
by id ;
if flag='Y' then _flagged=1 ;
_sum=sum(observation,_sum,0) ;
end ;
if _flagged then mysum=_sum ;
do until(last.id) ;
set db ;
by id ;
output ;
end ;
run ;
This is something that SQL syntax would make it easy to express.
You describe it as a SUM but to me it looks more like a COUNT.
create table want as
select count(distinct id) as n_ids_flagged
from have
where observation=1
and id in (select id from have where flag='Y')
;
From your example data, shouldn't the answer be 2 rather than 1?
You could still use a DOW-loop approach to flag the IDs of interest, then do the summing of the flagged records:
data want (keep=mysum);
do until(last.id) ;
set db ;
by id ;
if flag='Y' then _flagged=1 ;
end ;
do until(last.id) ;
set db end=last ;
by id ;
if _flagged then mysum++observation;
end ;
if last then output ;
run ;
Or via SQL:
proc sql ;
create table want as
select sum(observation) as mysum
from
(select id, observation
from db
group by id
having max(flag)='Y'
)
;
quit ;
I think it can be done with one data pass:
data DB;
input ID :$20. Flag :$20. Observation;
cards;
0001 Y .
0001 N 1
0001 N .
0001 N 1
0002 N 1
0002 N 1
0002 N 1
0002 N .
0003 N .
0003 N 1
0003 Y .
0003 N 1
;
run;
data want;
set db end=_E_;
by id;
if first.id then do; s=0; check=0; end;
s + (Observation=1);
check + (Flag="Y");
if last.id and check then sum+s;
if _E_;
run;
proc print data=want;
run;
B.
I think even almost "unconditionall" 😄
data want2;
set db end=_E_;
by id;
s=s*(^first.id);
check=check*(^first.id);
s + (Observation=1);
check + (Flag="Y");
sum+s*(last.id*check);
if _E_;
run;
proc print data=want2;
run;
B
I like it, Bart 😁
Since OP hints that there could be more than one record for an ID with flag='Y', perhaps:
check + (Flag="Y");
Should be:
check + ((Flag="Y") and (Not check));
?
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.