You can merge subsets of the same data set.
data want (keep=enum label='Enums observed by 02 and 05'); merge have (where=(ObserverID='02') in = _02) have (where=(ObserverID='05') in = _05) end=end ; by enum; _0205_flag = first.enum and _02 and _05; * use first. in case the field data has repeated information; count + _0205_flag; if end then put 'NOTE: ' count ' enums observed by both 02 and 05'; if _0205_flag; * subsetting if; run;
In SQL you can count a positive logical assertion over a group using SUM
proc sql; create table want as select enum from have group by enum having sum(ObserverId='02') > 0 and sum(ObserverId='05') > 0;
%put NOTE: &SQLOBS enums were dual observed by 02 and 05;
A NODUPKEY SORT + DATA step can find the enums that were dual observed
proc sort nodupkey data=have out=_0205 ; by enum ObserverId; where ObserverId in ('02' '05'); run; data want; set _0205; by enum; if last.enum and not first.enum; run;
Try this! replace the text in bold with your dataset name.
proc sql;
select count(distinct(enum)) into: count_event
from <datasetname>
where ObserverID in ('02','05')
;
quit;
%put count_event = &count_event;
Thanks! Yes, I realized my mistake after reading the other responses. I misinterpreted the initial question.
You can merge subsets of the same data set.
data want (keep=enum label='Enums observed by 02 and 05'); merge have (where=(ObserverID='02') in = _02) have (where=(ObserverID='05') in = _05) end=end ; by enum; _0205_flag = first.enum and _02 and _05; * use first. in case the field data has repeated information; count + _0205_flag; if end then put 'NOTE: ' count ' enums observed by both 02 and 05'; if _0205_flag; * subsetting if; run;
In SQL you can count a positive logical assertion over a group using SUM
proc sql; create table want as select enum from have group by enum having sum(ObserverId='02') > 0 and sum(ObserverId='05') > 0;
%put NOTE: &SQLOBS enums were dual observed by 02 and 05;
A NODUPKEY SORT + DATA step can find the enums that were dual observed
proc sort nodupkey data=have out=_0205 ; by enum ObserverId; where ObserverId in ('02' '05'); run; data want; set _0205; by enum; if last.enum and not first.enum; run;
Sir @RichardDeVen,
I suspect the new forum software has jumbled a few lines of the DATA step with your MERGE approach: The "if end then put ..." must have been moved upwards from its original place after the sum statement. Now it would be executed prematurely if the last BY group contributed to the count.
You have not explained in which form the result is required in. If you want the number in a macro-variable, the proc sql suggested by @vkumbhakarna is the way to go. If you want it in a dataset, you could use proc summary:
proc summary data=have;
where ObserverID in ('02', '05');
output out=counted(drop=_type_ rename=(_freq_=count));
run;
Hi,
assuming that ENum is integer and you are able to anticipate its range, following should do:
data have;
input ENum ObserverID $;
cards;
1 0G
1 05
1 02
2 04
2 05
2 02
3 0A
3 0P
3 02
4 0D
4 0J
4 02
5 0D
5 0M
5 0H
5 02
6 0H
6 0M
6 02
7 0G
7 05
7 02
8 0A
8 0P
8 02
9 0J
9 0G
9 02
;
run;
data want;
array obs2[9] _temporary_;
array obs5[9] _temporary_;
do until (EOF);
set have end = eof;
where observerID in ('02', '05');
select;
when(observerID = '02') obs2[ENum] = 1;
when(observerID = '05') obs5[ENum] = 1;
otherwise;
end;
end;
do ENum = 1 to 9;
count + (obs2[ENum] and obs5[ENum]);
end;
output;
keep count;
run;
if ENum isn't integer or you can't anticipate the range then following should do:
data want;
if 0 then set have;
declare hash obs2();
obs2.defineKey("ENum");
obs2.defineDone();
declare hiter I2("obs2");
declare hash obs5();
obs5.defineKey("ENum");
obs5.defineDone();
do until (EOF);
set have end = eof;
where observerID in ('02', '05');
select;
when(observerID = '02') _N_ = obs2.add();
when(observerID = '05') _N_ = obs5.add();
otherwise;
end;
end;
do while(I2.next() = 0);
if obs5.check() = 0 then count + 1;
end;
output;
keep count;
stop;
run;
or eventually:
proc sql;
create table want as
select count(*) from
(
select ENum from have where observerID = '02'
intersect
select ENum from have where observerID = '05'
)
;
quit;
All the best
Bart
Hi Bart @yabwon :
I would call your second hash way a 'hash merge'. This was my take on it.
data want(keep=enum label='They have been watching you'); retain enum .; declare hash _02(dataset:'have(where=(ObserverID = "02"))', ordered:'A'); declare hash _05(dataset:'have(where=(ObserverID = "05"))', ordered:'A'); _02.defineKey('enum'); _02.defineDone(); _05.defineKey('enum'); _05.defineDone(); declare hiter check ('_02'); do while (check.next() eq 0); if _05.check()=0 then do; _0205_pair_count + 1; output; end; end; put 'NOTE: There are ' _0205_pair_count 'enums that were observed by 02 and 05'; run;
Nice! I'm making a copy in my archive.
All the best
Bart
Richard,
I've just realised that for this task you need no more than 1 hash table (or 1 array)
proc format;
invalue primes
'02' = 2
'05' = 5
other = 1
;
run;
data _null_;
if 0 then set have;
declare hash obs(hashexp:20);
obs.defineKey("ENum");
obs.defineData("ENum","INT");
obs.defineDone();
do until (EOF);
set have end = eof;
where observerID in ('02', '05');
if obs.find() NE 0 then
INT = input(observerID, primes.);
else
INT = INT * input(observerID, primes.);
_N_ = obs.replace();
end;
/* if INT is divisible by 10 then ENum was observed by 02 nad 05*/
obs.output(dataset:'want(where=( mod(INT, 10) = 0 ))');
stop;
run;
%let range = 9;
data want;
array obs[&range.] _temporary_;
do ENum = 1 to &range.;
obs[ENum] = 1;
end;
do until (EOF);
set have end = eof;
where observerID in ('02', '05');
obs[ENum] = obs[ENum] * input(observerID, primes.);
end;
do ENum = 1 to &range.;
if mod(obs[ENum], 10) = 0 then output;
end;
keep ENum;
run;
All the best
Bart
These solutions worked too. The one using INTERSECT is really slick!
Thanks for your help!
Gene
Wow! i'm loving these forums. Get to learn something new every day.
Liked the use of SQLOBS. Per documentation, SQLOBS automatic macro variable is assigned a value after the SQL SELECT statement executes. PROC SQL produces six automatic macro variables (SQLOBS, SQLRC, SQLOOPS, SQLEXITCODE, SQLXRC, and SQLXMSG) to help you troubleshoot your program. For more information, see Using the PROC SQL Automatic Macro Variables.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.