Solved: Count specific observations within groups of observations

genemroz · Posted 04-09-2020 12:13 AM

This is probably simple but....I don't know how to do it.

My dataset consists of Events recorded by multiple Observers. I want a count of Events observed by both ObserverID 02 and 05. See below... Thanks in advance!

ENum ObserverID
1 0G
1         05
1         02
2 04
2         05
2         02
3         0A
3 0P
3         02
4         0D
4         0J
4         02
5         0D
5         0M
5         0H
5         02
6         0H
6         0M
6         02
7    0G
7    05
7         02
8         0A
8         0P
8         02
9         0J
9         0G
9         02

RichardDeVen · Posted 04-09-2020 01:18 AM

You can merge subsets of the same data set.

data want (keep=enum label='Enums observed by 02 and 05');
  merge 
    have (where=(ObserverID='02') in = _02)
    have (where=(ObserverID='05') in = _05)
    end=end
  ;
  by enum;

  _0205_flag = first.enum and _02 and _05;  * use first. in case the field data has repeated information;

  count + _0205_flag;

  if end then put 'NOTE: ' count ' enums observed by both 02 and 05';

  if _0205_flag;  * subsetting if;
run;

In SQL you can count a positive logical assertion over a group using SUM

proc sql;
  create table want as
  select enum from have
  group by enum
  having sum(ObserverId='02') > 0 and sum(ObserverId='05') > 0;

%put NOTE: &SQLOBS enums were dual observed by 02 and 05;

A NODUPKEY SORT + DATA step can find the enums that were dual observed

proc sort nodupkey data=have out=_0205 ;
  by enum ObserverId;
  where ObserverId in ('02' '05');
run;

data want;
  set _0205;
  by enum;
  if last.enum and not first.enum;
run;

View solution in original post

vkumbhakarna · Posted 04-09-2020 01:16 AM

Try this! replace the text in bold with your dataset name.
proc sql;
select count(distinct(enum)) into: count_event
from <datasetname>
where ObserverID in ('02','05')
;
quit;
%put count_event = &count_event;

yabwon · Posted 04-10-2020 03:56 AM

Hi,
using the example data your code returns 9 - but expected is 3. Requirement was "count of Events observed by both ObserverID 02 and 05" what is not equivalent to condition `where ObserverID in ('02','05')` (i.e. ObserverID = '02' OR ObserverID = '05').

All the best
Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

vkumbhakarna · Posted 04-10-2020 05:22 PM

Thanks! Yes, I realized my mistake after reading the other responses. I misinterpreted the initial question.

RichardDeVen · Posted 04-09-2020 01:18 AM

You can merge subsets of the same data set.

data want (keep=enum label='Enums observed by 02 and 05');
  merge 
    have (where=(ObserverID='02') in = _02)
    have (where=(ObserverID='05') in = _05)
    end=end
  ;
  by enum;

  _0205_flag = first.enum and _02 and _05;  * use first. in case the field data has repeated information;

  count + _0205_flag;

  if end then put 'NOTE: ' count ' enums observed by both 02 and 05';

  if _0205_flag;  * subsetting if;
run;

In SQL you can count a positive logical assertion over a group using SUM

proc sql;
  create table want as
  select enum from have
  group by enum
  having sum(ObserverId='02') > 0 and sum(ObserverId='05') > 0;

%put NOTE: &SQLOBS enums were dual observed by 02 and 05;

A NODUPKEY SORT + DATA step can find the enums that were dual observed

proc sort nodupkey data=have out=_0205 ;
  by enum ObserverId;
  where ObserverId in ('02' '05');
run;

data want;
  set _0205;
  by enum;
  if last.enum and not first.enum;
run;

FreelanceReinh · Posted 04-09-2020 05:48 PM

Sir @RichardDeVen,

I suspect the new forum software has jumbled a few lines of the DATA step with your MERGE approach: The "if end then put ..." must have been moved upwards from its original place after the sum statement. Now it would be executed prematurely if the last BY group contributed to the count.

RichardDeVen · Posted 04-09-2020 06:31 PM

Good catch. I'll fix the answer. In order to have the END= based logging PUT in the step, there needs to be some more scaffolding before the subsetting IF.

andreas_lds · Posted 04-09-2020 02:28 AM

You have not explained in which form the result is required in. If you want the number in a macro-variable, the proc sql suggested by @vkumbhakarna is the way to go. If you want it in a dataset, you could use proc summary:

proc summary data=have;
	where ObserverID in ('02', '05');
	output out=counted(drop=_type_ rename=(_freq_=count));
run;

yabwon · Posted 04-09-2020 06:10 AM

Hi,

assuming that ENum is integer and you are able to anticipate its range, following should do:

data have;
input ENum ObserverID $;
cards;

1         0G
1          05
1          02
2         04
2          05
2          02
3          0A
3           0P
3          02
4          0D
4          0J
4          02
5          0D
5          0M
5          0H
5          02
6          0H
6          0M
6          02
7          0G
7          05
7          02
8          0A
8          0P
8          02
9          0J
9          0G
9          02
;
run;

data want;
  array obs2[9] _temporary_;
  array obs5[9] _temporary_;

  do until (EOF);
    set have end = eof;
      where observerID in ('02', '05');
    select;
      when(observerID = '02') obs2[ENum] = 1;
      when(observerID = '05') obs5[ENum] = 1;
      otherwise;
    end;
  end;

  do ENum = 1 to 9;
    count + (obs2[ENum] and obs5[ENum]);
  end;
  output;
  keep count;
run;

if ENum isn't integer or you can't anticipate the range then following should do:

data want;
  if 0 then set have;
  declare hash obs2();
  obs2.defineKey("ENum");
  obs2.defineDone();
  declare hiter I2("obs2");

  declare hash obs5();
  obs5.defineKey("ENum");
  obs5.defineDone();

  do until (EOF);
    set have end = eof;
      where observerID in ('02', '05');
    select;
      when(observerID = '02') _N_ = obs2.add();
      when(observerID = '05') _N_ = obs5.add();
      otherwise;
    end;
  end;

  do while(I2.next() = 0);
    if obs5.check() = 0 then count + 1;
  end;
  output;
  keep count;
  stop;
run;

or eventually:

proc sql;
  create table want as  
  select count(*) from  
    (
    select ENum from have where observerID = '02'
    intersect
    select ENum from have where observerID = '05' 
    )
  ;
quit;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

RichardDeVen · Posted 04-09-2020 11:00 AM

Hi Bart @yabwon :

I would call your second hash way a 'hash merge'. This was my take on it.

data want(keep=enum label='They have been watching you');
  retain enum .;

  declare hash _02(dataset:'have(where=(ObserverID = "02"))', ordered:'A');
  declare hash _05(dataset:'have(where=(ObserverID = "05"))', ordered:'A');

  _02.defineKey('enum'); _02.defineDone();
  _05.defineKey('enum'); _05.defineDone();

  declare hiter check ('_02');
  do while (check.next() eq 0);
    if _05.check()=0 then do;
      _0205_pair_count + 1;
      output;
    end;
  end;

  put 'NOTE: There are ' _0205_pair_count 'enums that were observed by 02 and 05';
run;

yabwon · Posted 04-09-2020 11:08 AM

Nice! I'm making a copy in my archive.

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 04-11-2020 01:52 AM

@RichardDeVen

Richard,

I've just realised that for this task you need no more than 1 hash table (or 1 array)

proc format;
  invalue primes
  '02'  = 2
  '05'  = 5
  other = 1
  ;
run;

data _null_;
  if 0 then set have;
  declare hash obs(hashexp:20);
  obs.defineKey("ENum");
  obs.defineData("ENum","INT");
  obs.defineDone();

  do until (EOF);
    set have end = eof;
    where observerID in ('02', '05');

    if obs.find() NE 0 then
      INT = input(observerID, primes.);
    else
      INT = INT * input(observerID, primes.);
    _N_ = obs.replace();
  end;

  /* if INT is divisible by 10 then ENum was observed by 02 nad 05*/
  obs.output(dataset:'want(where=( mod(INT, 10) = 0 ))');
  stop;
run;


%let range = 9;
data want;
  array obs[&range.] _temporary_;
    do ENum = 1 to &range.;
      obs[ENum] = 1;
    end;

  do until (EOF);
    set have end = eof;
    where observerID in ('02', '05');

    obs[ENum] = obs[ENum] * input(observerID, primes.);

  end;

  do ENum = 1 to &range.;
    if mod(obs[ENum], 10) = 0 then output;
  end;
  keep ENum;
run;

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

RichardDeVen · Posted 04-09-2020 11:01 AM

Great use of INTERSECT !

genemroz · Posted 04-09-2020 12:03 PM

These solutions worked too. The one using INTERSECT is really slick!

Thanks for your help!

Gene

vkumbhakarna · Posted 04-09-2020 01:53 PM

Wow! i'm loving these forums. Get to learn something new every day.

Liked the use of SQLOBS. Per documentation, SQLOBS automatic macro variable is assigned a value after the SQL SELECT statement executes. PROC SQL produces six automatic macro variables (SQLOBS, SQLRC, SQLOOPS, SQLEXITCODE, SQLXRC, and SQLXMSG) to help you troubleshoot your program. For more information, see Using the PROC SQL Automatic Macro Variables.

Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

Re: Count specific observations within groups of observations

SAS Innovate 2025: Call for Content

Classroom Training Available!