Solved: Counting unique records based on two variables

kmardinian · Posted 12-16-2019 02:29 PM

Hi, I have a dataset with subject ID and site ID as two separate variables, and I would like to count the unique subject IDs for each site ID.

subject site
10021   644
10256   644
56985   733
45698   733
12659   733

And I would like the output dataset to look like this:

site count
644  2
733  3

Any help is much appreciated! Thank you!

novinosrin · Posted 12-16-2019 02:33 PM

proc sql;
create table want as
select site, count(unique subject) as count
from have
group by site;
quit;

kmardinian · Posted 12-16-2019 03:39 PM

Thank you!! This is exactly what I needed!

ed_sas_member · Posted 12-16-2019 02:34 PM

Hi @kmardinian

You can do this:

data have;
	input subject site;
	cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
run;
proc sql;
	create table want as
	select site, count(distinct subject) as count
	from have
	group by site;
quit;

Reeza · Posted 12-16-2019 02:39 PM

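A double PROC FREQ is another option:

/* one row per distinct site/subject combination */
proc freq data=have noprint;
table site*subject / out=temp;
run;

/* count those rows per site: COUNT is the number of distinct subjects */
proc freq data=temp;
table site / out=want;
run;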
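A possible variant of the two-step approach, assuming the output should hold only SITE and COUNT as in the requested layout: NOPRINT suppresses the printed tables in both steps, and the DROP= data set option removes the PERCENT column that OUT= writes next to COUNT. The data set names TEMP and WANT follow the post above; the sample rows are the ones from the question.

```sas
data have;
   input subject site;
   cards;
10021 644
10256 644
56985 733
45698 733
12659 733
;
run;

/* Step 1: one row per distinct site/subject combination */
proc freq data=have noprint;
   table site*subject / out=temp;
run;

/* Step 2: count those rows per site; COUNT = number of distinct subjects */
proc freq data=temp noprint;
   table site / out=want(drop=percent);
run;

proc print data=want noobs;
run;
```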

Ksharp · Posted 12-17-2019 07:03 AM

Reeza,

There is no need for a double PROC FREQ; one step does it.

data have;
	input subject site;
	cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
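;
run;

/* NLEVELS reports the number of distinct SUBJECT values in each BY group; */
/* BY processing expects HAVE to be sorted by SITE                         */
ods output nlevels=want;
proc freq data=have nlevels;
by site;
table subject;
run;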
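The NLevels table saved by ODS OUTPUT has one row per site but uses its own column names rather than the question's SITE/COUNT layout. A small follow-up step could tidy it. This sketch assumes the level count is stored in a variable named NLevels (check with PROC CONTENTS if your release differs), and LEVELS is a hypothetical intermediate data set name.

```sas
data have;
   input subject site;
   cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
run;

/* HAVE is already sorted by SITE, as BY processing requires */
ods output nlevels=levels;
proc freq data=have nlevels;
   by site;
   table subject;
run;

/* keep the BY variable and the level count, renamed to COUNT */
data want;
   set levels(keep=site nlevels);
   rename nlevels=count;
run;
```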

novinosrin · Posted 12-16-2019 02:41 PM

If the data are grouped as your sample suggests (all rows for a site together, and repeated subjects adjacent), a DATA step is convenient too:

data have;
input subject $            site $;
cards;
10021             644
10021             644
10256             644
10256             644
56985             733
56985             733
45698             733
12659             733
;

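data want;
set have;
by site subject notsorted;
if first.site then count=1;          /* first subject of a new site */
else if first.subject then count+1;  /* another distinct subject; the sum statement retains COUNT */
if last.site;                        /* keep one row per site */
drop subject;
run;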
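If the rows are not already grouped, a sort could be added first. This sketch assumes HAVE may arrive in any order; after PROC SORT the NOTSORTED option is no longer needed, and the FIRST./LAST. logic is otherwise the same as above.

```sas
data have;
   input subject $ site $;
   cards;
56985 733
10021 644
45698 733
10256 644
56985 733
12659 733
10021 644
;
run;

/* make each site's rows, and each subject within a site, adjacent */
proc sort data=have;
   by site subject;
run;

data want;
   set have;
   by site subject;
   if first.site then count=1;         /* first subject of a new site */
   else if first.subject then count+1; /* another distinct subject */
   if last.site;                       /* output one row per site */
   drop subject;
run;
```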

novinosrin · Posted 12-16-2019 02:46 PM

Some HASH teaser 🙂 (the rows only need to be grouped by site, not sorted by subject):

data have;
input subject $            site $;
cards;
10021             644
10256             644
10021             644
10256             644
56985             733
56985             733
45698             733
12659             733
;


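data want;
   if _n_=1 then do;
      /* hash table of the distinct subjects seen in the current site */
      dcl hash h();
      h.definekey("subject");
      h.definedata("subject");
      h.definedone();
   end;
   /* read every row of one site group */
   do until(last.site);
      set have;
      by site;
      h.ref();           /* adds SUBJECT only if it is not already a key */
   end;
   count=h.num_items;    /* number of distinct subjects in this site */
   h.clear();            /* empty the table before the next site */
   drop subject;
run;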

novinosrin · Posted 12-16-2019 02:52 PM

Key-indexing paintbrush. This assumes SUBJECT is numeric, or character containing only digits, so that each value maps to an integer index within the array bounds (-1,000,000 to 1,000,000):

data have;
input subject $            site $;
cards;
10021             644
10256             644
10021             644
10256             644
56985             733
56985             733
45698             733
12659             733
;

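data want;
   /* one slot per possible subject number */
   array t(-1000000:1000000) _temporary_;
   do until(last.site);
      set have;
      by site;
      t(input(subject,32.))=1;   /* mark this subject as present */
   end;
   count=n(of t(*));             /* marked slots = distinct subjects */
   call missing(of t(*));        /* reset the array for the next site */
   drop subject;
run;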
