Hi, I have a dataset where I have subject ID and site ID as two separate variables and I would like to count the unique subject IDs for each site id.
subject site
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
And I would like the output dataset to look like this:
site count
644 2
733 3
Any help is much appreciated! Thank you!
proc sql;
create table want as
select site, count(unique subject) as count
from have
group by site;
quit;
proc sql;
create table want as
select site, count(unique subject) as count
from have
group by site;
quit;
Thank you!! This is exactly what I needed!
Hi @kmardinian
You can do this:
data have;
input subject site;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
run;
proc sql;
create table want as
select site, count(distinct subject) as count
from have
group by site;
run;
Reeza ,
no need double, just one shot.
data have;
input subject site;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
ods output nlevels=want;
proc freq data=have nlevels;
by site;
table subject;
run;
if the data is grouped like your sample suggests, a datastep is convenient too
data have;
input subject $ site $;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
set have;
by site subject notsorted;
if first.site then count=1;
else if first.subject then count+1;
if last.site;
drop subject;
run;
Some HASH teaser 🙂
data have;
input subject $ site $;
cards;
10021 644
10256 644
10021 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("subject") ;
h.definedata ("subject") ;
h.definedone () ;
end;
do until(last.site);
set have;
by site;
h.ref();
end;
count=h.num_items;
h.clear();
drop subject;
run;
Key indexing paint brush . This assumes subjectid is numeric or char with only digit characters
data have;
input subject $ site $;
cards;
10021 644
10256 644
10021 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
array t(-1000000:1000000)_temporary_;
do until(last.site);
set have;
by site;
t(input(subject,32.))=1;
end;
count=n(of t(*));
call missing(of t(*));
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.