- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I have a dataset where I have subject ID and site ID as two separate variables and I would like to count the unique subject IDs for each site id.
subject site
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
And I would like the output dataset to look like this:
site count
644 2
733 3
Any help is much appreciated! Thank you!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql;
create table want as
select site, count(unique subject) as count
from have
group by site;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc sql;
create table want as
select site, count(unique subject) as count
from have
group by site;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you!! This is exactly what I needed!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @kmardinian
You can do this:
data have;
input subject site;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
run;
proc sql;
create table want as
select site, count(distinct subject) as count
from have
group by site;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc freq data=have noprint;
table site*subject / out=temp;
run;
proc freq data=temp;
table site / out=want;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reeza ,
no need double, just one shot.
data have;
input subject site;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
ods output nlevels=want;
proc freq data=have nlevels;
by site;
table subject;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if the data is grouped like your sample suggests, a datastep is convenient too
data have;
input subject $ site $;
cards;
10021 644
10021 644
10256 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
set have;
by site subject notsorted;
if first.site then count=1;
else if first.subject then count+1;
if last.site;
drop subject;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some HASH teaser 🙂
data have;
input subject $ site $;
cards;
10021 644
10256 644
10021 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("subject") ;
h.definedata ("subject") ;
h.definedone () ;
end;
do until(last.site);
set have;
by site;
h.ref();
end;
count=h.num_items;
h.clear();
drop subject;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Key indexing paint brush . This assumes subjectid is numeric or char with only digit characters
data have;
input subject $ site $;
cards;
10021 644
10256 644
10021 644
10256 644
56985 733
56985 733
45698 733
12659 733
;
data want;
array t(-1000000:1000000)_temporary_;
do until(last.site);
set have;
by site;
t(input(subject,32.))=1;
end;
count=n(of t(*));
call missing(of t(*));
run;