Hi everyone.
I'm hoping to benefit from your expertise to learn of new approaches to my solution below. The solution below works for me, although the proc tabulate feels a little slow at times on my larger data sets. Perhaps doing the entire solution in proc sql could be faster? But what would that look like? Or maybe a hash solution could be used here? Any thoughts or code is appreciated.
data have;
input name $;
cards;
john
jane
jack
jill
jenn
jeff
john
john
jill
jenn
jack
jack
jane
jack
jack
jane
jenn
jenn
;
run;
/* the frequency of the name is put into the totalcount variable */
data want; /*sorted or unsorted doesn't really matter*/
input name $ totalcount;
cards;
john 3
jane 3
jack 5
jill 2
jenn 4
jeff 1
john 3
john 3
jill 2
jenn 4
jack 5
jack 5
jane 3
jack 5
jack 5
jane 3
jenn 4
jenn 4
;
run;
proc tabulate data=have out=haveN order=freq;/*order=data will sort by name*/
class name;
tables name, N;
run;
proc sql;
/* 'Inner Join';*/
create table want as
select h.*, hn.N
from have as h, haveN as hn
where h.name=hn.name;
quit;
I would not choose TABULATE for simple summary such as this unless I wanted the printed output. You can experiment with performance difference between the SQL and KEYed SET.
Here's one way using a Proc Freq and a Proc SQL.
proc freq data=have order=data noprint;
tables name / missing nocum nopercent out=havecount;
run;
proc sql;
drop table work.haveboth;
CREATE TABLE work.haveboth as
SELECT a.name,
(select distinct count from work.havecount as b
where a.name = b.name) as count
FROM work.have as a
;
quit;
Although I selected the other response as the correct answer, using proc freq actually seems to be slightly faster than proc summary. Both are faster than proc tabulate.
So I think I'll be using your proc freq suggestion. It's funny I didn't use proc freq in the first place, since it's the one I normally use. But back when I did this code, I was, admittedly, being lazy (or pressured for results...I can't remember).
Thanks!
Is it a count of the names occurrence in the data base that you're looking for?
proc sql;
create table want as
select a.name, b.count
from have as a
left join (select name, count(*) as count
from have
group by name) as b
on a.name=b.name;
quit;
Reeza,
Your query does not retain the original order of the data in "WANT". Please see my question at https://communities.sas.com/thread/57291.
This one runs a little slower, but it's an attractive option since everything is in one package. It's very tempting to use because of that. For my immediate project, I'll stick with one fellows suggestion, but your proc sql code is going to help me in some adhoc stuff. I like it!
I like that you spotted my comment in the code about not requiring the data be ordered
Thank you!
I would not choose TABULATE for simple summary such as this unless I wanted the printed output. You can experiment with performance difference between the SQL and KEYed SET.
Wow, I had a longer response, but I timed out because I was busy testing everyone's code suggestions with my actual data.
That's a good point about not using Tabulate. In truth, I used it because at the time I only had to copy, paste and tweak some of my existing code from another SAS program. I admit I was being lazy, but isn't that one of the traits of being a programmer?
I'm looking now at putting this into a more production environment, so that's why optimizing is important. For me (or at least in the environment I work in), part of optimization takes into account code maintenance. Not only is your solution speedy with my data set, it's also easy on the eyes for good code maintenance.
Thank you!
The question specifically stated:
jaredp wrote:
data want; /*sorted or unsorted doesn't really matter*/
As posted in the thread you referenced, if you need ordered, then you'll have to create an order variable first.
data have;
set have;
order=_n_;
run;
proc sql;
create table want as
select a.name, b.count
from have as a
left join (select name, count(*) as count
from have
group by name) as b
on a.name=b.name
order by a.order;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.