BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jaredp
Quartz | Level 8

Hi everyone.

I'm hoping to benefit from your expertise to learn of new approaches to my solution below.  The solution below works for me, although the proc tabulate feels a little slow at times on my larger data sets.  Perhaps doing the entire solution in proc sql could be faster?  But what would that look like?  Or maybe a hash solution could be used here?  Any thoughts or code is appreciated. 

data have;

input name $;

cards;

john

jane

jack

jill

jenn

jeff

john

john

jill

jenn

jack

jack

jane

jack

jack

jane

jenn

jenn

;

run;

/* the frequency of the name is put into the totalcount variable */

data want; /*sorted or unsorted doesn't really matter*/

input name $ totalcount;

cards;

john 3

jane 3

jack 5

jill 2

jenn 4

jeff 1

john 3

john 3

jill 2

jenn 4

jack 5

jack 5

jane 3

jack 5

jack 5

jane 3

jenn 4

jenn 4

;

run;

proc tabulate data=have out=haveN order=freq;/*order=data will sort by name*/

class name;

tables name, N;

run;

proc sql;

/*   'Inner Join';*/

create table want as

    select h.*, hn.N

        from have as h, haveN as hn

          where h.name=hn.name;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;

View solution in original post

8 REPLIES 8
jwillis
Quartz | Level 8

Here's one way using a Proc Freq and a Proc SQL.


proc freq data=have order=data noprint;
        tables name / missing nocum nopercent out=havecount;
run;      

proc sql;
  drop table work.haveboth;
  CREATE TABLE work.haveboth as
  SELECT a.name,
         (select distinct count from work.havecount as b
              where a.name = b.name) as count
  FROM work.have as a
   ;
quit;

jaredp
Quartz | Level 8

Although I selected the other response as the correct answer, using proc freq actually seems to be slightly faster than proc summary.  Both are faster than proc tabulate.

So I think I'll be using your proc freq suggestion.  It's funny I didn't use proc freq in the first place, since it's the one I normally use.  But back when I did this code, I was, admittedly, being lazy (or pressured for results...I can't remember).

Thanks!

Reeza
Super User

Is it a count of the names occurrence in the data base that you're looking for?

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name;

quit;

jwillis
Quartz | Level 8

Reeza,

Your query does not retain the original order of the data in "WANT".  Please see my question at https://communities.sas.com/thread/57291.


jaredp
Quartz | Level 8

This one runs a little slower, but it's an attractive option since everything is in one package.  It's very tempting to use because of that.  For my immediate project, I'll stick with one fellows suggestion, but your proc sql code is going to help me in some adhoc stuff.  I like it! 

I like that you spotted my comment in the code about not requiring the data be ordered Smiley Wink 

Thank you!

data_null__
Jade | Level 19

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;
jaredp
Quartz | Level 8

Wow, I had a longer response, but I timed out because I was busy testing everyone's code suggestions with my actual data. 

That's a good point about not using Tabulate.  In truth, I used it because at the time I only had to copy, paste and tweak some of my existing code from another SAS program.  I admit I was being lazy, but isn't that one of the traits of being a programmer?

I'm looking now at putting this into a more production environment, so that's why optimizing is important.  For me (or at least in the environment I work in), part of optimization takes into account code maintenance.  Not only is your solution speedy with my data set, it's also easy on the eyes for good code maintenance.

Thank you!

Reeza
Super User

The question specifically stated:

jaredp wrote:

data want; /*sorted or unsorted doesn't really matter*/

As posted in the thread you referenced, if you need ordered, then you'll have to create an order variable first.

data have;

    set have;

    order=_n_;

run;

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name

    order by a.order;

quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1779 views
  • 9 likes
  • 4 in conversation