BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jaredp
Quartz | Level 8

Hi everyone.

I'm hoping to benefit from your expertise to learn of new approaches to my solution below.  The solution below works for me, although the proc tabulate feels a little slow at times on my larger data sets.  Perhaps doing the entire solution in proc sql could be faster?  But what would that look like?  Or maybe a hash solution could be used here?  Any thoughts or code is appreciated. 

data have;

input name $;

cards;

john

jane

jack

jill

jenn

jeff

john

john

jill

jenn

jack

jack

jane

jack

jack

jane

jenn

jenn

;

run;

/* the frequency of the name is put into the totalcount variable */

data want; /*sorted or unsorted doesn't really matter*/

input name $ totalcount;

cards;

john 3

jane 3

jack 5

jill 2

jenn 4

jeff 1

john 3

john 3

jill 2

jenn 4

jack 5

jack 5

jane 3

jack 5

jack 5

jane 3

jenn 4

jenn 4

;

run;

proc tabulate data=have out=haveN order=freq;/*order=data will sort by name*/

class name;

tables name, N;

run;

proc sql;

/*   'Inner Join';*/

create table want as

    select h.*, hn.N

        from have as h, haveN as hn

          where h.name=hn.name;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;

View solution in original post

8 REPLIES 8
jwillis
Quartz | Level 8

Here's one way using a Proc Freq and a Proc SQL.


proc freq data=have order=data noprint;
        tables name / missing nocum nopercent out=havecount;
run;      

proc sql;
  drop table work.haveboth;
  CREATE TABLE work.haveboth as
  SELECT a.name,
         (select distinct count from work.havecount as b
              where a.name = b.name) as count
  FROM work.have as a
   ;
quit;

jaredp
Quartz | Level 8

Although I selected the other response as the correct answer, using proc freq actually seems to be slightly faster than proc summary.  Both are faster than proc tabulate.

So I think I'll be using your proc freq suggestion.  It's funny I didn't use proc freq in the first place, since it's the one I normally use.  But back when I did this code, I was, admittedly, being lazy (or pressured for results...I can't remember).

Thanks!

Reeza
Super User

Is it a count of the names occurrence in the data base that you're looking for?

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name;

quit;

jwillis
Quartz | Level 8

Reeza,

Your query does not retain the original order of the data in "WANT".  Please see my question at https://communities.sas.com/thread/57291.


jaredp
Quartz | Level 8

This one runs a little slower, but it's an attractive option since everything is in one package.  It's very tempting to use because of that.  For my immediate project, I'll stick with one fellows suggestion, but your proc sql code is going to help me in some adhoc stuff.  I like it! 

I like that you spotted my comment in the code about not requiring the data be ordered Smiley Wink 

Thank you!

data_null__
Jade | Level 19

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;
jaredp
Quartz | Level 8

Wow, I had a longer response, but I timed out because I was busy testing everyone's code suggestions with my actual data. 

That's a good point about not using Tabulate.  In truth, I used it because at the time I only had to copy, paste and tweak some of my existing code from another SAS program.  I admit I was being lazy, but isn't that one of the traits of being a programmer?

I'm looking now at putting this into a more production environment, so that's why optimizing is important.  For me (or at least in the environment I work in), part of optimization takes into account code maintenance.  Not only is your solution speedy with my data set, it's also easy on the eyes for good code maintenance.

Thank you!

Reeza
Super User

The question specifically stated:

jaredp wrote:

data want; /*sorted or unsorted doesn't really matter*/

As posted in the thread you referenced, if you need ordered, then you'll have to create an order variable first.

data have;

    set have;

    order=_n_;

run;

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name

    order by a.order;

quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1308 views
  • 9 likes
  • 4 in conversation