DATA Step, Macro, Functions and more

Suggest other ways of creating a frequency variable

Accepted Solution Solved
Reply
Contributor
Posts: 71
Accepted Solution

Suggest other ways of creating a frequency variable

Hi everyone.

I'm hoping to benefit from your expertise to learn of new approaches to my solution below.  The solution below works for me, although the proc tabulate feels a little slow at times on my larger data sets.  Perhaps doing the entire solution in proc sql could be faster?  But what would that look like?  Or maybe a hash solution could be used here?  Any thoughts or code is appreciated. 

data have;

input name $;

cards;

john

jane

jack

jill

jenn

jeff

john

john

jill

jenn

jack

jack

jane

jack

jack

jane

jenn

jenn

;

run;

/* the frequency of the name is put into the totalcount variable */

data want; /*sorted or unsorted doesn't really matter*/

input name $ totalcount;

cards;

john 3

jane 3

jack 5

jill 2

jenn 4

jeff 1

john 3

john 3

jill 2

jenn 4

jack 5

jack 5

jane 3

jack 5

jack 5

jane 3

jenn 4

jenn 4

;

run;

proc tabulate data=have out=haveN order=freq;/*order=data will sort by name*/

class name;

tables name, N;

run;

proc sql;

/*   'Inner Join';*/

create table want as

    select h.*, hn.N

        from have as h, haveN as hn

          where h.name=hn.name;

quit;


Accepted Solutions
Solution
‎04-28-2014 02:16 PM
Respected Advisor
Posts: 3,799

Re: Suggest other ways of creating a frequency variable

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;

View solution in original post


All Replies
Regular Contributor
Posts: 217

Re: Suggest other ways of creating a frequency variable

Here's one way using a Proc Freq and a Proc SQL.


proc freq data=have order=data noprint;
        tables name / missing nocum nopercent out=havecount;
run;      

proc sql;
  drop table work.haveboth;
  CREATE TABLE work.haveboth as
  SELECT a.name,
         (select distinct count from work.havecount as b
              where a.name = b.name) as count
  FROM work.have as a
   ;
quit;

Contributor
Posts: 71

Re: Suggest other ways of creating a frequency variable

Although I selected the other response as the correct answer, using proc freq actually seems to be slightly faster than proc summary.  Both are faster than proc tabulate.

So I think I'll be using your proc freq suggestion.  It's funny I didn't use proc freq in the first place, since it's the one I normally use.  But back when I did this code, I was, admittedly, being lazy (or pressured for results...I can't remember).

Thanks!

Super User
Posts: 19,876

Re: Suggest other ways of creating a frequency variable

Is it a count of the names occurrence in the data base that you're looking for?

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name;

quit;

Regular Contributor
Posts: 217

Re: Suggest other ways of creating a frequency variable

Reeza,

Your query does not retain the original order of the data in "WANT".  Please see my question at https://communities.sas.com/thread/57291.


Contributor
Posts: 71

Re: Suggest other ways of creating a frequency variable

This one runs a little slower, but it's an attractive option since everything is in one package.  It's very tempting to use because of that.  For my immediate project, I'll stick with one fellows suggestion, but your proc sql code is going to help me in some adhoc stuff.  I like it! 

I like that you spotted my comment in the code about not requiring the data be ordered Smiley Wink 

Thank you!

Solution
‎04-28-2014 02:16 PM
Respected Advisor
Posts: 3,799

Re: Suggest other ways of creating a frequency variable

I would not choose TABULATE for simple summary such as this unless I wanted the printed output.  You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway;
  
class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name));
  
run;
data want;
   set have;
   set haveN key=name/unique;
  
run;
Contributor
Posts: 71

Re: Suggest other ways of creating a frequency variable

Posted in reply to data_null__

Wow, I had a longer response, but I timed out because I was busy testing everyone's code suggestions with my actual data. 

That's a good point about not using Tabulate.  In truth, I used it because at the time I only had to copy, paste and tweak some of my existing code from another SAS program.  I admit I was being lazy, but isn't that one of the traits of being a programmer?

I'm looking now at putting this into a more production environment, so that's why optimizing is important.  For me (or at least in the environment I work in), part of optimization takes into account code maintenance.  Not only is your solution speedy with my data set, it's also easy on the eyes for good code maintenance.

Thank you!

Super User
Posts: 19,876

Re: Suggest other ways of creating a frequency variable

The question specifically stated:

jaredp wrote:

data want; /*sorted or unsorted doesn't really matter*/

As posted in the thread you referenced, if you need ordered, then you'll have to create an order variable first.

data have;

    set have;

    order=_n_;

run;

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

            from have

            group by name) as b

on a.name=b.name

    order by a.order;

quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 414 views
  • 9 likes
  • 4 in conversation