Solved: Re: Suggest other ways of creating a frequency variable

jaredp · Posted 04-28-2014 12:54 PM

Hi everyone.

I'm hoping to benefit from your expertise to learn of new approaches to my solution below. The solution below works for me, although the proc tabulate feels a little slow at times on my larger data sets. Perhaps doing the entire solution in proc sql could be faster? But what would that look like? Or maybe a hash solution could be used here? Any thoughts or code is appreciated.

data have;

input name $;

cards;

john

jane

jack

jill

jenn

jeff

john

jill

jenn

jack

jane

jack

jane

jenn

;

run;

/* the frequency of the name is put into the totalcount variable */

data want; /*sorted or unsorted doesn't really matter*/

input name $ totalcount;

cards;

john 3

jane 3

jack 5

jill 2

jenn 4

jeff 1

john 3

jill 2

jenn 4

jack 5

jane 3

jack 5

jane 3

jenn 4

;

run;

proc tabulate data=have out=haveN order=freq;/*order=data will sort by name*/

class name;

tables name, N;

run;

proc sql;

/* 'Inner Join';*/

create table want as

select h.*, hn.N

from have as h, haveN as hn

where h.name=hn.name;

quit;

data_null__ · Posted 04-28-2014 02:16 PM

I would not choose TABULATE for simple summary such as this unless I wanted the printed output. You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway; 
   class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name)); 
   run; 
data want;
   set have;
   set haveN key=name/unique; 
   run;

View solution in original post

jwillis · Posted 04-28-2014 01:19 PM

Here's one way using a Proc Freq and a Proc SQL.

proc freq data=have order=data noprint;
tables name / missing nocum nopercent out=havecount;
run;

proc sql;
drop table work.haveboth;
CREATE TABLE work.haveboth as
SELECT a.name,
         (select distinct count from work.havecount as b
              where a.name = b.name) as count
FROM work.have as a
   ;
quit;

jaredp · Posted 04-28-2014 06:19 PM

Although I selected the other response as the correct answer, using proc freq actually seems to be slightly faster than proc summary. Both are faster than proc tabulate.

So I think I'll be using your proc freq suggestion. It's funny I didn't use proc freq in the first place, since it's the one I normally use. But back when I did this code, I was, admittedly, being lazy (or pressured for results...I can't remember).

Thanks!

Reeza · Posted 04-28-2014 01:35 PM

Is it a count of the names occurrence in the data base that you're looking for?

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

from have

group by name) as b

on a.name=b.name;

quit;

jwillis · Posted 04-28-2014 01:51 PM

Reeza,

Your query does not retain the original order of the data in "WANT". Please see my question at https://communities.sas.com/thread/57291.

jaredp · Posted 04-28-2014 06:24 PM

This one runs a little slower, but it's an attractive option since everything is in one package. It's very tempting to use because of that. For my immediate project, I'll stick with one fellows suggestion, but your proc sql code is going to help me in some adhoc stuff. I like it!

I like that you spotted my comment in the code about not requiring the data be ordered

Thank you!

data_null__ · Posted 04-28-2014 02:16 PM

I would not choose TABULATE for simple summary such as this unless I wanted the printed output. You can experiment with performance difference between the SQL and KEYed SET.

proc summary data=have nway; 
   class name;
   output out=haven(drop=_type_ rename=(_freq_=totalcount) index=(name)); 
   run; 
data want;
   set have;
   set haveN key=name/unique; 
   run;

jaredp · Posted 04-28-2014 06:15 PM

Wow, I had a longer response, but I timed out because I was busy testing everyone's code suggestions with my actual data.

That's a good point about not using Tabulate. In truth, I used it because at the time I only had to copy, paste and tweak some of my existing code from another SAS program. I admit I was being lazy, but isn't that one of the traits of being a programmer?

I'm looking now at putting this into a more production environment, so that's why optimizing is important. For me (or at least in the environment I work in), part of optimization takes into account code maintenance. Not only is your solution speedy with my data set, it's also easy on the eyes for good code maintenance.

Thank you!

Reeza · Posted 04-28-2014 03:54 PM

The question specifically stated:

jaredp wrote:

data want; /*sorted or unsorted doesn't really matter*/

As posted in the thread you referenced, if you need ordered, then you'll have to create an order variable first.

data have;

set have;

order=_n_;

run;

proc sql;

create table want as

select a.name, b.count

from have as a

left join (select name, count(*) as count

from have

group by name) as b

on a.name=b.name

order by a.order;

quit;

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away