DATA Step, Macro, Functions and more

Count of times repeated alternative to proc freq?

Accepted Solution Solved
Reply
Super Contributor
Posts: 374
Accepted Solution

Count of times repeated alternative to proc freq?

[ Edited ]

Is there a way to create frequency of repetition of duplicates  by certain variable? I know proc freq and merge it back to an original data. But I'm wondering is there any more efficient way doing the same thing? I couldn't figure it out yet using first.obs and last.obs features.

 

data b.file;
input pat_id drug $ count;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

 


Accepted Solutions
Solution
3 weeks ago
PROC Star
Posts: 1,836

Re: Count of times repeated alternative to proc freq?

Do you mean this?

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

proc sql;
create table want as
select pat_id,drug, count(pat_id) as count
from file
group by pat_id;
quit;

View solution in original post


All Replies
Super User
Posts: 13,583

Re: Count of times repeated alternative to proc freq?

What do you expect to show as a result from that data? It is not at all clear what you want.

Solution
3 weeks ago
PROC Star
Posts: 1,836

Re: Count of times repeated alternative to proc freq?

Do you mean this?

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

proc sql;
create table want as
select pat_id,drug, count(pat_id) as count
from file
group by pat_id;
quit;
Super Contributor
Posts: 374

Re: Count of times repeated alternative to proc freq?

Posted in reply to novinosrin

@novinosrin

Awesome. And I can keep all the variables since I want to keep the structure of original data, correct. Thanks a lot.

 

proc sql;

create table b.want as

select *, count(pat_id) as count

from b.file

group by pat_id;

quit;

Super User
Posts: 10,280

Re: Count of times repeated alternative to proc freq?

With all methods, you have to first create the sums and then merge back. SQL does it transparently, but the overall work by SAS is mostly the same, as is the time that it will take.

If the file is sorted, you can use a double DOW:

data have;
input pat_id drug $ count;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;
run;

data want;
newcount = 0;
do until (last.pat_id);
  set have;
  by pat_id;
  newcount + 1;
end;
do until (last.pat_id);
  set have;
  by pat_id;
  output;
end;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super Contributor
Posts: 374

Re: Count of times repeated alternative to proc freq?

Posted in reply to KurtBremser

I just run into this problem not being able to Like the posts. Do you know why?

why.png

Trusted Advisor
Posts: 1,256

Re: Count of times repeated alternative to proc freq?


@Cruise wrote:

I just run into this problem not being able to Like the posts. Do you know why?

 


This seems to be a general issue. At the moment, I can't give likes either. Not sure if the moderators (@ChrisHemedinger and colleagues) are aware of it.

Community Manager
Posts: 3,441

Re: Count of times repeated alternative to proc freq?

Posted in reply to FreelanceReinhard

Sorry about that -- we had an interloper that was coming in and spamming us with Likes.  I know, seems innocuous, but not a cool behavior.  We clamped down a bit, too much it seems.  But you should be able to Like all you want now.

Trusted Advisor
Posts: 1,256

Re: Count of times repeated alternative to proc freq?

The DOW loop technique suggested by Kurt Bremser has an additional advantage over PROC SQL (without a suitable ORDER BY clause): It ensures that observation order is maintained.

 

Super Contributor
Posts: 374

Re: Count of times repeated alternative to proc freq?

Posted in reply to FreelanceReinhard
Can you give an example of "Not Suitable ORDER BY clause" situation?
Trusted Advisor
Posts: 1,256

Re: Count of times repeated alternative to proc freq?


@Cruise wrote:
Can you give an example of "Not Suitable ORDER BY clause" situation?

Well, when I run @novinosrin's PROC SQL step, I get this result:

pat_id    drug    count

   3      009       2
   3      648       2
   4      504       2
   4      569       2
   5      003       1

As you see, the order of the first two observations has changed (as compared to the input dataset). Since dataset FILE is not sorted by DRUG within PAT_ID BY groups (neither ascending nor descending), there is no obvious ORDER BY clause which would be suitable to maintain the original sort order (if this was an issue).

Super Contributor
Posts: 374

Re: Count of times repeated alternative to proc freq?

Posted in reply to FreelanceReinhard
I see. Thanks.
PROC Star
Posts: 1,836

Re: Count of times repeated alternative to proc freq?

Try this for fun and test the performance against DOW

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;


data want;
 dcl hash h(suminc: "c", dataset:"file");
 h.defineKey('pat_id');
 h.defineDone();
 c = 1;
 do while (not last);
   set file  end=last;
   rc = h.find();
 end;
 last = 0;
 do while (not last);
   set file end=last;
   rc = h.sum(sum: count);
   output;
 end;
 stop;
 keep pat_id drug  count;
run;
Super Contributor
Posts: 374

Re: Count of times repeated alternative to proc freq?

Posted in reply to novinosrin

Great solutions going on here while I lost LIKE button. I really want my LIKE button back. Please let me know if someone knows why LIKE is lost?

Trusted Advisor
Posts: 1,256

Re: Count of times repeated alternative to proc freq?

[ Edited ]
Posted in reply to novinosrin

@novinosrin wrote:

Try this for fun and test the performance against DOW

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;


data want;
 dcl hash h(suminc: "c", dataset:"file");
 h.defineKey('pat_id');
 h.defineDone();
 c = 1;
 do while (not last);
   set file  end=last;
   rc = h.find();
 end;
 last = 0;
 do while (not last);
   set file end=last;
   rc = h.sum(sum: count);
   output;
 end;
 stop;
 keep pat_id drug  count;
run;

Good idea to test the performance. I've added a modified version of your hash solution to make it faster (by reading the input dataset only twice rather than three times):

data want;
dcl hash h(suminc: 'c');
h.defineKey('pat_id');
h.defineDone();
c = 1;
do until(last1);
  set have end=last1;
  h.ref();
end;
do until(last2);
  set have end=last2;
  h.sum(sum: count);
  output;
end;
stop;
drop c;
run;

 

Then I tested the solutions using this input dataset HAVE containing 80 million observations from 12,695,381 "patients" with 1 - 102 observations each (median: 10).

data have;
call streaminit(271828);
do _n_=1 to 80000000;
  drug=rand('normal');
  if drug<-1 then pat_id+1;
  output;
end;
run;

And the winner is (not quite surprisingly) -- @KurtBremser's classic double DOW loop! (Actually, your hash solution uses a double DOW loop as well, that's why I say "classic".)

 

The WANT dataset (with 80 million obs.) was created in

 

  • 11 s using the classic double DOW loop
  • 19 s using PROC SQL
  • 36 s using hash object with SUMINC and REF
  • 50 s using hash object with SUMINC, DATASET and FIND

(with SAS 9.4 TS1M2 64 Bit on a Windows 7 Workstation).

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 24 replies
  • 193 views
  • 8 likes
  • 6 in conversation