BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Is there a way to create frequency of repetition of duplicates  by certain variable? I know proc freq and merge it back to an original data. But I'm wondering is there any more efficient way doing the same thing? I couldn't figure it out yet using first.obs and last.obs features.

 

data b.file;
input pat_id drug $ count;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Do you mean this?

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

proc sql;
create table want as
select pat_id,drug, count(pat_id) as count
from file
group by pat_id;
quit;

View solution in original post

24 REPLIES 24
ballardw
Super User

What do you expect to show as a result from that data? It is not at all clear what you want.

novinosrin
Tourmaline | Level 20

Do you mean this?

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;

proc sql;
create table want as
select pat_id,drug, count(pat_id) as count
from file
group by pat_id;
quit;
Cruise
Ammonite | Level 13

@novinosrin

Awesome. And I can keep all the variables since I want to keep the structure of original data, correct. Thanks a lot.

 

proc sql;

create table b.want as

select *, count(pat_id) as count

from b.file

group by pat_id;

quit;

Kurt_Bremser
Super User

With all methods, you have to first create the sums and then merge back. SQL does it transparently, but the overall work by SAS is mostly the same, as is the time that it will take.

If the file is sorted, you can use a double DOW:

data have;
input pat_id drug $ count;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;
run;

data want;
newcount = 0;
do until (last.pat_id);
  set have;
  by pat_id;
  newcount + 1;
end;
do until (last.pat_id);
  set have;
  by pat_id;
  output;
end;
run;
Cruise
Ammonite | Level 13

I just run into this problem not being able to Like the posts. Do you know why?

why.png

FreelanceReinh
Jade | Level 19

@Cruise wrote:

I just run into this problem not being able to Like the posts. Do you know why?

 


This seems to be a general issue. At the moment, I can't give likes either. Not sure if the moderators (@ChrisHemedinger and colleagues) are aware of it.

ChrisHemedinger
Community Manager

Sorry about that -- we had an interloper that was coming in and spamming us with Likes.  I know, seems innocuous, but not a cool behavior.  We clamped down a bit, too much it seems.  But you should be able to Like all you want now.

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.
FreelanceReinh
Jade | Level 19

The DOW loop technique suggested by Kurt Bremser has an additional advantage over PROC SQL (without a suitable ORDER BY clause): It ensures that observation order is maintained.

 

Cruise
Ammonite | Level 13
Can you give an example of "Not Suitable ORDER BY clause" situation?
FreelanceReinh
Jade | Level 19

@Cruise wrote:
Can you give an example of "Not Suitable ORDER BY clause" situation?

Well, when I run @novinosrin's PROC SQL step, I get this result:

pat_id    drug    count

   3      009       2
   3      648       2
   4      504       2
   4      569       2
   5      003       1

As you see, the order of the first two observations has changed (as compared to the input dataset). Since dataset FILE is not sorted by DRUG within PAT_ID BY groups (neither ascending nor descending), there is no obvious ORDER BY clause which would be suitable to maintain the original sort order (if this was an issue).

Cruise
Ammonite | Level 13
I see. Thanks.
novinosrin
Tourmaline | Level 20

Try this for fun and test the performance against DOW

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;


data want;
 dcl hash h(suminc: "c", dataset:"file");
 h.defineKey('pat_id');
 h.defineDone();
 c = 1;
 do while (not last);
   set file  end=last;
   rc = h.find();
 end;
 last = 0;
 do while (not last);
   set file end=last;
   rc = h.sum(sum: count);
   output;
 end;
 stop;
 keep pat_id drug  count;
run;
Cruise
Ammonite | Level 13

Great solutions going on here while I lost LIKE button. I really want my LIKE button back. Please let me know if someone knows why LIKE is lost?

FreelanceReinh
Jade | Level 19

@novinosrin wrote:

Try this for fun and test the performance against DOW

 

data file;
input pat_id drug $ ;
cards;
3 648 2
3 009 2
4 504 2
4 569 2
5 003 1
;


data want;
 dcl hash h(suminc: "c", dataset:"file");
 h.defineKey('pat_id');
 h.defineDone();
 c = 1;
 do while (not last);
   set file  end=last;
   rc = h.find();
 end;
 last = 0;
 do while (not last);
   set file end=last;
   rc = h.sum(sum: count);
   output;
 end;
 stop;
 keep pat_id drug  count;
run;

Good idea to test the performance. I've added a modified version of your hash solution to make it faster (by reading the input dataset only twice rather than three times):

data want;
dcl hash h(suminc: 'c');
h.defineKey('pat_id');
h.defineDone();
c = 1;
do until(last1);
  set have end=last1;
  h.ref();
end;
do until(last2);
  set have end=last2;
  h.sum(sum: count);
  output;
end;
stop;
drop c;
run;

 

Then I tested the solutions using this input dataset HAVE containing 80 million observations from 12,695,381 "patients" with 1 - 102 observations each (median: 10).

data have;
call streaminit(271828);
do _n_=1 to 80000000;
  drug=rand('normal');
  if drug<-1 then pat_id+1;
  output;
end;
run;

And the winner is (not quite surprisingly) -- @Kurt_Bremser's classic double DOW loop! (Actually, your hash solution uses a double DOW loop as well, that's why I say "classic".)

 

The WANT dataset (with 80 million obs.) was created in

 

  • 11 s using the classic double DOW loop
  • 19 s using PROC SQL
  • 36 s using hash object with SUMINC and REF
  • 50 s using hash object with SUMINC, DATASET and FIND

(with SAS 9.4 TS1M2 64 Bit on a Windows 7 Workstation).

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 1647 views
  • 8 likes
  • 6 in conversation