BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi,

My code below helps to mark the top 3 value for each ID.

However, as you see, SAS will assign top number for every rows which is unnecessary.

For the sake of curiosity and efficiency, is there any way to tell SAS stop assigning value after top=3?

Thank you,

HHC

 

data have;
  input  ID value  ;
cards;
1 5
1 8
1 2
1 0
2 6
2 10
2 9
2 1 
2 5
2 33
run;

proc sort data=have; 
by id descending value;
run;

data want; set have;
by id;
if first.id then top=0;
top+1;
retain top;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

I don't think that avoiding proc sort improves the runtime. I got the following times, when have had 1000000 obs (100k ids, 10 obs each):

original code: duration=0.265001
suggestion by @s_lassen: duration=1.282
summary+transpose+hash: duration=0.859
 
Code of the hash-approach:
proc summary data=have nway;
   class id;
   var value;
   output out= work.top max= idgroup(max(value) out[3] (value)=maxvalue );
run;

proc transpose data=work.top(keep= id maxvalue:) out=work.transposed(rename=(col1 = value)) name=position;
   by id;
   var maxvalue:;
run;

data want;
   set have;
   
   if _n_ = 1 then do;
      if 0 then set work.transposed;
      declare hash h(dataset: 'work.transposed');
      h.defineKey('id', 'value');
      h.defineData('position');
      h.defineDone();
   end;
   
   if h.find() = 0 then do;
      top = input(compress(position,, 'kd'), 1.);
   end;
   
   drop position;
run;

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

How about this?

data have;
  input  ID value  ;
cards;
1 5
1 8
1 2
1 0
2 6
2 10
2 9
2 1 
2 5
2 33
run;

proc sort data=have; 
by id descending value;
run;

data want; set have;
by id;
if first.id then top = 0;
if 0 <= top < 3 then top + 1;
else top = .;
run;
s_lassen
Meteorite | Level 14

As @PeterClemmensen remarked, if you have performance problems, it may help doing things without the proc sort.

 

Here is a method that works without sorting the data.

 

First, generate the top 3 records for each id, with an index:

data top3 (index=(idx=(ID value)));
  set have;
  by ID;
  array top4(4) 8 _temporary_;
  if first.id then 
    call missing(of top4(*));
  if value>top4(1) then do;
    top4(1)=value;
    call sortn(of top4(*));
    end;
  if last.id;
  n=n(of top4(*));
  do _N_=max(n-2,1) to min(n,4);
    value=top4(_N_);
    top=n-_N_+1;
    output;
    end;
  keep id value top;
run;

Then use that to get the data you want:

data want;
  set have;
  top=.;
  set top3 key=idx/unique;
  if _iorc_ then 
    _error_=0;
run;

This may run faster if it is the sort that takes time. 

 

You did not specify what to do with ties. This method assigns the same top value to ties, unlike your original program which assigns different numbers. So if you have data like

data have;
  input  ID value  ;
cards;
1 5
1 8
1 2
1 2
1 0 ;run;

the output from this program will look like

data have;
  input  ID value  top;
cards;
1 5 2
1 8 1
1 2 3
1 2 3
1 0 . ;run;

while the output from your program will be

data have;
  input  ID value  top;
cards;
1 8 1
1 5 2
1 2 3
1 2 4
1 0 5
;run;

- if you do not set TOP to missing after the first 3, that is. But doing that will hardly save any CPU time.

PeterClemmensen
Tourmaline | Level 20

For curiosity, Yes. For efficiency, No 🙂 There will be virtually no gain in efficiency from that.

 

If you're looking to speed things up or other measures of better performance when doing this, you should try to develop a method without the Sort Procedure 🙂

andreas_lds
Jade | Level 19

I don't think that avoiding proc sort improves the runtime. I got the following times, when have had 1000000 obs (100k ids, 10 obs each):

original code: duration=0.265001
suggestion by @s_lassen: duration=1.282
summary+transpose+hash: duration=0.859
 
Code of the hash-approach:
proc summary data=have nway;
   class id;
   var value;
   output out= work.top max= idgroup(max(value) out[3] (value)=maxvalue );
run;

proc transpose data=work.top(keep= id maxvalue:) out=work.transposed(rename=(col1 = value)) name=position;
   by id;
   var maxvalue:;
run;

data want;
   set have;
   
   if _n_ = 1 then do;
      if 0 then set work.transposed;
      declare hash h(dataset: 'work.transposed');
      h.defineKey('id', 'value');
      h.defineData('position');
      h.defineDone();
   end;
   
   if h.find() = 0 then do;
      top = input(compress(position,, 'kd'), 1.);
   end;
   
   drop position;
run;

hhchenfx
Barite | Level 11

Thank you all for an interesting discussion.

HHC

hhchenfx
Barite | Level 11
data want; set have;
by id;
if first.id then do;
	n=1;
	top=0;
	end;
if n<=3 then top+1;
else top=.;
n+1;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 501 views
  • 5 likes
  • 5 in conversation