Hi,
My code below helps to mark the top 3 value for each ID.
However, as you see, SAS will assign top number for every rows which is unnecessary.
For the sake of curiosity and efficiency, is there any way to tell SAS stop assigning value after top=3?
Thank you,
HHC
data have;
input ID value ;
cards;
1 5
1 8
1 2
1 0
2 6
2 10
2 9
2 1
2 5
2 33
run;
proc sort data=have;
by id descending value;
run;
data want; set have;
by id;
if first.id then top=0;
top+1;
retain top;
run;
I don't think that avoiding proc sort improves the runtime. I got the following times, when have had 1000000 obs (100k ids, 10 obs each):
proc summary data=have nway;
class id;
var value;
output out= work.top max= idgroup(max(value) out[3] (value)=maxvalue );
run;
proc transpose data=work.top(keep= id maxvalue:) out=work.transposed(rename=(col1 = value)) name=position;
by id;
var maxvalue:;
run;
data want;
set have;
if _n_ = 1 then do;
if 0 then set work.transposed;
declare hash h(dataset: 'work.transposed');
h.defineKey('id', 'value');
h.defineData('position');
h.defineDone();
end;
if h.find() = 0 then do;
top = input(compress(position,, 'kd'), 1.);
end;
drop position;
run;
How about this?
data have;
input ID value ;
cards;
1 5
1 8
1 2
1 0
2 6
2 10
2 9
2 1
2 5
2 33
run;
proc sort data=have;
by id descending value;
run;
data want; set have;
by id;
if first.id then top = 0;
if 0 <= top < 3 then top + 1;
else top = .;
run;
As @PeterClemmensen remarked, if you have performance problems, it may help doing things without the proc sort.
Here is a method that works without sorting the data.
First, generate the top 3 records for each id, with an index:
data top3 (index=(idx=(ID value)));
set have;
by ID;
array top4(4) 8 _temporary_;
if first.id then
call missing(of top4(*));
if value>top4(1) then do;
top4(1)=value;
call sortn(of top4(*));
end;
if last.id;
n=n(of top4(*));
do _N_=max(n-2,1) to min(n,4);
value=top4(_N_);
top=n-_N_+1;
output;
end;
keep id value top;
run;
Then use that to get the data you want:
data want;
set have;
top=.;
set top3 key=idx/unique;
if _iorc_ then
_error_=0;
run;
This may run faster if it is the sort that takes time.
You did not specify what to do with ties. This method assigns the same top value to ties, unlike your original program which assigns different numbers. So if you have data like
data have;
input ID value ;
cards;
1 5
1 8
1 2
1 2
1 0
;run;
the output from this program will look like
data have;
input ID value top;
cards;
1 5 2
1 8 1
1 2 3
1 2 3
1 0 .
;run;
while the output from your program will be
data have;
input ID value top;
cards;
1 8 1
1 5 2
1 2 3
1 2 4
1 0 5
;run;
- if you do not set TOP to missing after the first 3, that is. But doing that will hardly save any CPU time.
For curiosity, Yes. For efficiency, No 🙂 There will be virtually no gain in efficiency from that.
If you're looking to speed things up or other measures of better performance when doing this, you should try to develop a method without the Sort Procedure 🙂
I don't think that avoiding proc sort improves the runtime. I got the following times, when have had 1000000 obs (100k ids, 10 obs each):
proc summary data=have nway;
class id;
var value;
output out= work.top max= idgroup(max(value) out[3] (value)=maxvalue );
run;
proc transpose data=work.top(keep= id maxvalue:) out=work.transposed(rename=(col1 = value)) name=position;
by id;
var maxvalue:;
run;
data want;
set have;
if _n_ = 1 then do;
if 0 then set work.transposed;
declare hash h(dataset: 'work.transposed');
h.defineKey('id', 'value');
h.defineData('position');
h.defineDone();
end;
if h.find() = 0 then do;
top = input(compress(position,, 'kd'), 1.);
end;
drop position;
run;
Thank you all for an interesting discussion.
HHC
data want; set have;
by id;
if first.id then do;
n=1;
top=0;
end;
if n<=3 then top+1;
else top=.;
n+1;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.