- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
there are 10 ids and scores,with varying records for each id. how to get top 3 records for each id based on score?
there is a function "TOP" in other SQL's, what is the equivalent in SAS?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Editor's Note: Thanks to all who contributed, showing that there are mulitple ways to get the same results in SAS.
Here's my take on this. A mix of proc sql and data step.
Alan
data have;
input id $ score @@;
datalines;
S12 -20 S12 20 S12 10
S12 10 S12 20 S12 -50
S13 -40 S13 60 S13 -100
S14 10 S14 20 S14 0
S14 -30 S14 -20
;
/* change as needed to bring back desired ranking */
%let start= 1;
%let end = 3;
/* create table containing distinct id and value -- needed b/c more than one id can have the same score */
proc sql;
create table dst_id_score as
select distinct id, score from have
order by id, score desc; /* keep this sort order for ranking */
* NOTE: Table WORK.DST_ID_SCORE created, with 12 rows and 2 columns. ;
/* assign ranking by id by score */
data temp;
set dst_id_score ;
by id;
if first.id then rnk=0;
rnk + 1;
run;
* NOTE: The data set WORK.TEMP has 12 observations and 3 variables. ;
/* go back to original file and bring back all data meeting desired ranking */
proc sql;
create table final_top_ranked as
select
a.id
, a.score
, b.rnk as rank
from have a inner join temp b on a.id = b.id and a.score = b.score
where rnk between &start and &end
order by id, rnk;
* NOTE: Table WORK.FINAL_TOP_RANKED created, with 11 rows and 3 columns. ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
do _n_=1 by 1 until(last.id);
set have;
by id;
if _n_ <= 3 then output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, it relies upon the data being presorted, unlike sql, however the TOP clause itself is indiscriminate about and type of rank or ties, so that shouldn't be an issue to repeat functionality.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If still choose SQL (SAS SQL in this case) and include ties:
data have;
input id $ value;
cards;
S12 -20
S12 20
S12 10
S12 10
S12 20
S12 -50
S13 -40
S13 60
S13 -100
S14 10
S14 20
S14 0
S14 -30
S14 -20
;
proc sql;
select id,value from
(select distinct b.id,b.value, a.value as avalue from have a
left join have b
on a.id=b.id)
group by id,value
having sum(avalue>=value)<=3;
quit;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think that works as intended.
You still haven't marked this question as being answered, thus everyone will still keep trying to come up with solutions. I really think you are best off doing this in either a datastep or using a proc like proc rank. However, since you asked how you might be able to do it using proc sql, here is a proc sql solution that takes ties into consideration (i.e., if there is a tie for the 3rd position, it will provide all records that have a value greater than of equal to the tied value).
Unfortunately, it has to be modified to account for the specific ids of interest. I used Haikuo's example dataset, but added one record for testing purposes:
data have;
input id $ value;
cards;
S12 -20
S12 20
S12 10
S12 10
S12 20
S12 -50
S13 -40
S13 15
S13 60
S13 -100
S14 10
S14 20
S14 0
S14 -30
S14 -20
;
proc sql;
create table temp as
select *
from have
where id="S12"
order by value desc
;
create table S12 as
select id, value as bottom_value
from temp (firstobs=3 obs=3)
;
create table temp as
select *
from have
where id="S13"
order by value desc
;
create table S13 as
select id, value as bottom_value
from temp (firstobs=3 obs=3)
;
create table temp as
select *
from have
where id="S14"
order by value desc
;
create table S14 as
select id, value as bottom_value
from temp (firstobs=3 obs=3)
;
create table criteria as
select * from S12
union all select * from S13
union all select * from S14
;
create table want (drop=count) as
select a.*,count(*) as count
from have a
left join criteria b
on a.id=b.id
group by a.id
having a.value>=b.bottom_value
order by id, value desc
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input id $ value @@;
cards;
S12 -20 S12 20 S12 10
S12 10 S12 20 S12 -50
S13 -40 S13 60 S13 -100
S14 10 S14 20 S14 0
S14 -30 S14 -20
;
proc means data=have nway noprint;
class id;
var value;
output out=top3arr(drop=_:) idgroup ( max(value) out[3] (value)=) /autolabel autoname;
run;
proc transpose data=top3arr out=top3list(drop=_: rename=(col1=value));
by id;
var value:;
run;
proc sql;
drop table top3arr;
quit;
ID | VALUE |
S12 | 20 |
S12 | 20 |
S12 | 10 |
S13 | 60 |
S13 | -40 |
S13 | -100 |
S14 | 20 |
S14 | 10 |
S14 | 0 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With proc rank you get some control over the way ties are treated :
data have;
input id $ score @@;
datalines;
S12 -20 S12 20 S12 10
S12 10 S12 20 S12 -50
S13 -40 S13 60 S13 -100
S14 10 S14 20 S14 0
S14 -30 S14 -20
;
proc rank data=have descending ties=low /*ties=dense*/
out=want(where=(order<=3));
by id notsorted;
var score; ranks order;
run;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Agree with Art, SQL is not really optimal for this kind of task. Comparing to PG's clean-cut proc rank, Data step will have same amount of control, if not more:
data have;
input id $ score @@;
datalines;
S12 -20 S12 20 S12 10
S12 10 S12 20 S12 -50
S13 -40 S13 60 S13 -100
S14 10 S14 20 S14 0
S14 -30 S14 -20
;
proc sort data=have;
by id descending value;
run;
data want;
do _n_=1 by 1 until (last.id);
set have nobs=nobs;
by id descending value;
if first.id then _f=nobs;
_n+ifn(value=lag(value),0,1);
_n=ifn(first.id,1,_n);
_f=ifn(_n_=3,_n,_f) ;
if _n<=_f then output;
end;
drop _:;
run;
proc print;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Editor's Note: Thanks to all who contributed, showing that there are mulitple ways to get the same results in SAS.
Here's my take on this. A mix of proc sql and data step.
Alan
data have;
input id $ score @@;
datalines;
S12 -20 S12 20 S12 10
S12 10 S12 20 S12 -50
S13 -40 S13 60 S13 -100
S14 10 S14 20 S14 0
S14 -30 S14 -20
;
/* change as needed to bring back desired ranking */
%let start= 1;
%let end = 3;
/* create table containing distinct id and value -- needed b/c more than one id can have the same score */
proc sql;
create table dst_id_score as
select distinct id, score from have
order by id, score desc; /* keep this sort order for ranking */
* NOTE: Table WORK.DST_ID_SCORE created, with 12 rows and 2 columns. ;
/* assign ranking by id by score */
data temp;
set dst_id_score ;
by id;
if first.id then rnk=0;
rnk + 1;
run;
* NOTE: The data set WORK.TEMP has 12 observations and 3 variables. ;
/* go back to original file and bring back all data meeting desired ranking */
proc sql;
create table final_top_ranked as
select
a.id
, a.score
, b.rnk as rank
from have a inner join temp b on a.id = b.id and a.score = b.score
where rnk between &start and &end
order by id, rnk;
* NOTE: Table WORK.FINAL_TOP_RANKED created, with 11 rows and 3 columns. ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content