how to select top 100 observation a variable by rank

Accepted Solution Solved
Reply
Regular Contributor
Posts: 161
Accepted Solution

how to select top 100 observation a variable by rank

Dear all:

I hope to select the observations with top 100 assets in each year.  May I ask how to do it?

Either by sql or proc rank or proc univariate?

The data looks like:

firmyearasset
11997535
11998453
119997856
2199787
2199887687
21999452
2200078
2199778
31998986

I tried the following, but it results in sort execution failure:

proc sql;

  creassetse table r.want as

  select a.*

  from r.have as a

  left join r.have as b on (a.year=b.year and a.assets <=b.assets)

  group by a.year, a.assets

  having count(*) <= 100

  order by a.year, a.assets desc;

quit;

Thanks !

Message was edited by: Eric Wayne


Accepted Solutions
Solution
‎10-27-2013 08:45 PM
Respected Advisor
Posts: 4,608

Re: how to select top 100 observation a variable by rank

How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;

proc sort data=have; by year descending asset; run;

data want;
order = 0;
do until (last.year);
     order + 1;
     set have; by year;
     if order <= 2 then output;
     end;
run;

PG

PG

View solution in original post


All Replies
Solution
‎10-27-2013 08:45 PM
Respected Advisor
Posts: 4,608

Re: how to select top 100 observation a variable by rank

How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;

proc sort data=have; by year descending asset; run;

data want;
order = 0;
do until (last.year);
     order + 1;
     set have; by year;
     if order <= 2 then output;
     end;
run;

PG

PG
Regular Contributor
Posts: 161

Re: how to select top 100 observation a variable by rank

Thank you so much !!!

Contributor
Posts: 69

Re: how to select top 100 observation a variable by rank

I believe you should just be able to sort--

proc sort data=dataset out=newdataset(where=(_n_ le 100));

     by asset descending;

run;

_n_ is an automatic variable sas creates as a record ID, so proc sort would put the top 100 at the beginning, and then on the new dataset you would only keep the first 100 records.

Respected Advisor
Posts: 4,608

Re: how to select top 100 observation a variable by rank

, that can't work for two reasons:

450  proc sort data=sashelp.cars(where=(_n_<=3));

ERROR: Variable _n_ is not on file SASHELP.CARS.

451  by make horsepower;

452  run;

First, variable _n_ is created only within a datastep, it is not created during other SAS procedures and it is not saved with datasets.

Second, variable _n_ is not reset at the beginning of each BY group, so it would only identify the first observations from the dataset and not those from each BY group.

PG

PG
Contributor
Posts: 69

Re: how to select top 100 observation a variable by rank

you are correct.

I should have used my other suggestion I was thinking of at the time-- using an obs= option on the to data set-- i actually tested that and it doesn't work either but it was not as obviously wrong and does not produce an error.

Nevertheless, what I just validated to work with only a proc sort is this:

proc sort dataset to=newdataset;

     options obs=#ofObsYouWant;

     by variable;

run;

Respected Advisor
Posts: 4,608

Re: how to select top 100 observation a variable by rank

It doesn't validate for me.

data test;
do grp = 1, 2;
     do id = 1 to 10;
          output;
          end;
     end;
run;

options obs=3;
proc sort data=test out=top3perGrp;
by grp descending id;
run;

dataset top3perGrp does not contain top 3 id from each grp, as required.

PG

PG
Respected Advisor
Posts: 3,124

Re: how to select top 100 observation a variable by rank

Besides it does not work for the requirement like PG pointed out, using system options for this kind of task doesn't seem to be a good practice, programmer has to remember where to set it back to default, because you can't get help from error log when the intended output is not achieved.

Contributor
Posts: 69

Re: how to select top 100 observation a variable by rank

ahh i missed the implied outputting things by top 'x' of each group in the original post, my bad. that makes the obs or _n_ approach both invalid regardless of whether it's a data step or not.

hai.kuo, I agree but if one uses it they need to always write in the options obs=max at the same time.

sorry for the goose chase everyone.

if i were doing this from scratch I'd do a proc sort and then a data step, but instead of PG Stats' do loop I'd use the by statement in the data step, retain a counter variable and then output only when the counter was below your threshhold-- I tend to use the data/by/counter a lot for other things so that's what first pops into mind.

Occasional Contributor
Posts: 13

Re: how to select top 100 observation a variable by rank

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;
proc sort data=have;
by year ;
run;

data mak;
set have;
by year;
if first.year then Top=0;
Top+1;
if Top < 3 ;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 5229 views
  • 7 likes
  • 5 in conversation