BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
caveman529
Calcite | Level 5

Dear all:

I hope to select the observations with top 100 assets in each year.  May I ask how to do it?

Either by sql or proc rank or proc univariate?

The data looks like:

firmyearasset
11997535
11998453
119997856
2199787
2199887687
21999452
2200078
2199778
31998986

I tried the following, but it results in sort execution failure:

proc sql;

  creassetse table r.want as

  select a.*

  from r.have as a

  left join r.have as b on (a.year=b.year and a.assets <=b.assets)

  group by a.year, a.assets

  having count(*) <= 100

  order by a.year, a.assets desc;

quit;

Thanks !

Message was edited by: Eric Wayne

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;

proc sort data=have; by year descending asset; run;

data want;
order = 0;
do until (last.year);
     order + 1;
     set have; by year;
     if order <= 2 then output;
     end;
run;

PG

PG

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;

proc sort data=have; by year descending asset; run;

data want;
order = 0;
do until (last.year);
     order + 1;
     set have; by year;
     if order <= 2 then output;
     end;
run;

PG

PG
caveman529
Calcite | Level 5

Thank you so much !!!

cau83
Pyrite | Level 9

I believe you should just be able to sort--

proc sort data=dataset out=newdataset(where=(_n_ le 100));

     by asset descending;

run;

_n_ is an automatic variable sas creates as a record ID, so proc sort would put the top 100 at the beginning, and then on the new dataset you would only keep the first 100 records.

PGStats
Opal | Level 21

, that can't work for two reasons:

450  proc sort data=sashelp.cars(where=(_n_<=3));

ERROR: Variable _n_ is not on file SASHELP.CARS.

451  by make horsepower;

452  run;

First, variable _n_ is created only within a datastep, it is not created during other SAS procedures and it is not saved with datasets.

Second, variable _n_ is not reset at the beginning of each BY group, so it would only identify the first observations from the dataset and not those from each BY group.

PG

PG
cau83
Pyrite | Level 9

you are correct.

I should have used my other suggestion I was thinking of at the time-- using an obs= option on the to data set-- i actually tested that and it doesn't work either but it was not as obviously wrong and does not produce an error.

Nevertheless, what I just validated to work with only a proc sort is this:

proc sort dataset to=newdataset;

     options obs=#ofObsYouWant;

     by variable;

run;

PGStats
Opal | Level 21

It doesn't validate for me.

data test;
do grp = 1, 2;
     do id = 1 to 10;
          output;
          end;
     end;
run;

options obs=3;
proc sort data=test out=top3perGrp;
by grp descending id;
run;

dataset top3perGrp does not contain top 3 id from each grp, as required.

PG

PG
Haikuo
Onyx | Level 15

Besides it does not work for the requirement like PG pointed out, using system options for this kind of task doesn't seem to be a good practice, programmer has to remember where to set it back to default, because you can't get help from error log when the intended output is not achieved.

cau83
Pyrite | Level 9

ahh i missed the implied outputting things by top 'x' of each group in the original post, my bad. that makes the obs or _n_ approach both invalid regardless of whether it's a data step or not.

hai.kuo, I agree but if one uses it they need to always write in the options obs=max at the same time.

sorry for the goose chase everyone.

if i were doing this from scratch I'd do a proc sort and then a data step, but instead of PG Stats' do loop I'd use the by statement in the data step, retain a counter variable and then output only when the counter was below your threshhold-- I tend to use the data/by/counter a lot for other things so that's what first pops into mind.

Prashant_Ph
Fluorite | Level 6

data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;
proc sort data=have;
by year ;
run;

data mak;
set have;
by year;
if first.year then Top=0;
Top+1;
if Top < 3 ;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 24838 views
  • 7 likes
  • 5 in conversation