Dear all:
I hope to select the observations with top 100 assets in each year. May I ask how to do it?
Either by sql or proc rank or proc univariate?
The data looks like:
firm | year | asset |
---|---|---|
1 | 1997 | 535 |
1 | 1998 | 453 |
1 | 1999 | 7856 |
2 | 1997 | 87 |
2 | 1998 | 87687 |
2 | 1999 | 452 |
2 | 2000 | 78 |
2 | 1997 | 78 |
3 | 1998 | 986 |
I tried the following, but it results in sort execution failure:
proc sql;
creassetse table r.want as
select a.*
from r.have as a
left join r.have as b on (a.year=b.year and a.assets <=b.assets)
group by a.year, a.assets
having count(*) <= 100
order by a.year, a.assets desc;
quit;
Thanks !
Message was edited by: Eric Wayne
How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :
data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;
proc sort data=have; by year descending asset; run;
data want;
order = 0;
do until (last.year);
order + 1;
set have; by year;
if order <= 2 then output;
end;
run;
PG
How about SORT and a DATASTEP? Here, for example is how to get the top 2 assets for each year :
data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;
proc sort data=have; by year descending asset; run;
data want;
order = 0;
do until (last.year);
order + 1;
set have; by year;
if order <= 2 then output;
end;
run;
PG
Thank you so much !!!
I believe you should just be able to sort--
proc sort data=dataset out=newdataset(where=(_n_ le 100));
by asset descending;
run;
_n_ is an automatic variable sas creates as a record ID, so proc sort would put the top 100 at the beginning, and then on the new dataset you would only keep the first 100 records.
, that can't work for two reasons:
450 proc sort data=sashelp.cars(where=(_n_<=3));
ERROR: Variable _n_ is not on file SASHELP.CARS.
451 by make horsepower;
452 run;
First, variable _n_ is created only within a datastep, it is not created during other SAS procedures and it is not saved with datasets.
Second, variable _n_ is not reset at the beginning of each BY group, so it would only identify the first observations from the dataset and not those from each BY group.
PG
you are correct.
I should have used my other suggestion I was thinking of at the time-- using an obs= option on the to data set-- i actually tested that and it doesn't work either but it was not as obviously wrong and does not produce an error.
Nevertheless, what I just validated to work with only a proc sort is this:
proc sort dataset to=newdataset;
options obs=#ofObsYouWant;
by variable;
run;
It doesn't validate for me.
data test;
do grp = 1, 2;
do id = 1 to 10;
output;
end;
end;
run;
options obs=3;
proc sort data=test out=top3perGrp;
by grp descending id;
run;
dataset top3perGrp does not contain top 3 id from each grp, as required.
PG
Besides it does not work for the requirement like PG pointed out, using system options for this kind of task doesn't seem to be a good practice, programmer has to remember where to set it back to default, because you can't get help from error log when the intended output is not achieved.
ahh i missed the implied outputting things by top 'x' of each group in the original post, my bad. that makes the obs or _n_ approach both invalid regardless of whether it's a data step or not.
hai.kuo, I agree but if one uses it they need to always write in the options obs=max at the same time.
sorry for the goose chase everyone.
if i were doing this from scratch I'd do a proc sort and then a data step, but instead of PG Stats' do loop I'd use the by statement in the data step, retain a counter variable and then output only when the counter was below your threshhold-- I tend to use the data/by/counter a lot for other things so that's what first pops into mind.
data have;
input firm year asset;
datalines;
1 1997 535
1 1998 453
1 1999 7856
2 1997 87
2 1998 87687
2 1999 452
2 2000 78
2 1997 78
3 1998 986
;
proc sort data=have;
by year ;
run;
data mak;
set have;
by year;
if first.year then Top=0;
Top+1;
if Top < 3 ;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.