gvkey | fy | TA |
1700 | 1995 | 3311.662 |
1700 | 1995 | 1599.234 |
1700 | 1995 | 564.76 |
1700 | 1995 | 560.047 |
1700 | 1995 | 656.226 |
1700 | 1996 | 577.841 |
1700 | 1996 | 1696.431 |
1700 | 1996 | 1081.963 |
1700 | 1996 | 1727.069 |
1700 | 1996 | 4182.832 |
1700 | 1996 | 2068.554 |
1700 | 1996 | 841.204 |
1700 | 1996 | . |
84578 | 1995 | 1312.852 |
84578 | 1995 | 1131.176 |
84578 | 1995 | 3700.925 |
84578 | 1995 | 1099.786 |
84578 | 1995 | 848.657 |
84578 | 1995 | 1067.566 |
The above is my data and I want to get the following result.
gvkey | fy | TA | Sum |
1700 | 1995 | 3311.662 | 6691.929 |
1700 | 1995 | 1599.234 | 6691.929 |
1700 | 1995 | 656.226 | 6691.929 |
1700 | 1995 | 564.76 | 6691.929 |
1700 | 1995 | 560.047 | 6691.929 |
1700 | 1996 | 4182.832 | 10756.85 |
1700 | 1996 | 2068.554 | 10756.85 |
1700 | 1996 | 1727.069 | 10756.85 |
1700 | 1996 | 1696.431 | 10756.85 |
1700 | 1996 | 1081.963 | 10756.85 |
1700 | 1996 | 841.204 | . |
1700 | 1996 | 577.841 | . |
1700 | 1996 | . | . |
84578 | 1995 | 3700.925 | 8312.305 |
84578 | 1995 | 1312.852 | 8312.305 |
84578 | 1995 | 1131.176 | 8312.305 |
84578 | 1995 | 1099.786 | 8312.305 |
84578 | 1995 | 1067.566 | 8312.305 |
84578 | 1995 | 848.657 | . |
I want to get the sum of the first 5 highest observations. I have used the following code.
proc sort data= have;
by gvkey fy decending TA; run;
Proc sql;
create table want as
select *,sum(TA) as sum
from have
group by gvkey, fy;
quit;
I get the sum but if I have more than 5 observations in the TA column, I get sum for all of the observations. I want the sum for only the first 5 highest observations. And most importantly I don't want to lose any other rows also. I need to have all the rows in my output. Can anyone please help?
Hi @abdulla
data have;
input gvkey fy TA ;
cards;
1311 1995 3311.662
1311 1995 1599.234
1311 1995 564.76
1311 1995 560.047
1311 1995 656.226
1700 1996 577.841
1700 1996 1696.431
1700 1996 1081.963
1700 1996 1727.069
1700 1996 4182.832
1700 1996 2068.554
1700 1996 841.204
1700 1996 .
84578 1995 1312.852
84578 1995 1131.176
84578 1995 3700.925
84578 1995 1099.786
84578 1995 848.657
84578 1995 1067.566
;
proc sort data= have;
by gvkey fy decending TA;
run;
data want;
do _n_=1 by 1 until(last.gvkey);
set have;
by gvkey;
if _n_<=5 then sum=sum(sum,ta);
end;
do _n_=1 to _n_;
set have;
if _n_>5 then call missing(sum);
output;
end;
run;
Hi @abdulla
data have;
input gvkey fy TA ;
cards;
1311 1995 3311.662
1311 1995 1599.234
1311 1995 564.76
1311 1995 560.047
1311 1995 656.226
1700 1996 577.841
1700 1996 1696.431
1700 1996 1081.963
1700 1996 1727.069
1700 1996 4182.832
1700 1996 2068.554
1700 1996 841.204
1700 1996 .
84578 1995 1312.852
84578 1995 1131.176
84578 1995 3700.925
84578 1995 1099.786
84578 1995 848.657
84578 1995 1067.566
;
proc sort data= have;
by gvkey fy decending TA;
run;
data want;
do _n_=1 by 1 until(last.gvkey);
set have;
by gvkey;
if _n_<=5 then sum=sum(sum,ta);
end;
do _n_=1 to _n_;
set have;
if _n_>5 then call missing(sum);
output;
end;
run;
Did you try to understand the solution given?
You just have to modify in a couple of places.
Please use this forum as an opportunity to learn.
data WANT;
do _N_=1 by 1 until(last.FY);
set HAVE;
by GVKEY FY;
if _N_ <= 5 then SUM=sum(SUM,TA);
end;
do _N_=1 to _N_;
set HAVE;
if _N_ > 5 then call missing(SUM);
output;
end;
run;
gvkey | fy | TA | SUM |
---|---|---|---|
1311 | 1995 | 3311.66 | 6691.93 |
1311 | 1995 | 1599.23 | 6691.93 |
1311 | 1995 | 656.23 | 6691.93 |
1311 | 1995 | 564.76 | 6691.93 |
1311 | 1995 | 560.05 | 6691.93 |
1700 | 1996 | 4182.83 | 10756.85 |
1700 | 1996 | 2068.55 | 10756.85 |
1700 | 1996 | 1727.07 | 10756.85 |
1700 | 1996 | 1696.43 | 10756.85 |
1700 | 1996 | 1081.96 | 10756.85 |
1700 | 1996 | 841.20 | . |
1700 | 1996 | 577.84 | . |
1700 | 1996 | . | . |
84578 | 1995 | 3700.93 | 8312.31 |
84578 | 1995 | 1312.85 | 8312.31 |
84578 | 1995 | 1131.18 | 8312.31 |
84578 | 1995 | 1099.79 | 8312.31 |
84578 | 1995 | 1067.57 | 8312.31 |
84578 | 1995 | 848.66 | . |
You could use Proc Rank to also deal with ties if any.
proc rank data=have out=inter descending ties=dense ;
ranks rank;
by gvkey fy;
var ta;
run;
proc sql;
/* create table want as*/
select
gvkey,
fy,
TA,
sum(ta) as sum_ta
from inter
where rank<=5
group by
gvkey,
fy
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.