BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abdulla
Pyrite | Level 9
gvkeyfyTA
170019953311.662
170019951599.234
17001995564.76
17001995560.047
17001995656.226
17001996577.841
170019961696.431
170019961081.963
170019961727.069
170019964182.832
170019962068.554
17001996841.204
17001996.
8457819951312.852
8457819951131.176
8457819953700.925
8457819951099.786
845781995848.657
8457819951067.566

The above is my data and I want to get the following result. 

gvkeyfyTASum
170019953311.6626691.929
170019951599.2346691.929
17001995656.2266691.929
17001995564.766691.929
17001995560.0476691.929
170019964182.83210756.85
170019962068.55410756.85
170019961727.06910756.85
170019961696.43110756.85
170019961081.96310756.85
17001996841.204.
17001996577.841.
17001996..
8457819953700.9258312.305
8457819951312.8528312.305
8457819951131.1768312.305
8457819951099.7868312.305
8457819951067.5668312.305
845781995848.657.

 

I want to get the sum of the first 5 highest observations. I have used the following code. 

 

proc sort data= have;
by gvkey fy decending TA; run;


Proc sql;
create table want  as
select *,sum(TA) as sum
from have
group by gvkey, fy;
quit;

 

I get the sum but if I have more than 5 observations in the TA column, I get sum for all of the observations. I want the sum for only the first 5 highest observations. And most importantly I don't want to lose any other rows also. I need to have all the rows in my output. Can anyone please help?

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @abdulla 

 

data have;
input gvkey	fy	TA ;
cards;
1311	1995	3311.662
1311	1995	1599.234
1311	1995	564.76
1311	1995	560.047
1311	1995	656.226
1700	1996	577.841
1700	1996	1696.431
1700	1996	1081.963
1700	1996	1727.069
1700	1996	4182.832
1700	1996	2068.554
1700	1996	841.204
1700	1996	.
84578	1995	1312.852
84578	1995	1131.176
84578	1995	3700.925
84578	1995	1099.786
84578	1995	848.657
84578	1995	1067.566
;

proc sort data= have;
by gvkey fy decending TA;
run;

data want;
 do _n_=1 by 1 until(last.gvkey);
  set have;
  by gvkey;
  if _n_<=5 then sum=sum(sum,ta);
 end;
 do _n_=1 to _n_;
  set have;
  if _n_>5 then call missing(sum);
  output;
 end;
run;


View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

Hi @abdulla 

 

data have;
input gvkey	fy	TA ;
cards;
1311	1995	3311.662
1311	1995	1599.234
1311	1995	564.76
1311	1995	560.047
1311	1995	656.226
1700	1996	577.841
1700	1996	1696.431
1700	1996	1081.963
1700	1996	1727.069
1700	1996	4182.832
1700	1996	2068.554
1700	1996	841.204
1700	1996	.
84578	1995	1312.852
84578	1995	1131.176
84578	1995	3700.925
84578	1995	1099.786
84578	1995	848.657
84578	1995	1067.566
;

proc sort data= have;
by gvkey fy decending TA;
run;

data want;
 do _n_=1 by 1 until(last.gvkey);
  set have;
  by gvkey;
  if _n_<=5 then sum=sum(sum,ta);
 end;
 do _n_=1 to _n_;
  set have;
  if _n_>5 then call missing(sum);
  output;
 end;
run;


abdulla
Pyrite | Level 9
I need it by fy too. Now I am getting a lot of missing for fy.
I have data like the following.
gvkey fy
1311 1995
1311 1995
1311 1995
1311 1995
1311 1995
1311 1996
1311 1996
1311 1996
1311 1996
1311 1996
1311 1996
1311 1996
1311 1996
1311 1997
1311 1997
1311 1997
1311 1997
1311 1997
1311 1997
ChrisNZ
Tourmaline | Level 20

Did you try to understand the solution given?

You just have to modify in a couple of places.

Please use this forum as an opportunity to learn.

data WANT;
 do _N_=1 by 1 until(last.FY);
  set HAVE;
  by GVKEY FY;
  if _N_ <= 5 then SUM=sum(SUM,TA);
 end;
 do _N_=1 to _N_;
  set HAVE;
  if _N_ > 5 then call missing(SUM);
  output;
 end;
run;
 
gvkey fy TA SUM
1311 1995 3311.66 6691.93
1311 1995 1599.23 6691.93
1311 1995 656.23 6691.93
1311 1995 564.76 6691.93
1311 1995 560.05 6691.93
1700 1996 4182.83 10756.85
1700 1996 2068.55 10756.85
1700 1996 1727.07 10756.85
1700 1996 1696.43 10756.85
1700 1996 1081.96 10756.85
1700 1996 841.20 .
1700 1996 577.84 .
1700 1996 . .
84578 1995 3700.93 8312.31
84578 1995 1312.85 8312.31
84578 1995 1131.18 8312.31
84578 1995 1099.79 8312.31
84578 1995 1067.57 8312.31
84578 1995 848.66 .

 

 

 

Patrick
Opal | Level 21

You could use Proc Rank to also deal with ties if any.

proc rank data=have out=inter descending ties=dense ;
  ranks rank;
  by gvkey fy;
  var ta;
run;

proc sql;
/*  create table want as*/
    select 
      gvkey, 
      fy,
      TA, 
      sum(ta) as sum_ta
    from inter
    where rank<=5
    group by 
      gvkey, 
      fy 
    ;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 843 views
  • 4 likes
  • 4 in conversation