BookmarkSubscribeRSS Feed
d0816
Quartz | Level 8

Hello,

I have following sample data which I sorted by company and Commission (high to low) (original dataset is more than 3000 rows). For each of the company, I want to take top 20% commission data and calculate average of that top 20% only, (not all data for a company) with a flexibility to change the top n%. How can I do this?

CompanyCommission
I$17.75
I$15.00
I$15.00
I$13.00
I$12.50
I$10.00
I$10.00
I$9.89
I$9.15
I$9.00
I$9.00
I$8.75
I$8.50
I$8.00
I$7.25
F$15.50
F$14.00
F$14.00
F$13.00
F$13.00
F$12.50
F$12.00
F$12.00
F$12.00
F$11.50
F$11.50
F$11.50
F$11.00
F$10.50
F$10.45
F$10.00
F$10.00
F$10.00
F$9.00
F 
M$17.29
M$15.91
M$14.25
M$13.99
M$13.55
M$13.54
M$13.50
M$12.50
M$12.06
M$11.50
M$11.31
M$11.25
M$11.15
M$10.00
M$10.00
M$10.00
M$10.00
M$9.00
M$8.00
M$7.80
M$7.25
R$18.29
R$15.60
R$13.00
R$11.00
R$11.00
R$11.00
R$10.00
R$10.00
R$9.56
R$9.00
R$0.14
R 
RO$18.54
RO$16.75
RO$15.75
RO$15.25
RO$14.00
RO$13.25
RO$13.00
RO$13.00
RO$12.50
RO$10.50
RO$10.00
RO$10.00
RO$9.95
RO$9.25
RO$9.25
RO$9.00
U$15.00
U$13.90
U$12.00
U$12.00
U$11.50
U$11.07
U$11.07
U$11.00
U$10.00
U$10.00
U$9.25
U$2.33
W$15.00
W$12.00
W$9.50

 

For e.g. Company F has 20 rows, top 20% would be first 4 rows (because it is already sorted by commission for a company) and calculating average of those 4 commission data.

CompanyCommission
F$15.50
F$14.00
F$14.00
F$13.00

 

Result wanted:

CompanyTop n% Average Commission
I 
F$14.13
M 
R 
RO 
U 
W 

 

Appreciate any suggestion.

20 REPLIES 20
PaigeMiller
Diamond | Level 26

You find the number of observations (with non-missing value for commission) for each company, then you can figure out which are the top 20% (ignoring ties and ignoring that you can't get exactly 20% unless the number of observations for a company is a multiple of 5), then you take the mean.

 

proc freq data=have(where=(not missing(commission)));
    table company/noprint out=_counts_;
run;
data want;
    merge have(where=(not missing(commission))) _counts_;
    by company;
    if first.company then sequence=0;
    sequence+1;
    if sequence/count<=0.2 then output;
run;
proc means data=want;
    by company;
    var commission;
run;
    

 Also, for company W which has only 3 observations, this method returns nothing. 

--
Paige Miller
Jagadishkatam
Amethyst | Level 16

Please try the below code

 

data have;
input Company$	Commission:dollar6.;
cards;
I $17.75
I $15.00
I $15.00
I $13.00
I $12.50
I $10.00
I $10.00
I $9.89
I $9.15
I $9.00
I $9.00
I $8.75
I $8.50
I $8.00
I $7.25
F $15.50
F $14.00
F $14.00
F $13.00
F $13.00
F $12.50
F $12.00
F $12.00
F $12.00
F $11.50
F $11.50
F $11.50
F $11.00
F $10.50
F $10.45
F $10.00
F $10.00
F $10.00
F $9.00
;

proc sort data=have;
by company descending Commission ;
run;

data have2;
set have;
by company;
retain row;
if first.company then row=1;
else row+1;
run;

proc sql;
create table want as select company, avg(Commission) as avg from (select company, Commission,row, round((count(*)*20)/100,1) as cnt from have2 group by company having row<=calculated cnt) group by company;
quit;
Thanks,
Jag
KachiM
Rhodochrosite | Level 12

@d0816 

 

You have two answers. Just for fun I give another One Step Data Step. Your Data is presorted as you have claimed. Assuming your Data Set is named as 'HAVE' here comes:

data want;
   do _n = 1 by 1 until(last.Company);
      set have;
      by Company;
   end;
   sampsize = ceil(_n * 0.20);
   sum = 0;
   do _n = 1 by 1 until(last.Company);
      set have;
      by Company;
      if _n <= sampsize then sum + Commission;
      if last.Company then mean_Commision = sum / sampsize;
   end;
drop Commission;
run;

I have not dropped the intermediate variables used in the program for the sake giving you additional information. Here is the output:

 

obs 	_n 	Company 	sampsize 	sum 	mean_Commision
1 	15 	I 	3 	47.70 	15.9000
2 	19 	F 	4 	56.50 	14.1250
3 	21 	M 	5 	74.70 	14.9400
4 	11 	R 	3 	46.80 	15.6000
5 	16 	RO 	4 	66.29 	16.5725
6 	12 	U 	3 	40.90 	13.6333
7 	3 	W 	1 	15.00 	15.0000
PaigeMiller
Diamond | Level 26

The answers from @KachiM and @Jagadishkatam don't take into account that some of the original data has missing commission values, and so will give the wrong answer.

--
Paige Miller
KachiM
Rhodochrosite | Level 12

@PaigeMiller 

 

The data step program does not change.  The sampsize is computed using CEIL() function. The OP may change to FLOOR() or ROUND(). 

 

Using the two missing values for F and R using the OP's Data Set, the new ouput is:

 

bs 	_n 	Company 	sampsize 	sum 	mean_Commision
1 	20 	F 	4 	56.50 	14.1250
2 	15 	I 	3 	47.70 	15.9000
3 	21 	M 	5 	74.70 	14.9400
4 	12 	R 	3 	46.80 	15.6000
5 	16 	RO 	4 	66.29 	16.5725
6 	12 	U 	3 	40.90 	13.6333
7 	3 	W 	1 	15.00 	15.0000
PaigeMiller
Diamond | Level 26

The variable _n is calculated including missing values. So sampsize is wrong, and the rest of the calculations are wrong.

--
Paige Miller
d0816
Quartz | Level 8

Hi @hashman,Thank you for replying. I am trying the code and figuring out how it is working.

 

About "The OP may change to FLOOR() or ROUND()", when should FLOOR() or ROUND() be used?

 

 

 

Jagadishkatam
Amethyst | Level 16

Thank you @PaigeMiller 

 

I believe the below subset condition will help to get the correct answer with my code.

 

proc sort data=have;
by company descending Commission ;
where Commission ne .;
run;

 

Thanks,
Jag
hashman
Ammonite | Level 13

@PaigeMiller:

It's easy to take into account by not reading the rows where Commission is missing in the first place:

data have ;                                                                                                                                                                                                                                                     
  input Company :$1. Commission ;                                                                                                                                                                                                                               
  cards ;                                                                                                                                                                                                                                                       
F      15.50                                                                                                                                                                                                                                                    
F      14.00                                                                                                                                                                                                                                                    
F      14.00                                                                                                                                                                                                                                                    
F      13.00                                                                                                                                                                                                                                                    
F      13.00                                                                                                                                                                                                                                                    
F      12.50                                                                                                                                                                                                                                                    
F      12.00                                                                                                                                                                                                                                                    
F      12.00                                                                                                                                                                                                                                                    
F      12.00                                                                                                                                                                                                                                                    
F      11.50                                                                                                                                                                                                                                                    
F      11.50                                                                                                                                                                                                                                                    
F      11.50                                                                                                                                                                                                                                                    
F      11.00                                                                                                                                                                                                                                                    
F      10.50                                                                                                                                                                                                                                                    
F      10.45                                                                                                                                                                                                                                                    
F      10.00                                                                                                                                                                                                                                                    
F      10.00                                                                                                                                                                                                                                                    
F      10.00                                                                                                                                                                                                                                                    
F       9.00                                                                                                                                                                                                                                                    
F        .                                                                                                                                                                                                                                                      
I      17.75                                                                                                                                                                                                                                                    
I      15.00                                                                                                                                                                                                                                                    
I      15.00                                                                                                                                                                                                                                                    
I      13.00                                                                                                                                                                                                                                                    
I      12.50                                                                                                                                                                                                                                                    
I      10.00                                                                                                                                                                                                                                                    
I      10.00                                                                                                                                                                                                                                                    
I       9.89                                                                                                                                                                                                                                                    
I       9.15                                                                                                                                                                                                                                                    
I       9.00                                                                                                                                                                                                                                                    
I       9.00                                                                                                                                                                                                                                                    
I       8.75                                                                                                                                                                                                                                                    
I       8.50                                                                                                                                                                                                                                                    
I       8.00                                                                                                                                                                                                                                                    
I       7.25                                                                                                                                                                                                                                                    
M      17.29                                                                                                                                                                                                                                                    
M      15.91                                                                                                                                                                                                                                                    
M      14.25                                                                                                                                                                                                                                                    
M      13.99                                                                                                                                                                                                                                                    
M      13.55                                                                                                                                                                                                                                                    
M      13.54                                                                                                                                                                                                                                                    
M      13.50                                                                                                                                                                                                                                                    
M      12.50                                                                                                                                                                                                                                                    
M      12.06                                                                                                                                                                                                                                                    
M      11.50                                                                                                                                                                                                                                                    
M      11.31                                                                                                                                                                                                                                                    
M      11.25                                                                                                                                                                                                                                                    
M      11.15                                                                                                                                                                                                                                                    
M      10.00                                                                                                                                                                                                                                                    
M      10.00                                                                                                                                                                                                                                                    
M      10.00                                                                                                                                                                                                                                                    
M      10.00                                                                                                                                                                                                                                                    
M       9.00                                                                                                                                                                                                                                                    
M       8.00                                                                                                                                                                                                                                                    
M       7.80                                                                                                                                                                                                                                                    
M       7.25                                                                                                                                                                                                                                                    
R      18.29                                                                                                                                                                                                                                                    
R      15.60                                                                                                                                                                                                                                                    
R      13.00                                                                                                                                                                                                                                                    
R      11.00                                                                                                                                                                                                                                                    
R      11.00                                                                                                                                                                                                                                                    
R      11.00                                                                                                                                                                                                                                                    
R      10.00                                                                                                                                                                                                                                                    
R      10.00                                                                                                                                                                                                                                                    
R      9.56                                                                                                                                                                                                                                                     
R      9.00                                                                                                                                                                                                                                                     
R      0.14                                                                                                                                                                                                                                                     
R       .                                                                                                                                                                                                                                                       
RO    18.54                                                                                                                                                                                                                                                     
RO    16.75                                                                                                                                                                                                                                                     
RO    15.75                                                                                                                                                                                                                                                     
RO    15.25                                                                                                                                                                                                                                                     
RO    14.00                                                                                                                                                                                                                                                     
RO    13.25                                                                                                                                                                                                                                                     
RO    13.00                                                                                                                                                                                                                                                     
RO    13.00                                                                                                                                                                                                                                                     
RO    12.50                                                                                                                                                                                                                                                     
RO    10.50                                                                                                                                                                                                                                                     
RO    10.00                                                                                                                                                                                                                                                     
RO    10.00                                                                                                                                                                                                                                                     
RO     9.95                                                                                                                                                                                                                                                     
RO     9.25                                                                                                                                                                                                                                                     
RO     9.25                                                                                                                                                                                                                                                     
RO     9.00                                                                                                                                                                                                                                                     
U     15.00                                                                                                                                                                                                                                                     
U     13.90                                                                                                                                                                                                                                                     
U     12.00                                                                                                                                                                                                                                                     
U     12.00                                                                                                                                                                                                                                                     
U     11.50                                                                                                                                                                                                                                                     
U     11.07                                                                                                                                                                                                                                                     
U     11.07                                                                                                                                                                                                                                                     
U     11.00                                                                                                                                                                                                                                                     
U     10.00                                                                                                                                                                                                                                                     
U     10.00                                                                                                                                                                                                                                                     
U      9.25                                                                                                                                                                                                                                                     
U      2.33                                                                                                                                                                                                                                                     
W     15.00                                                                                                                                                                                                                                                     
W     12.00                                                                                                                                                                                                                                                     
W      9.50                                                                                                                                                                                                                                                     
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
%let PCT = 20 ;                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                
data want (drop = commission) ;                                                                                                                                                                                                                                 
  do _n_ = 1 by 1 until (last.company) ;                                                                                                                                                                                                                        
    set have (where=(N(commission))) ;                                                                                                                                                                                                                          
    by company ;                                                                                                                                                                                                                                                
  end ;                                                                                                                                                                                                                                                         
  N = floor (_n_ * &PCT / 100) ;                                                                                                                                                                                                                                  
  do _n_ = 1 to _n_ ;                                                                                                                                                                                                                                           
    set have (where=(N(commission))) ;                                                                                                                                                                                                                          
    if _n_ <= N then Sum = sum (Sum, commission) ;                                                                                                                                                                                                              
  end ;                                                                                                                                                                                                                                                         
  Mean = divide (Sum, N) ;                                                                                                                                                                                                                                      
run ;                                                           

As @KachiM has pointed out, FLOOR can be replaced with another function the OP may find more suitable depending on more detailed specs.

 

Kind regards

Paul D. 

d0816
Quartz | Level 8

Hi @hashman, I did not understand the part where you said  "FLOOR can be replaced with another function the OP may find more suitable depending on more detailed specs".

 

Thank you for all the discussion going on in this thread. I am trying the code and figuring out how it is working.

hashman
Ammonite | Level 13

@d0816:

Meaning FLOOR or CEIL can be chosen depending on the business specs.

 

As an example, if you have PCT=20 and N=17 records in a group by COMPANY, the 20 percent is 3.4 records. If in the case of such a split the specs call for 3 records to be counted, you'd use FLOOR; if they call for 4 records, you use CEIL. Note that FLOOR will result in no records included at all if N*PCT/100 < 1 (e.g. if with PCT=20 you have N<5).     

 

Kind regards

Paul D.

Reeza
Super User

@d0816 wrote:

Hi @hashman, I did not understand the part where you said  "FLOOR can be replaced with another function the OP may find more suitable depending on more detailed specs".

 

Thank you for all the discussion going on in this thread. I am trying the code and figuring out how it is working.


If you're a beginner I would recommend using one of the simpler solutions you can understand and modify on your own.

Reeza
Super User

1. Use PROC RANK to group your data into the appropriate buckets. GROUPS=5 will create 5 groups that are each 20% of the data.

2. Use PROC MEANS to calculate the summary by company and calculated RANK.

 

proc sort data=company;
by company commission;
run;

proc rank data=have out=grouped groups=5;
by company;
var commission;
ranks commission_rank;
run;

proc means data=grouped noprint;
by company commission_rank;
var commission;
output out=want mean(commission) = avg_commission;
run;

This will provide the mean for all companies at all ranks.

You can easily filter this out to get just the top ranks though by adding another step. Then when you want to change your % you can change the GROUPS value. 

 

proc sql;
create table want_filtered as
select *
from want
group by company, commission_rank
having commission_rank = max(commission_rank);
quit;

@d0816 wrote:

Hello,

I have following sample data which I sorted by company and Commission (high to low) (original dataset is more than 3000 rows). For each of the company, I want to take top 20% commission data and calculate average of that top 20% only, (not all data for a company) with a flexibility to change the top n%. How can I do this?

Company Commission
I $17.75
I $15.00
I $15.00
I $13.00
I $12.50
I $10.00
I $10.00
I $9.89
I $9.15
I $9.00
I $9.00
I $8.75
I $8.50
I $8.00
I $7.25
F $15.50
F $14.00
F $14.00
F $13.00
F $13.00
F $12.50
F $12.00
F $12.00
F $12.00
F $11.50
F $11.50
F $11.50
F $11.00
F $10.50
F $10.45
F $10.00
F $10.00
F $10.00
F $9.00
F  
M $17.29
M $15.91
M $14.25
M $13.99
M $13.55
M $13.54
M $13.50
M $12.50
M $12.06
M $11.50
M $11.31
M $11.25
M $11.15
M $10.00
M $10.00
M $10.00
M $10.00
M $9.00
M $8.00
M $7.80
M $7.25
R $18.29
R $15.60
R $13.00
R $11.00
R $11.00
R $11.00
R $10.00
R $10.00
R $9.56
R $9.00
R $0.14
R  
RO $18.54
RO $16.75
RO $15.75
RO $15.25
RO $14.00
RO $13.25
RO $13.00
RO $13.00
RO $12.50
RO $10.50
RO $10.00
RO $10.00
RO $9.95
RO $9.25
RO $9.25
RO $9.00
U $15.00
U $13.90
U $12.00
U $12.00
U $11.50
U $11.07
U $11.07
U $11.00
U $10.00
U $10.00
U $9.25
U $2.33
W $15.00
W $12.00
W $9.50

 

For e.g. Company F has 20 rows, top 20% would be first 4 rows (because it is already sorted by commission for a company) and calculating average of those 4 commission data.

Company Commission
F $15.50
F $14.00
F $14.00
F $13.00

 

Result wanted:

Company Top n% Average Commission
I  
F $14.13
M  
R  
RO  
U  
W  

 

Appreciate any suggestion.


 

PaigeMiller
Diamond | Level 26

I like the answer by @Reeza best, better than my own answer. I like it best because it uses PROC RANK which automatically handles the missing values properly and correctly. I also like that it uses PROC MEANS, rather than data step code to do the calculations of the mean value. Especially for beginners, PROC MEANS is much easier (and it is a much more tested and therefore less risky) than any data step code to calculate the mean. That is one of the main benefits of SAS, they have done the hard work to create the algorithms, verify the algorithms and also provide the proper handling of missings. PROC RANK also offers a number of options on how to handle ties, which none of the other solutions do (not even my solution). 

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 12415 views
  • 6 likes
  • 6 in conversation