Help using Base SAS procedures

How to get top values that account for 90% of total

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

How to get top values that account for 90% of total

I have the below SAS data set, I want to get the rows with top volume_ratio that account for 90% of total volumes.  Because (0.278 + 0.246 +  0.193 + 0.190) > 90%, basically, I want to have a sas procedure that can get the top four rows of the below data.

product            customer_name   volume_ratio

apple               customer1            0.278

apple                customer2           0.246

apple                customer3           0.193

apple                customer4           0.190

apple                customer5           0.093

That is to say, the output should be

product            customer_name   volume_ratio

apple               customer1            0.278

apple                customer2           0.246

apple                customer3           0.193

apple                customer4           0.190

Is there a SAS procedure that can do this work?  Thank you.


Accepted Solutions
Solution
‎03-31-2015 05:11 PM
Respected Advisor
Posts: 4,173

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

You could use PROC FREQ  (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.

You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).

View solution in original post


All Replies
Solution
‎03-31-2015 05:11 PM
Respected Advisor
Posts: 4,173

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

You could use PROC FREQ  (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.

You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).

Super User
Posts: 3,252

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

The UNIVARIATE procedure will do what you want. Check out the OUTPUT statement in this procedure.

Contributor
Posts: 52

Re: How to get top values that account for 90% of total

Could you give me an example?  I am a new player for SAS.  Appreciate your help.

Valued Guide
Posts: 860

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

I think this is the output you are looking for:

data have;

infile cards;

input  product $ customer_name $ volume_ratio;

cards;             

apple customer1 0.278

apple customer2 0.246

apple customer3 0.193

apple customer4 0.190

apple customer5 0.093

;

run;

data want;

set have;

sum+volume_ratio;

if sum < round(.9) then output;

run;

Contributor
Posts: 52

Re: How to get top values that account for 90% of total

Posted in reply to Steelers_In_DC

The data has different products, how to handle this?  Thank you.

Valued Guide
Posts: 860

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

Try this with the same dataset, depending your resources one could be faster than the other:

data want;

do while(sum < .9);

set have;

sum+volume_ratio;

output;

end;

run;

Super User
Posts: 3,252

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999
Super User
Posts: 19,772

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

In general a BY statement, but it depends on your solution.

Contributor
Posts: 52

Re: How to get top values that account for 90% of total


I have created the below codes:

freq data = clean_model_data_temp4 order =  data;

by product;

weight vol_ratio;

run;

The code works well. How can I output all of the tables for different product into a single sas table?

Super User
Posts: 19,772

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

First example in the documentation:

Example 35.1 Output Data Set of Frequencies

SAS/STAT(R) 9.22 User's Guide

Since you probably want cumulative percentages as well look at the OUTCUM option

SAS/STAT(R) 9.22 User's Guide

Respected Advisor
Posts: 4,173

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

Code like below could work:

data clean_model_data_temp4;

  infile cards;

  input  product $ customer_name :$10. vol_ratio;

  cards;

apple customer1 0.278

apple customer2 0.246

apple customer3 0.193

apple customer4 0.190

apple customer5 0.093

orange customer1 0.278

orange customer2 0.246

orange customer3 0.193

;

run;

ods _all_ close;

/*ods trace on;*/

ods output OneWayFreqs=Top90_Percent(keep=product customer_name CumPercent);

proc freq data = clean_model_data_temp4 order =  data;

  by product;

  weight vol_ratio;

  table customer_name;

run;

/*ods trace off;*/

data Top90_Percent(drop=_Smiley Happy;

  set Top90_Percent;

  by product;

  _lag_CumPercent=lag(CumPercent);

  if first.product then output;

  else if _lag_CumPercent<90 then output;

run;

ods listing;

Super User
Posts: 10,020

Re: How to get top values that account for 90% of total

Posted in reply to wutao9999

data clean_model_data_temp4;
  infile cards;
  input  product $ customer_name :$10. vol_ratio;
  cards;
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
apple customer5 0.093
orange customer1 0.278
orange customer2 0.246
orange customer3 0.193
;
run;
data want;
 set clean_model_data_temp4;
 by product;
 retain found 0;
 if first.product then do;sum=0; found=0;end;
 sum+vol_ratio;
 if not found ;
 if sum gt 0.9 then found=1;
run;

Xia Keshan

Contributor
Posts: 52

Re: How to get top values that account for 90% of total

Appreciate all responses

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 395 views
  • 6 likes
  • 6 in conversation