I have the below SAS data set, I want to get the rows with top volume_ratio that account for 90% of total volumes. Because (0.278 + 0.246 + 0.193 + 0.190) > 90%, basically, I want to have a sas procedure that can get the top four rows of the below data.
product customer_name volume_ratio
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
apple customer5 0.093
That is to say, the output should be
product customer_name volume_ratio
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
Is there a SAS procedure that can do this work? Thank you.
You could use PROC FREQ (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.
You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).
You could use PROC FREQ (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.
You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).
The UNIVARIATE procedure will do what you want. Check out the OUTPUT statement in this procedure.
Could you give me an example? I am a new player for SAS. Appreciate your help.
I think this is the output you are looking for:
data have;
infile cards;
input product $ customer_name $ volume_ratio;
cards;
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
apple customer5 0.093
;
run;
data want;
set have;
sum+volume_ratio;
if sum < round(.9) then output;
run;
The data has different products, how to handle this? Thank you.
Try this with the same dataset, depending your resources one could be faster than the other:
data want;
do while(sum < .9);
set have;
sum+volume_ratio;
output;
end;
run;
Check this - go to the perentile example:
In general a BY statement, but it depends on your solution.
I have created the below codes:
freq data = clean_model_data_temp4 order = data;
by product;
weight vol_ratio;
run;
The code works well. How can I output all of the tables for different product into a single sas table?
First example in the documentation:
Since you probably want cumulative percentages as well look at the OUTCUM option
Code like below could work:
data clean_model_data_temp4;
infile cards;
input product $ customer_name :$10. vol_ratio;
cards;
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
apple customer5 0.093
orange customer1 0.278
orange customer2 0.246
orange customer3 0.193
;
run;
ods _all_ close;
/*ods trace on;*/
ods output OneWayFreqs=Top90_Percent(keep=product customer_name CumPercent);
proc freq data = clean_model_data_temp4 order = data;
by product;
weight vol_ratio;
table customer_name;
run;
/*ods trace off;*/
data Top90_Percent(drop=_:);
set Top90_Percent;
by product;
_lag_CumPercent=lag(CumPercent);
if first.product then output;
else if _lag_CumPercent<90 then output;
run;
ods listing;
data clean_model_data_temp4; infile cards; input product $ customer_name :$10. vol_ratio; cards; apple customer1 0.278 apple customer2 0.246 apple customer3 0.193 apple customer4 0.190 apple customer5 0.093 orange customer1 0.278 orange customer2 0.246 orange customer3 0.193 ; run; data want; set clean_model_data_temp4; by product; retain found 0; if first.product then do;sum=0; found=0;end; sum+vol_ratio; if not found ; if sum gt 0.9 then found=1; run;
Xia Keshan
Appreciate all responses
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.