BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wutao9999
Obsidian | Level 7

I have the below SAS data set, I want to get the rows with top volume_ratio that account for 90% of total volumes.  Because (0.278 + 0.246 +  0.193 + 0.190) > 90%, basically, I want to have a sas procedure that can get the top four rows of the below data.

product            customer_name   volume_ratio

apple               customer1            0.278

apple                customer2           0.246

apple                customer3           0.193

apple                customer4           0.190

apple                customer5           0.093

That is to say, the output should be

product            customer_name   volume_ratio

apple               customer1            0.278

apple                customer2           0.246

apple                customer3           0.193

apple                customer4           0.190

Is there a SAS procedure that can do this work?  Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You could use PROC FREQ  (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.

You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).

View solution in original post

13 REPLIES 13
Patrick
Opal | Level 21

You could use PROC FREQ  (by product, weight volume_ratio) and then use ODS OUTPUT to write the result to a SAS dataset.

You then use a data step with by group processing *by product" and you don't write records from the input data set to output after the first record within a by group had a cumulative frequency of >=90 (you can use a lag() function to retrieve the value from the previous record).

SASKiwi
PROC Star

The UNIVARIATE procedure will do what you want. Check out the OUTPUT statement in this procedure.

wutao9999
Obsidian | Level 7

Could you give me an example?  I am a new player for SAS.  Appreciate your help.

Steelers_In_DC
Barite | Level 11

I think this is the output you are looking for:

data have;

infile cards;

input  product $ customer_name $ volume_ratio;

cards;             

apple customer1 0.278

apple customer2 0.246

apple customer3 0.193

apple customer4 0.190

apple customer5 0.093

;

run;

data want;

set have;

sum+volume_ratio;

if sum < round(.9) then output;

run;

wutao9999
Obsidian | Level 7

The data has different products, how to handle this?  Thank you.

Steelers_In_DC
Barite | Level 11

Try this with the same dataset, depending your resources one could be faster than the other:

data want;

do while(sum < .9);

set have;

sum+volume_ratio;

output;

end;

run;

Reeza
Super User

In general a BY statement, but it depends on your solution.

wutao9999
Obsidian | Level 7


I have created the below codes:

freq data = clean_model_data_temp4 order =  data;

by product;

weight vol_ratio;

run;

The code works well. How can I output all of the tables for different product into a single sas table?

Reeza
Super User

First example in the documentation:

Example 35.1 Output Data Set of Frequencies

SAS/STAT(R) 9.22 User's Guide

Since you probably want cumulative percentages as well look at the OUTCUM option

SAS/STAT(R) 9.22 User's Guide

Patrick
Opal | Level 21

Code like below could work:

data clean_model_data_temp4;

  infile cards;

  input  product $ customer_name :$10. vol_ratio;

  cards;

apple customer1 0.278

apple customer2 0.246

apple customer3 0.193

apple customer4 0.190

apple customer5 0.093

orange customer1 0.278

orange customer2 0.246

orange customer3 0.193

;

run;

ods _all_ close;

/*ods trace on;*/

ods output OneWayFreqs=Top90_Percent(keep=product customer_name CumPercent);

proc freq data = clean_model_data_temp4 order =  data;

  by product;

  weight vol_ratio;

  table customer_name;

run;

/*ods trace off;*/

data Top90_Percent(drop=_:);

  set Top90_Percent;

  by product;

  _lag_CumPercent=lag(CumPercent);

  if first.product then output;

  else if _lag_CumPercent<90 then output;

run;

ods listing;

Ksharp
Super User

data clean_model_data_temp4;
  infile cards;
  input  product $ customer_name :$10. vol_ratio;
  cards;
apple customer1 0.278
apple customer2 0.246
apple customer3 0.193
apple customer4 0.190
apple customer5 0.093
orange customer1 0.278
orange customer2 0.246
orange customer3 0.193
;
run;
data want;
 set clean_model_data_temp4;
 by product;
 retain found 0;
 if first.product then do;sum=0; found=0;end;
 sum+vol_ratio;
 if not found ;
 if sum gt 0.9 then found=1;
run;

Xia Keshan

wutao9999
Obsidian | Level 7

Appreciate all responses

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1421 views
  • 6 likes
  • 6 in conversation