BookmarkSubscribeRSS Feed
zma4
Calcite | Level 5

Dear Community:

 

I have a table below (T1) and based on the table, I want to count the number of observations for each segment that meets the threshold condition: if Amount >= Threshold_Amount and Days >= Threshold.

 

Then we start reducing the Threshold_Amount by 100 each time until it equals to 100. We also reduce Threshold_Days by 1 each time until it equals to 10. We count for each threshold.

 

T1 is the table we have, T2 and T3 are the desired output. I am thinking using loop, however I do not know how to start. 

 

Any help are appreciated! Thank you

 

T1.jpeg

 

Desired Tables:

 

t2.jpeg

 

 

7 REPLIES 7
Astounding
PROC Star

Here's an approach:

 

data count_me;

set have;

if amount >= threshold_amount and days >= threshold_days then output;

do until (threshold_amount <= 100 and threshold_days <= 10);

   if threshold_amount > 100 then threshold_amount = threshold_amount - 100;

   if threshold_days > 10 then threshold_days = threshold_days - 1;

   if amount >= threshold_amount and days >= threshold_days then output;

end;

run;

 

proc freq data=count_me;

tables segment * threshold_amount * threshold_days / list;

run;

 

It's late, too late for me to even check the logic.  But you can see if it works for your purposes.

zma4
Calcite | Level 5

 Thank you for your reply. I found out that the output table does not include the count/frequency for Threshold_Amount = 500 /Threshold_Days = 20 for Segment A and Threshold_Amount = 500/Threshold_Days = 30 for Segment B. The actual counts/frequency for these Thresholds are 0s. Is there a way to modify the code to include those observations even the count/frequency is 0? Thank you

 

sastable.jpeg

Astounding
PROC Star

That's true, PROC FREQ only counts the observations that exist.  The simplest way to get additional levels into your data is to create them yourself.  But it requires a few steps.  First, capture the PROC FREQ output as a data set:

 

proc freq data=count_me;

tables segment * threshold_amount * threshold_days / out=subset noprint;

run;

 

Then create the levels you want to add:

 

data add_me;

count=0;

segment='A';

threshold_amount = 500;

threshold_days=20;

output;

segment='B';

threshold_amount = 500;

threshold_days = 30;

output;

run;

 

Then merge those extra two observations into the output from PROC FREQ:

 

data want;

merge add_me subset;

by segment threshold_amount threshold_days;

run;

 

Getting the final report at that point is simple:

 

proc print data=want;

var segment threshold_amount threshold_days count;

run;

zma4
Calcite | Level 5

Thank you, is there a way to deliver the output inside the data step (inside the loop), instead of creating and merging the datasets at the end? I am thinking if in case that there are more threshold levels (for example. starting at 800 and 50), there will be more levels that have count = 0, creating data and then merge them is not really efficient. Thank you.

Astounding
PROC Star

For many data-driven combinations, I would return to the original solution (but creating an output data set).

 

Then go back to your original data set, and use a loop to create all needed combinations, hard-coding the count at 0.

 

Then merge the two together so the counts of 0 fill in the missing categories.

 

I will probably have time to spell this out in more detail tonight.

Astounding
PROC Star

OK, here's the idea of how to create a shell, based on your original data set:

 

data shell;

set have;

by segment;

if first.segment;

retain count 0;

output;

do until (threshold_amount <= 100 and threshold_days <= 10);

   if threshold_amount > 100 then threshold_amount = threshold_amount - 100;

   if threshold_days > 10 then threshold_days = threshold_days - 1;

   output;

end;

run;

 

proc sort data=shell;

by segment threshold_amount threshold_days;

run;

 

This gives you an output data set with all zero counts.  It can be merged with the output data set from PROC FREQ:

 

data want;

merge shell subset;

by segment threshold_amount threshold_days;

run;

 

Whenever the combination of variables appears in both data sets, the COUNT from SUBSET will replace the 0 count within SHELL.

ballardw
Super User

@zma4 wrote:

Thank you, is there a way to deliver the output inside the data step (inside the loop), instead of creating and merging the datasets at the end? I am thinking if in case that there are more threshold levels (for example. starting at 800 and 50), there will be more levels that have count = 0, creating data and then merge them is not really efficient. Thank you.


When ever you request output that doesn't actually exist you run a risk of adding complexity.

One way might be a CLASSDATA data set that contains combinations of the variables you require and use proc means/summary

 

Maybe something like:

proc sql;
   create table datatypes as
   select distinct segment threshold_amount threshold_days
   from have
   ;
quit;


proc summary data=have classdata=datatypes nway;
   where amount ge threshold_amount and days ge threshold_days;
   class segment threshold_amount threshold_days;
   output out=summary (drop=_type_ rename=(_freq_=count));
run;

HOWEVER if you still want values of threshold_amount and threshold_days that never appeared in your data you'll still need another step add in values which we cannot identify because we 1) don't have your data and 2) don't know all of your concerns. If you have a complete set of rules as to what must exist in the output it is likely that a class data set could be built but we need details.

 

 

Also why split the result into two tables (implied by that picture)? By group processing with Segment will likely do what you want if you are doing essentially the same task for each segment value and with much less coding.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3157 views
  • 0 likes
  • 3 in conversation