Dear Community:
I have a table below (T1) and based on the table, I want to count the number of observations for each segment that meets the threshold condition: if Amount >= Threshold_Amount and Days >= Threshold.
Then we start reducing the Threshold_Amount by 100 each time until it equals to 100. We also reduce Threshold_Days by 1 each time until it equals to 10. We count for each threshold.
T1 is the table we have, T2 and T3 are the desired output. I am thinking using loop, however I do not know how to start.
Any help are appreciated! Thank you
Desired Tables:
Here's an approach:
data count_me;
set have;
if amount >= threshold_amount and days >= threshold_days then output;
do until (threshold_amount <= 100 and threshold_days <= 10);
if threshold_amount > 100 then threshold_amount = threshold_amount - 100;
if threshold_days > 10 then threshold_days = threshold_days - 1;
if amount >= threshold_amount and days >= threshold_days then output;
end;
run;
proc freq data=count_me;
tables segment * threshold_amount * threshold_days / list;
run;
It's late, too late for me to even check the logic. But you can see if it works for your purposes.
Thank you for your reply. I found out that the output table does not include the count/frequency for Threshold_Amount = 500 /Threshold_Days = 20 for Segment A and Threshold_Amount = 500/Threshold_Days = 30 for Segment B. The actual counts/frequency for these Thresholds are 0s. Is there a way to modify the code to include those observations even the count/frequency is 0? Thank you
That's true, PROC FREQ only counts the observations that exist. The simplest way to get additional levels into your data is to create them yourself. But it requires a few steps. First, capture the PROC FREQ output as a data set:
proc freq data=count_me;
tables segment * threshold_amount * threshold_days / out=subset noprint;
run;
Then create the levels you want to add:
data add_me;
count=0;
segment='A';
threshold_amount = 500;
threshold_days=20;
output;
segment='B';
threshold_amount = 500;
threshold_days = 30;
output;
run;
Then merge those extra two observations into the output from PROC FREQ:
data want;
merge add_me subset;
by segment threshold_amount threshold_days;
run;
Getting the final report at that point is simple:
proc print data=want;
var segment threshold_amount threshold_days count;
run;
Thank you, is there a way to deliver the output inside the data step (inside the loop), instead of creating and merging the datasets at the end? I am thinking if in case that there are more threshold levels (for example. starting at 800 and 50), there will be more levels that have count = 0, creating data and then merge them is not really efficient. Thank you.
For many data-driven combinations, I would return to the original solution (but creating an output data set).
Then go back to your original data set, and use a loop to create all needed combinations, hard-coding the count at 0.
Then merge the two together so the counts of 0 fill in the missing categories.
I will probably have time to spell this out in more detail tonight.
OK, here's the idea of how to create a shell, based on your original data set:
data shell;
set have;
by segment;
if first.segment;
retain count 0;
output;
do until (threshold_amount <= 100 and threshold_days <= 10);
if threshold_amount > 100 then threshold_amount = threshold_amount - 100;
if threshold_days > 10 then threshold_days = threshold_days - 1;
output;
end;
run;
proc sort data=shell;
by segment threshold_amount threshold_days;
run;
This gives you an output data set with all zero counts. It can be merged with the output data set from PROC FREQ:
data want;
merge shell subset;
by segment threshold_amount threshold_days;
run;
Whenever the combination of variables appears in both data sets, the COUNT from SUBSET will replace the 0 count within SHELL.
@zma4 wrote:
Thank you, is there a way to deliver the output inside the data step (inside the loop), instead of creating and merging the datasets at the end? I am thinking if in case that there are more threshold levels (for example. starting at 800 and 50), there will be more levels that have count = 0, creating data and then merge them is not really efficient. Thank you.
When ever you request output that doesn't actually exist you run a risk of adding complexity.
One way might be a CLASSDATA data set that contains combinations of the variables you require and use proc means/summary
Maybe something like:
proc sql; create table datatypes as select distinct segment threshold_amount threshold_days from have ; quit; proc summary data=have classdata=datatypes nway; where amount ge threshold_amount and days ge threshold_days; class segment threshold_amount threshold_days; output out=summary (drop=_type_ rename=(_freq_=count)); run;
HOWEVER if you still want values of threshold_amount and threshold_days that never appeared in your data you'll still need another step add in values which we cannot identify because we 1) don't have your data and 2) don't know all of your concerns. If you have a complete set of rules as to what must exist in the output it is likely that a class data set could be built but we need details.
Also why split the result into two tables (implied by that picture)? By group processing with Segment will likely do what you want if you are doing essentially the same task for each segment value and with much less coding.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.