Hi,
We are working on a dialing list to call customers. We have categorised accounts into six segments based on some scoring: New, very high, high, medium, low, and very low.
We would then want to split them into 4 groups (Atleast, Ideally, If Possible, Remaining) based on the percentage allocation shown in the below screenshot, and cumulatively include accounts in this priority order until we reach the daily dial capacity (which changes every day).
Have attached the sample data below.
e.g. if we have 500 Accounts (10 New, 100 Very High, 200 high, 100 Med, 90 Low) and the daily dial capacity is 100, this is how the allocation should go:
AtLeast : 5 New, 50 Very High, 40 High, 5 Med
Ideally : None
If possible : None
Remaining : None
proc import datafile="C:\Users\xiakeshan\Documents\Downloads\Categorisation.csv" out=have dbms=csv replace;
guessingrows=max;
run;
data High Low Medium New VeryHigh VeryLow;
set have;
select(Category);
when('High') output High;
when('Low') output Low;
when('Medium') output Medium;
when('New') output New;
when('Very High') output VeryHigh;
when('Very Low') output VeryLow;
otherwise;
end;
run;
proc surveyselect data=New out=New1 groups=2 seed=123 noprint;
run;
proc surveyselect data=VeryHigh out=VeryHigh1 groups=2 seed=123 noprint;
run;
proc surveyselect data=High out=High1 groups=10 seed=123 noprint;
run;
proc surveyselect data=Medium out=Medium1 groups=10 seed=123 noprint;
run;
proc surveyselect data=Low out=Low1 groups=1000 seed=123 noprint;
run;
proc surveyselect data=VeryLow out=VeryLow1 groups=1000 seed=123 noprint;
run;
%macro sample(capacity=);
data at_least;
set
new1(where=(groupid=1))
VeryHigh1(where=(groupid=1))
High1(where=(groupid in (1:2)))
Medium1(where=(groupid=1))
Low1(where=(groupid in (1:16)))
VeryLow1(where=(groupid in (1:8)))
end=last;
if last then call symputx('nobs1',_n_);
run;
%if &nobs1. < &capacity. %then %do;
data ideally;
set
new1(where=(groupid=2))
VeryHigh1(where=(groupid=2))
High1(where=(groupid in (3:5)))
Medium1(where=(groupid=2))
Low1(where=(groupid in (17:47)))
VeryLow1(where=(groupid in (9:16)))
end=last;
if last then call symputx('nobs2',_n_);
run;
%if %eval(&nobs1.+&nobs2.) < &capacity. %then %do;
data if_possible;
set
High1(where=(groupid in (6:10)))
Medium1(where=(groupid in (3:5)))
Low1(where=(groupid in (48:101)))
VeryLow1(where=(groupid in (17:47)))
end=last;
if last then call symputx('nobs3',_n_);
run;
%if %eval(&nobs1.+&nobs2.+&nobs3.) < &capacity. %then %do;
data remain;
set
High1(where=(groupid>10))
Medium1(where=(groupid>5))
Low1(where=(groupid>101))
VeryLow1(where=(groupid>47))
end=last;
if last then call symputx('nobs4',_n_);
run;
%if %eval(&nobs1.+&nobs2.+&nobs3..+&nobs4.) > &capacity. %then %do;
data remain;
set remain;
if _n_ <= %eval(&capacity.-&nobs1.-&nobs2.-&nobs3.) ;
run;
%end;
%end;
%else %do;
data if_possible;
set if_possible;
if _n_ <= %eval(&capacity.-&nobs1.-&nobs2.) ;
run;
%end;
%end;
%else %do;
data ideally;
set ideally;
if _n_ <= %eval(&capacity.-&nobs1.) ;
run;
%end;
%end;
%else %do;
data at_least;
set at_least;
if _n_ <= &capacity. ;
run;
%end;
%mend;
*the daily dial capacity;
%sample(capacity=1000)
What have you tried?
Please post data in usable form: a data step with datalines/cards. Reduce the number of observation, so that every case is present.
I would start with proc freq to calculate the percentages.
proc import datafile="C:\Users\xiakeshan\Documents\Downloads\Categorisation.csv" out=have dbms=csv replace;
guessingrows=max;
run;
data High Low Medium New VeryHigh VeryLow;
set have;
select(Category);
when('High') output High;
when('Low') output Low;
when('Medium') output Medium;
when('New') output New;
when('Very High') output VeryHigh;
when('Very Low') output VeryLow;
otherwise;
end;
run;
proc surveyselect data=New out=New1 groups=2 seed=123 noprint;
run;
proc surveyselect data=VeryHigh out=VeryHigh1 groups=2 seed=123 noprint;
run;
proc surveyselect data=High out=High1 groups=10 seed=123 noprint;
run;
proc surveyselect data=Medium out=Medium1 groups=10 seed=123 noprint;
run;
proc surveyselect data=Low out=Low1 groups=1000 seed=123 noprint;
run;
proc surveyselect data=VeryLow out=VeryLow1 groups=1000 seed=123 noprint;
run;
%macro sample(capacity=);
data at_least;
set
new1(where=(groupid=1))
VeryHigh1(where=(groupid=1))
High1(where=(groupid in (1:2)))
Medium1(where=(groupid=1))
Low1(where=(groupid in (1:16)))
VeryLow1(where=(groupid in (1:8)))
end=last;
if last then call symputx('nobs1',_n_);
run;
%if &nobs1. < &capacity. %then %do;
data ideally;
set
new1(where=(groupid=2))
VeryHigh1(where=(groupid=2))
High1(where=(groupid in (3:5)))
Medium1(where=(groupid=2))
Low1(where=(groupid in (17:47)))
VeryLow1(where=(groupid in (9:16)))
end=last;
if last then call symputx('nobs2',_n_);
run;
%if %eval(&nobs1.+&nobs2.) < &capacity. %then %do;
data if_possible;
set
High1(where=(groupid in (6:10)))
Medium1(where=(groupid in (3:5)))
Low1(where=(groupid in (48:101)))
VeryLow1(where=(groupid in (17:47)))
end=last;
if last then call symputx('nobs3',_n_);
run;
%if %eval(&nobs1.+&nobs2.+&nobs3.) < &capacity. %then %do;
data remain;
set
High1(where=(groupid>10))
Medium1(where=(groupid>5))
Low1(where=(groupid>101))
VeryLow1(where=(groupid>47))
end=last;
if last then call symputx('nobs4',_n_);
run;
%if %eval(&nobs1.+&nobs2.+&nobs3..+&nobs4.) > &capacity. %then %do;
data remain;
set remain;
if _n_ <= %eval(&capacity.-&nobs1.-&nobs2.-&nobs3.) ;
run;
%end;
%end;
%else %do;
data if_possible;
set if_possible;
if _n_ <= %eval(&capacity.-&nobs1.-&nobs2.) ;
run;
%end;
%end;
%else %do;
data ideally;
set ideally;
if _n_ <= %eval(&capacity.-&nobs1.) ;
run;
%end;
%end;
%else %do;
data at_least;
set at_least;
if _n_ <= &capacity. ;
run;
%end;
%mend;
*the daily dial capacity;
%sample(capacity=1000)
Here's another way to do it -- the first part just creates some data to match your example. In this code, "stage" is from 1-4 and corresponds to "at least", "ideally", "if possible", "remaining". Priority is from 1-6 and corresponds to "New", "Very high", "High", "Medium", "Low", "Very low". CNUM is the customer ID.
** make some test data ;
data test (drop=n);
array P {6} _temporary_ (10, 100, 200, 100, 90, 0);
retain cnum 0;
do priority=1 to dim(P);
n=0;
do while (n<P[priority]);
n+1;
cnum+1;
output;
end;
end;
run;
%let custdata=TEST;
%let call_capacity=100;
* get total record count ;
data _null_;
if 0 then set &custdata nobs=nobs;
call symputx("nobs",nobs);
stop;
run;
* create call list ;
data call_list (keep=cnum stage pr);
set &custdata end=last;
array S {6,&nobs} _temporary_;
array N {6} _temporary_;
retain N 0;
N[priority]+1;
S[priority,N[priority]]=cnum;
if last then do;
recs=0;
array D {6} _temporary_ (6*0);
array P {6,4} _temporary_ (
0.5,1,1,1,
0.5,1,1,1,
0.2,0.5,1,1,
0.1,0.2,0.5,1,
0.016,0.047,0.1,1,
0.008,0.016,0.047,1
);
do stage=1 to dim(P,2);
do pr=1 to dim(P,1);
do while(D[pr]<N[pr] and D[pr]/N[pr]<P[pr,stage]);
D[pr]+1;
recs+1;
cnum=S[pr,D[pr]];
output;
if recs=&call_capacity then stop;
end;
end;
end;
end;
run;
** check results ;
proc freq data=call_list; table pr/missing; run;
Result:
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.