How do I randomly distribute/split my dataset in 5 parts (5 suppliers) like 60%, 11.2% ,11 .2%, 11.2% and 6.4% with conditions?
I have a very manual code snippet but I have found some tips on SURVEYSELECT but I’m open for any suggestions/help.
More info below.
Thanks,
Paula
The dataset contains 2806 customers/rows in dataset
The list should be distributed accordingly:
1) 60 % to Smorgasbord
2) 11.2 % to GPS
3) 11.2 % to Saab
4) 11.2 % to ABBA
5) 6,4 % to Pacemaker
Conditions
This is the tricky bit (at least for me):
1) If the number of employees is greater than 9 the customers must be distributed to Smorgasbord.
2) The distribution between old/new should be the same for the supplier.
3) Besides 1 )and 2): It should be random which customer is given to a certain supplier.
E.g.
Smorgasbord will have 1028 new customers and 95 old customers AND will have 100 % of customers with a number of employees being greater than 9 (59+34).
Pacemaker will have 165 old customer and 15 new customer AND will only have customers with a number of employees between 0-9.
Dummy code
DATA WORK.temp;
Infile datalines delimiter=',';
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ;
CARDS;
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Remember that Smorgasbord should have 60% */
IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
Else Split = 'Not assigned yet';
RUN;
So you data looks like this, correct? Just to get started..
data have;
do customer = 1 to 2806;
if customer <= 2570 then do;
oldnew = "Old";
if customer <= 2511 then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
else do;
oldnew = "New";
if customer <= (2570 + 202) then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
end;
run;
*correction 60 % to Smorgasbord
Ah. beat me to it.
So you data looks like this, correct? Just to get started..
data have;
do customer = 1 to 2806;
if customer <= 2570 then do;
oldnew = "Old";
if customer <= 2511 then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
else do;
oldnew = "New";
if customer <= (2570 + 202) then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
end;
run;
Here is an example with 25 rows (i know a tiny dataset, but the logic should be the same no matter the volume?)
Condition
1) The customers with >9 empl. must be assigned to Smorgasbord -> 8 of them ( 3 New & 5 Old )
2) Smorgasbord should have 15 of 25 -> 6 of the new customers and 9 of the old (60 % in total and within New/Old)
Questions
3) How do I randomly assign 3 New and 4 Old more to Smorgasbord so as to add up to 60 %?
4) How to I assign the rest of the accordingly to the supplier's share and old/new?
DATA WORK.temp;
Infile datalines delimiter=',';
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ;
CARDS;
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Smorgasbord should have 60% = 15 */
IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
Else Split = 'Not assigned yet';
/*15-8 = 7 more must be given to Smorgasbord*/
RUN;
0.40 + 0.112 + 0.112 + 0.112 + 0.064 = 0.8.
What do you want to do with the remaining 20% of your data?
Did you find your answer? 🙂 If so, great. If not, I'll be happy to help you.
Hi! 🙂 I will have look at all the replies and see what i find 🙂
Thanks so much for giving me a helping hand!
data part1 part2 part3 part4 part5;
set sashelp.heart;
call streaminit(12345678);
_n_=rand('table',.6, .112 ,.112, .112 ,.064);
select(_n_);
when(1) output part1;
when(2) output part2;
when(3) output part3;
when(4) output part4;
when(5) output part5;
otherwise;
end;
run;
Thats a very good start 🙂
BUt how to I assign accordingly to my conditions as well?
I actually found one of your other suggestions in a similar thread. 😁
Basically - with help from a colleague - we did create macro-variable for the different shares. Less manual work from me.
And then I just put the macro variables in the code like:
This is just for Old.
DATA tempOld tempNew Given_to_SM;
SET WORK.temp2;
IF (Split NE 'Smorgasbord' AND Type_of_customer = 'Old') THEN OUTPUT tempOld;
ELSE IF (Split NE 'Smorgasbord' AND Type_of_customer = 'New') THEN OUTPUT tempNew;
ELSE OUTPUT Given_to_SM;
RUN;
/*Random Old*/
DATA SmorgasbordOld SaabOld ABBAOld GPSOld PacemakerOld;
SET tempOld nobs=n_avail;
CALL streaminit(123456780);
IF _n_=1 THEN DO;
ARRAY need {1:5} _temporary_;
need{1} = round(&ShareOld_Smorgasbord*n_avail);
need{2} = round(&ShareOld_SAAB*n_avail);
need{3} = round(&ShareOld_ABBA*n_avail);
need{4} = round(&ShareOld_GPS*n_avail);
need{5} = n_avail-sum(of need{*});
END;
n=rand('table',need{1}/n_avail,need{2}/n_avail,need{3}/n_avail,need{4}/n_avail);
IF n=1 THEN OUTPUT SmorgasbordOld;
ELSE IF n=2 THEN OUTPUT SaabOld;
ELSE IF n=3 THEN OUTPUT ABBAOld;
ELSE IF n=4 THEN OUTPUT GPSOld;
ELSE OUTPUT PacemakerOld;
need{n}=need{n}-1;
n_avail+(-1);
DROP n;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.