How do I randomly distribute/split my dataset in 5 parts (5 suppliers) like 60%, 11.2% ,11 .2%, 11.2% and 6.4% with conditions?
I have a very manual code snippet but I have found some tips on SURVEYSELECT but I’m open for any suggestions/help.
More info below.
Thanks,
Paula
The dataset contains 2806 customers/rows in dataset
The list should be distributed accordingly:
1) 60 % to Smorgasbord
2) 11.2 % to GPS
3) 11.2 % to Saab
4) 11.2 % to ABBA
5) 6,4 % to Pacemaker
Conditions
This is the tricky bit (at least for me):
1) If the number of employees is greater than 9 the customers must be distributed to Smorgasbord.
2) The distribution between old/new should be the same for the supplier.
3) Besides 1 )and 2): It should be random which customer is given to a certain supplier.
E.g.
Smorgasbord will have 1028 new customers and 95 old customers AND will have 100 % of customers with a number of employees being greater than 9 (59+34).
Pacemaker will have 165 old customer and 15 new customer AND will only have customers with a number of employees between 0-9.
Dummy code
DATA WORK.temp;
Infile datalines delimiter=',';
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ;
CARDS;
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Remember that Smorgasbord should have 60% */
IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
Else Split = 'Not assigned yet';
RUN;
So you data looks like this, correct? Just to get started..
data have;
do customer = 1 to 2806;
if customer <= 2570 then do;
oldnew = "Old";
if customer <= 2511 then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
else do;
oldnew = "New";
if customer <= (2570 + 202) then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
end;
run;
*correction 60 % to Smorgasbord
Ah. beat me to it.
So you data looks like this, correct? Just to get started..
data have;
do customer = 1 to 2806;
if customer <= 2570 then do;
oldnew = "Old";
if customer <= 2511 then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
else do;
oldnew = "New";
if customer <= (2570 + 202) then do;
emp = "0-9 Employees ";
output;
end;
else do;
emp = "10-19 Employees";
output;
end;
end;
end;
run;
Here is an example with 25 rows (i know a tiny dataset, but the logic should be the same no matter the volume?)
Condition
1) The customers with >9 empl. must be assigned to Smorgasbord -> 8 of them ( 3 New & 5 Old )
2) Smorgasbord should have 15 of 25 -> 6 of the new customers and 9 of the old (60 % in total and within New/Old)
Questions
3) How do I randomly assign 3 New and 4 Old more to Smorgasbord so as to add up to 60 %?
4) How to I assign the rest of the accordingly to the supplier's share and old/new?
DATA WORK.temp;
Infile datalines delimiter=',';
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ;
CARDS;
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Smorgasbord should have 60% = 15 */
IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
Else Split = 'Not assigned yet';
/*15-8 = 7 more must be given to Smorgasbord*/
RUN;
0.40 + 0.112 + 0.112 + 0.112 + 0.064 = 0.8.
What do you want to do with the remaining 20% of your data?
Did you find your answer? 🙂 If so, great. If not, I'll be happy to help you.
Hi! 🙂 I will have look at all the replies and see what i find 🙂
Thanks so much for giving me a helping hand!
data part1 part2 part3 part4 part5;
set sashelp.heart;
call streaminit(12345678);
_n_=rand('table',.6, .112 ,.112, .112 ,.064);
select(_n_);
when(1) output part1;
when(2) output part2;
when(3) output part3;
when(4) output part4;
when(5) output part5;
otherwise;
end;
run;
Thats a very good start 🙂
BUt how to I assign accordingly to my conditions as well?
I actually found one of your other suggestions in a similar thread. 😁
Basically - with help from a colleague - we did create macro-variable for the different shares. Less manual work from me.
And then I just put the macro variables in the code like:
This is just for Old.
DATA tempOld tempNew Given_to_SM;
SET WORK.temp2;
IF (Split NE 'Smorgasbord' AND Type_of_customer = 'Old') THEN OUTPUT tempOld;
ELSE IF (Split NE 'Smorgasbord' AND Type_of_customer = 'New') THEN OUTPUT tempNew;
ELSE OUTPUT Given_to_SM;
RUN;
/*Random Old*/
DATA SmorgasbordOld SaabOld ABBAOld GPSOld PacemakerOld;
SET tempOld nobs=n_avail;
CALL streaminit(123456780);
IF _n_=1 THEN DO;
ARRAY need {1:5} _temporary_;
need{1} = round(&ShareOld_Smorgasbord*n_avail);
need{2} = round(&ShareOld_SAAB*n_avail);
need{3} = round(&ShareOld_ABBA*n_avail);
need{4} = round(&ShareOld_GPS*n_avail);
need{5} = n_avail-sum(of need{*});
END;
n=rand('table',need{1}/n_avail,need{2}/n_avail,need{3}/n_avail,need{4}/n_avail);
IF n=1 THEN OUTPUT SmorgasbordOld;
ELSE IF n=2 THEN OUTPUT SaabOld;
ELSE IF n=3 THEN OUTPUT ABBAOld;
ELSE IF n=4 THEN OUTPUT GPSOld;
ELSE OUTPUT PacemakerOld;
need{n}=need{n}-1;
n_avail+(-1);
DROP n;
RUN;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.