BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pili1100
Obsidian | Level 7

How do I randomly distribute/split my dataset in 5 parts (5 suppliers) like 60%, 11.2% ,11 .2%, 11.2% and 6.4% with conditions?

 

I have a very manual code snippet but I have found some tips on SURVEYSELECT but I’m open for any suggestions/help.

 

More info below.

Thanks,

Paula

 

The dataset contains 2806 customers/rows in dataset

  • 2570 old customers (2511 has 0-9 employees/59 has 10-19 employees)
  • 236 new customers (202 has 0-9 employees/34 has 10-19 employees)

The list should be distributed accordingly:

1) 60 % to Smorgasbord

2) 11.2 % to GPS

3) 11.2 % to Saab

4) 11.2 % to ABBA

5) 6,4 % to Pacemaker

 

Conditions

This is the tricky bit (at least for me):

1) If the number of employees is greater than 9 the customers must be distributed to Smorgasbord.

2) The distribution between old/new should be the same for the supplier.

3) Besides 1 )and 2): It should be random which customer is given to a certain supplier.

E.g.  

Smorgasbord will have 1028 new customers and 95 old customers AND will have 100 % of customers with a number of employees being greater than 9 (59+34).

Pacemaker will have 165 old customer and 15 new customer AND will only have customers with a number of employees between 0-9.

Dummy code

DATA WORK.temp; 
Infile datalines delimiter=','; 
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ; 

CARDS; 
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Remember that Smorgasbord should have 60%  */
	IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
	Else Split = 'Not assigned yet';
RUN;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

So you data looks like this, correct? Just to get started..

 

data have;
   do customer = 1 to 2806;
      if customer <= 2570 then do;
          oldnew = "Old";
          if customer <= 2511 then do;
             emp = "0-9 Employees ";
             output;
          end;
          else do;
             emp = "10-19 Employees";
             output;
          end;
      end;
      else do;
         oldnew = "New";
         if customer <= (2570 + 202) then do;
            emp = "0-9 Employees ";
            output;
         end;
         else do;
            emp = "10-19 Employees";
            output;
         end;
      end;
   end;
run;

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

So you data looks like this, correct? Just to get started..

 

data have;
   do customer = 1 to 2806;
      if customer <= 2570 then do;
          oldnew = "Old";
          if customer <= 2511 then do;
             emp = "0-9 Employees ";
             output;
          end;
          else do;
             emp = "10-19 Employees";
             output;
          end;
      end;
      else do;
         oldnew = "New";
         if customer <= (2570 + 202) then do;
            emp = "0-9 Employees ";
            output;
         end;
         else do;
            emp = "10-19 Employees";
            output;
         end;
      end;
   end;
run;
Pili1100
Obsidian | Level 7

 

Here is an example with 25 rows (i know a tiny dataset, but the logic should be the same no matter the volume?)

Skärmbild 2021-01-14 135702.jpgSkärmbild 2021-01-14 135828.jpgSkärmbild 2021-01-14 135954.jpg

Condition

1) The customers with >9 empl. must be assigned to Smorgasbord -> 8 of them ( 3 New5 Old )

2) Smorgasbord should have 15 of 25 -> 6 of the new customers and 9 of the old (60 % in total and within New/Old)

Questions

3) How do I randomly assign 3 New and 4 Old more to Smorgasbord so as to add up to 60 %?

4) How to I assign the rest of the accordingly to the supplier's share and old/new? 

DATA WORK.temp; 
Infile datalines delimiter=','; 
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ; 

CARDS; 
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Smorgasbord should have 60% = 15  */
	IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
	Else Split = 'Not assigned yet';
	/*15-8 = 7 more must be given to Smorgasbord*/
RUN;
PeterClemmensen
Tourmaline | Level 20

0.40 + 0.112 + 0.112 + 0.112 + 0.064 = 0.8.

 

What do you want to do with the remaining 20% of your data? 

PeterClemmensen
Tourmaline | Level 20

Did you find your answer? 🙂 If so, great. If not, I'll be happy to help you.

Pili1100
Obsidian | Level 7

Hi! 🙂 I will have look at all the replies and see what i find 🙂

Thanks so much for giving me a helping hand! 

Ksharp
Super User
data part1 part2 part3 part4 part5;
 set sashelp.heart;
 call streaminit(12345678);
 _n_=rand('table',.6, .112 ,.112, .112 ,.064);
 select(_n_);
 when(1) output part1;
 when(2) output part2;
 when(3) output part3;
 when(4) output part4;
 when(5) output part5;
 otherwise;
 end;
 
run;
Pili1100
Obsidian | Level 7

Thats a very good start 🙂

BUt how to I assign accordingly to my conditions as well?

Ksharp
Super User
From your 2806 obs get rid of
59 has 10-19 employees)
34 has 10-19 employees)

and using 2806 - 59 - 34 obs to my code and
also correct
60%, 11.2% ,11 .2%, 11.2% and 6.4%
into right probability ,since 2806 has been changed .
Pili1100
Obsidian | Level 7

I actually found one of your other suggestions in a similar thread. 😁

Basically - with help from a colleague - we did create macro-variable for the different shares. Less manual work from me.

 

And then I just put the macro variables in the code like:

This is just for Old. 

 

DATA tempOld tempNew Given_to_SM;
SET WORK.temp2;
	IF (Split NE 'Smorgasbord' AND Type_of_customer = 'Old') THEN OUTPUT  tempOld;	
	ELSE IF (Split NE 'Smorgasbord' AND Type_of_customer = 'New') THEN OUTPUT  tempNew;
		ELSE OUTPUT Given_to_SM;
RUN;

/*Random Old*/
DATA SmorgasbordOld SaabOld ABBAOld GPSOld PacemakerOld;
 SET tempOld  nobs=n_avail;
 CALL streaminit(123456780);
 IF _n_=1 THEN DO;
   ARRAY need {1:5} _temporary_;
   need{1} = round(&ShareOld_Smorgasbord*n_avail);
   need{2} = round(&ShareOld_SAAB*n_avail);
   need{3} = round(&ShareOld_ABBA*n_avail);
   need{4} = round(&ShareOld_GPS*n_avail);
   need{5} = n_avail-sum(of need{*});
 END;
 n=rand('table',need{1}/n_avail,need{2}/n_avail,need{3}/n_avail,need{4}/n_avail);
 IF n=1 THEN  OUTPUT SmorgasbordOld; 
  ELSE IF n=2 THEN  OUTPUT SaabOld; 
   ELSE IF n=3 THEN   OUTPUT ABBAOld;
   ELSE IF n=4 THEN   OUTPUT GPSOld;
    ELSE OUTPUT PacemakerOld;
 need{n}=need{n}-1;
 n_avail+(-1);
DROP n;
RUN;

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3448 views
  • 1 like
  • 3 in conversation