BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Pili1100
Obsidian | Level 7

How do I randomly distribute/split my dataset in 5 parts (5 suppliers) like 60%, 11.2% ,11 .2%, 11.2% and 6.4% with conditions?

 

I have a very manual code snippet but I have found some tips on SURVEYSELECT but I’m open for any suggestions/help.

 

More info below.

Thanks,

Paula

 

The dataset contains 2806 customers/rows in dataset

  • 2570 old customers (2511 has 0-9 employees/59 has 10-19 employees)
  • 236 new customers (202 has 0-9 employees/34 has 10-19 employees)

The list should be distributed accordingly:

1) 60 % to Smorgasbord

2) 11.2 % to GPS

3) 11.2 % to Saab

4) 11.2 % to ABBA

5) 6,4 % to Pacemaker

 

Conditions

This is the tricky bit (at least for me):

1) If the number of employees is greater than 9 the customers must be distributed to Smorgasbord.

2) The distribution between old/new should be the same for the supplier.

3) Besides 1 )and 2): It should be random which customer is given to a certain supplier.

E.g.  

Smorgasbord will have 1028 new customers and 95 old customers AND will have 100 % of customers with a number of employees being greater than 9 (59+34).

Pacemaker will have 165 old customer and 15 new customer AND will only have customers with a number of employees between 0-9.

Dummy code

DATA WORK.temp; 
Infile datalines delimiter=','; 
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ; 

CARDS; 
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Remember that Smorgasbord should have 60%  */
	IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
	Else Split = 'Not assigned yet';
RUN;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

So you data looks like this, correct? Just to get started..

 

data have;
   do customer = 1 to 2806;
      if customer <= 2570 then do;
          oldnew = "Old";
          if customer <= 2511 then do;
             emp = "0-9 Employees ";
             output;
          end;
          else do;
             emp = "10-19 Employees";
             output;
          end;
      end;
      else do;
         oldnew = "New";
         if customer <= (2570 + 202) then do;
            emp = "0-9 Employees ";
            output;
         end;
         else do;
            emp = "10-19 Employees";
            output;
         end;
      end;
   end;
run;

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

So you data looks like this, correct? Just to get started..

 

data have;
   do customer = 1 to 2806;
      if customer <= 2570 then do;
          oldnew = "Old";
          if customer <= 2511 then do;
             emp = "0-9 Employees ";
             output;
          end;
          else do;
             emp = "10-19 Employees";
             output;
          end;
      end;
      else do;
         oldnew = "New";
         if customer <= (2570 + 202) then do;
            emp = "0-9 Employees ";
            output;
         end;
         else do;
            emp = "10-19 Employees";
            output;
         end;
      end;
   end;
run;
Pili1100
Obsidian | Level 7

 

Here is an example with 25 rows (i know a tiny dataset, but the logic should be the same no matter the volume?)

Skärmbild 2021-01-14 135702.jpgSkärmbild 2021-01-14 135828.jpgSkärmbild 2021-01-14 135954.jpg

Condition

1) The customers with >9 empl. must be assigned to Smorgasbord -> 8 of them ( 3 New5 Old )

2) Smorgasbord should have 15 of 25 -> 6 of the new customers and 9 of the old (60 % in total and within New/Old)

Questions

3) How do I randomly assign 3 New and 4 Old more to Smorgasbord so as to add up to 60 %?

4) How to I assign the rest of the accordingly to the supplier's share and old/new? 

DATA WORK.temp; 
Infile datalines delimiter=','; 
INPUT Customer_Id $ Nbr_EMPL Type_of_customer $ ; 

CARDS; 
CU1238A4,0,Old
CU1238B1,0,Old
CU1238C8,1,Old
CU1238D1,1,Old
CU1238E1,1,New
CU1238F1,1,Old
CU1238G9,2,New
CU1238H1,2,New
CU1238I1,2,Old
CU1238J1,9,New
CU1238K7,11,New
CU1238L5,12,Old
CU1238M6,14,Old
CU1238N1,14,New
CU1238O1,17,Old
CU1238P2,2,Old
CU1238Q3,9,Old
CU1238R4,11,Old
CU1238S5,12,Old
CU1238T6,14,New
CU1238U7,1,Old
CU1238V8,1,New
CU1238X9,1,New
CU1238Y10,2,Old
CU1238Z11,2,New
;
RUN;
PROC SORT DATA=temp;
BY Customer_Id Nbr_EMPL;
RUN;
DATA temp2;
SET temp;
LENGTH Split $20;
/*Smorgasbord should have 60% = 15  */
	IF Nbr_EMPL > 9 THEN Split = 'Smorgasbord';
	Else Split = 'Not assigned yet';
	/*15-8 = 7 more must be given to Smorgasbord*/
RUN;
PeterClemmensen
Tourmaline | Level 20

0.40 + 0.112 + 0.112 + 0.112 + 0.064 = 0.8.

 

What do you want to do with the remaining 20% of your data? 

PeterClemmensen
Tourmaline | Level 20

Did you find your answer? 🙂 If so, great. If not, I'll be happy to help you.

Pili1100
Obsidian | Level 7

Hi! 🙂 I will have look at all the replies and see what i find 🙂

Thanks so much for giving me a helping hand! 

Ksharp
Super User
data part1 part2 part3 part4 part5;
 set sashelp.heart;
 call streaminit(12345678);
 _n_=rand('table',.6, .112 ,.112, .112 ,.064);
 select(_n_);
 when(1) output part1;
 when(2) output part2;
 when(3) output part3;
 when(4) output part4;
 when(5) output part5;
 otherwise;
 end;
 
run;
Pili1100
Obsidian | Level 7

Thats a very good start 🙂

BUt how to I assign accordingly to my conditions as well?

Ksharp
Super User
From your 2806 obs get rid of
59 has 10-19 employees)
34 has 10-19 employees)

and using 2806 - 59 - 34 obs to my code and
also correct
60%, 11.2% ,11 .2%, 11.2% and 6.4%
into right probability ,since 2806 has been changed .
Pili1100
Obsidian | Level 7

I actually found one of your other suggestions in a similar thread. 😁

Basically - with help from a colleague - we did create macro-variable for the different shares. Less manual work from me.

 

And then I just put the macro variables in the code like:

This is just for Old. 

 

DATA tempOld tempNew Given_to_SM;
SET WORK.temp2;
	IF (Split NE 'Smorgasbord' AND Type_of_customer = 'Old') THEN OUTPUT  tempOld;	
	ELSE IF (Split NE 'Smorgasbord' AND Type_of_customer = 'New') THEN OUTPUT  tempNew;
		ELSE OUTPUT Given_to_SM;
RUN;

/*Random Old*/
DATA SmorgasbordOld SaabOld ABBAOld GPSOld PacemakerOld;
 SET tempOld  nobs=n_avail;
 CALL streaminit(123456780);
 IF _n_=1 THEN DO;
   ARRAY need {1:5} _temporary_;
   need{1} = round(&ShareOld_Smorgasbord*n_avail);
   need{2} = round(&ShareOld_SAAB*n_avail);
   need{3} = round(&ShareOld_ABBA*n_avail);
   need{4} = round(&ShareOld_GPS*n_avail);
   need{5} = n_avail-sum(of need{*});
 END;
 n=rand('table',need{1}/n_avail,need{2}/n_avail,need{3}/n_avail,need{4}/n_avail);
 IF n=1 THEN  OUTPUT SmorgasbordOld; 
  ELSE IF n=2 THEN  OUTPUT SaabOld; 
   ELSE IF n=3 THEN   OUTPUT ABBAOld;
   ELSE IF n=4 THEN   OUTPUT GPSOld;
    ELSE OUTPUT PacemakerOld;
 need{n}=need{n}-1;
 n_avail+(-1);
DROP n;
RUN;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3309 views
  • 1 like
  • 3 in conversation