05-12-2015 11:24 AM
I have a data set where I am required to select French-speaking customers and place them in Bucket 1. I am then required to split the remainder of the customers between Bucket 1 and Bucket 2 so that both buckets have the same number of customers.
Total Customers: 1000
French spk: 200 (designated to Bucket 1)
Therefore, of the remaining 800 customers, Bucket 1 gets 300 and Bucket 2 gets 500.
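The same arithmetic works for any counts. Each bucket must end up with half of all customers, so Bucket 1 takes (total / 2 - French speakers) of the non-French customers and Bucket 2 takes the rest: here 1000 / 2 - 200 = 300 and 800 - 300 = 500. Below is a minimal SAS sketch that derives these numbers from the data. It assumes a data set named have with a 1/0 numeric flag French; both names and the generated sample data are hypothetical. It also assumes French speakers make up no more than half of the customers, since otherwise no equal split is possible.

```sas
/* Hypothetical sample data: 1000 customers, every 5th one (200 in all) */
/* speaks French. French is a 1/0 flag.                                 */
data have;
  do id = 1 to 1000;
    French = (mod(id, 5) = 0);
    output;
  end;
run;

/* Count all customers and French speakers into macro variables. */
proc sql noprint;
  select count(*), sum(French)
    into :totalCust trimmed, :nFrench trimmed
    from have;
quit;

/* Both buckets hold half the customers, so Bucket 1 is topped up with */
/* (half the total - French speakers) non-French customers.            */
%let bucketSize   = %sysevalf(&totalCust / 2, floor);
%let nonFrenchTo1 = %eval(&bucketSize - &nFrench);
%let nonFrenchTo2 = %eval(&totalCust - &nFrench - &nonFrenchTo1);
%put NOTE: bucket size=&bucketSize, non-French to Bucket 1=&nonFrenchTo1, to Bucket 2=&nonFrenchTo2;
```

With the sample data, the log shows `NOTE: bucket size=500, non-French to Bucket 1=300, to Bucket 2=500`.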
Thanks for all the help.
05-12-2015 12:11 PM
Do the remaining customers have to be selected randomly?
Are you supposed to have 2 datasets as a result or add a variable?
If randomness isn't a serious concern, sort the data so that the French speakers are first.
To add a bucket variable:
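data want(drop=count);
  set have;                      /* French assumed to be a 1/0 flag */
  if not French then count + 1;  /* running count of non-French customers */
  if French or count le 300 then bucket = 1;
  else bucket = 2;
run;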
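If you need two data sets instead:
data bucket1(drop=count) bucket2(drop=count);
  set have;
  if not French then count + 1;  /* running count of non-French customers */
  if French or count le 300 then output bucket1;
  else output bucket2;
run;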
05-12-2015 12:26 PM
Q1: Sorry, I should have mentioned that the remaining customers do need to be selected randomly.
Q2: I should add a variable rather than have 2 separate data sets.
I'm new to coding, but I follow what you've done so far. How would it change if the remaining customers had to be selected randomly?
05-12-2015 01:48 PM
proc surveyselect data=have(where=(French='NO'))  /* whatever condition excludes the French-speaking records */
                  out=want sampsize=500 outall;
run;
This flags 500 (the sampsize) records as selected: a variable named Selected is added, and a value of 1 means the record is in that group. The remaining 300 records get Selected=0.
Then add the French speakers back:
data want;
  set want have(where=(French='YES') in=infrench);
  if infrench then selected=0;
run;
Selected=0 then identifies Bucket 1 (the French speakers plus the 300 unselected customers), and Selected=1 identifies Bucket 2. You could rename Selected to Bucket if desired.
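Putting the pieces of this reply together, here is a minimal end-to-end sketch. It assumes an input data set have whose character variable French is 'YES' or 'NO'; the sample data generated first is hypothetical. METHOD=SRS, SEED=, OUTALL and NOPRINT are standard PROC SURVEYSELECT options, and the seed value itself is arbitrary. Because the 500 sampled customers (Selected=1) form Bucket 2, the expression bucket = 1 + Selected maps the 0/1 flag directly onto the bucket numbers.

```sas
/* Hypothetical sample data: 1000 customers, every 5th one speaks French. */
data have;
  length French $3;
  do id = 1 to 1000;
    if mod(id, 5) = 0 then French = 'YES';
    else French = 'NO';
    output;
  end;
run;

/* Randomly flag 500 of the 800 non-French customers (Selected = 1). */
proc surveyselect data=have(where=(French = 'NO'))
                  out=sampled method=srs sampsize=500 seed=20150512 outall noprint;
run;

/* Add the French speakers back as unselected, then turn the flag into  */
/* a bucket number: Selected = 0 -> Bucket 1, Selected = 1 -> Bucket 2. */
data want;
  set sampled have(where=(French = 'YES') in=inFrench);
  if inFrench then Selected = 0;
  bucket = 1 + Selected;
run;

/* Each bucket should hold 500 customers, with all 200 French in Bucket 1. */
proc freq data=want;
  tables bucket*French / norow nocol nopercent;
run;
```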
05-12-2015 02:36 PM
Thanks for the help!
I ended up using a hybrid of the suggestions here to get my answer. It's probably not as efficient as what you guys did, but at least I know I can explain what I'm doing.
05-12-2015 12:39 PM
You could try something like this:
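%let totalCust = 1000;            /* total number of customers */
data temp;
  set have;
  if French then rnd = -1;        /* French speakers sort to the top */
  else rnd = rand("UNIFORM");     /* random order for everyone else */
run;
proc sort data=temp; by rnd; run;
data want;
  set temp;
  if 2*_n_ <= &totalCust. then bucket = 1;  /* first half of the sorted records */
  else bucket = 2;
run;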
Message was edited by: PG
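Here is a self-contained variant of PG's approach. It assumes French is a 1/0 flag in a data set named have, and the sample data below is hypothetical. Instead of a hard-coded &totalCust., the total comes from the NOBS= option of the SET statement. CALL STREAMINIT fixes the random stream so that the split is reproducible (the seed value is arbitrary).

```sas
/* Hypothetical sample data: 1000 customers, every 5th one speaks French. */
data have;
  do id = 1 to 1000;
    French = (mod(id, 5) = 0);
    output;
  end;
run;

/* French speakers get -1 so they sort first; everyone else gets a */
/* random key in (0, 1).                                           */
data temp;
  if _n_ = 1 then call streaminit(20150512);
  set have;
  if French then rnd = -1;
  else rnd = rand("UNIFORM");
run;

proc sort data=temp;
  by rnd;
run;

/* The first half of the sorted records (all French speakers plus random */
/* non-French customers) form Bucket 1; the second half form Bucket 2.   */
data want(drop=rnd);
  set temp nobs=totalCust;
  if 2*_n_ <= totalCust then bucket = 1;
  else bucket = 2;
run;
```

Because the NOBS= value is set when the step compiles, it is available from the first iteration. It is also not written to the output data set.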