Hey everyone,
I have a data set where I am required to select French-speaking customers and place them in Bucket 1. I am then required to split the remainder of the customers between Bucket 1 and Bucket 2 so that both buckets have the same number of customers.
E.g.
Total Customers: 1000
French-speaking: 200 (assigned to Bucket 1)
Therefore, of the remaining 800 customers, Bucket 1 gets 300 and Bucket 2 gets 500, so that each bucket ends up with 500.
Thanks for all the help.
Do the remaining customers have to be selected randomly?
Are you supposed to have 2 datasets as a result or add a variable?
If randomness isn't a serious concern, sort the data so that the French speakers come first.
To add a bucket variable:
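data want;
    set sorted;
    if French then bucket = 1;   /* true when French is a nonzero numeric flag */
    else do;
        count + 1;               /* running count of non-French customers */
        /* 300 = 500 per bucket minus the 200 French speakers in the example */
        if count le 300 then bucket = 1;
        else bucket = 2;
    end;
    drop count;
run;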
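If you want two data sets instead:
data bucket1 bucket2;
    set sorted;
    if French then output bucket1;
    else do;
        count + 1;               /* running count of non-French customers */
        /* 300 = 500 per bucket minus the 200 French speakers in the example */
        if count le 300 then output bucket1;
        else output bucket2;
    end;
    drop count;
run;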
Q1: Sorry, I should have mentioned that the remaining customers do need to be selected randomly.
Q2: I should add a variable rather than have 2 separate data sets.
I'm new to coding but I follow what you've done so far. How would it change if the remaining customers have to be selected randomly?
Thanks again!
proc surveyselect data=have (where=(French='NO')) /* use whatever condition excludes the French-speaking records */
    out=want sampsize=500 outall;
run;
This flags 500 records (the sampsize) as selected: a variable named Selected is added, and a value of 1 means the record is in the sample. The remaining 300 records get Selected=0.
Then add the French back:
data finalwant;
    set want
        have (where=(French='YES') in=infrench)
    ;
    if infrench then selected = 0;
run;
Selected=0 is then Bucket 1 (the French speakers plus the 300 customers who were not sampled) and Selected=1 is Bucket 2. You could rename Selected to Bucket if desired.
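If you would rather have an explicit Bucket variable with values 1 and 2, the same step could be written roughly as below. This is only a sketch: it assumes, as in the code above, that French is a character variable holding 'YES' or 'NO' and that want came from the surveyselect step with sampsize=500 and outall.

```sas
/* Sketch: derive Bucket from the Selected flag instead of keeping Selected. */
data finalwant;
    set want
        have (where=(French='YES') in=infrench);
    if infrench then bucket = 1;           /* French speakers always go to Bucket 1 */
    else if selected = 1 then bucket = 2;  /* the sampled customers */
    else bucket = 1;                       /* the customers who were not sampled */
    drop selected;
run;

/* Quick check that both buckets ended up the same size. */
proc freq data=finalwant;
    tables bucket;
run;
```

With the numbers from the example, the frequency table should show 500 rows in each bucket.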
Thanks for the help!
I ended up using a hybrid of the stuff I found here to get my answer. It's probably not as efficient as what you guys did but I know I can at least explain what I'm doing.
Cheers!
You could try something like this:
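data temp;
    call streaminit(167353);        /* fixed seed so the split is reproducible */
    set myCustomers;
    if French then rnd = -1;        /* French speakers sort to the top */
    else rnd = rand("UNIFORM");     /* random order for everyone else */
run;
proc sort data=temp; by rnd; run;
%let totalCust=1000;
data want;
    set temp;
    if 2*_n_ <= &totalCust. then do;        /* first half: Bucket 1 */
        bucket = 1;
        output;
    end;
    else if _n_ <= &totalCust. then do;     /* second half: Bucket 2 */
        bucket = 2;
        output;
    end;
    drop rnd;
run;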
(untested)
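The hard-coded %let totalCust=1000; could probably be replaced by a count taken from the data itself, so the split still works when the number of customers changes. A minimal sketch, assuming the input data set is named myCustomers as above (the trimmed keyword requires SAS 9.3 or later):

```sas
/* Sketch: store the number of rows in myCustomers in the macro variable totalCust. */
proc sql noprint;
    select count(*) into :totalCust trimmed
    from myCustomers;
quit;

%put NOTE: totalCust=&totalCust.;   /* write the count to the log as a sanity check */
```

Run this in place of the %let statement, before the final data step. If the count is odd, the 2*_n_ <= &totalCust. test leaves Bucket 2 with one more customer than Bucket 1.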
PG