Hey everyone,
I have a data set from which I need to select French-speaking customers and place them in Bucket 1. I then need to split the remainder of the customers between Bucket 1 and Bucket 2 so that both buckets end up with the same number of customers.
E.g.
Total Customers: 1000
French-speaking: 200 (assigned to Bucket 1)
Therefore, of the remaining 800 customers, Bucket 1 gets 300 and Bucket 2 gets 500, so each bucket ends up with 500.
Thanks for all the help.
Do the remaining customers have to be selected randomly?
Are you supposed to have 2 datasets as a result or add a variable?
If randomness isn't a serious concern, sort the data so that the French speakers come first.
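A minimal sketch of that sort, assuming the input data set is called have (a placeholder name) and that French is a numeric 0/1 flag, which is what the "if French" test in the data steps here implies:

```sas
/* Put French speakers (French=1) ahead of everyone else (French=0).     */
/* "have" is an assumed input name; "sorted" is what the data steps read. */
proc sort data=have out=sorted;
  by descending French;
run;
```

If French is stored as a character value such as 'YES'/'NO' instead, the BY statement and the IF tests would need adjusting to match.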
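To add a bucket variable:
data want;
  set sorted;
  if French then bucket = 1;            /* French speakers always go to Bucket 1 */
  else do;
    count + 1;                          /* running count of non-French customers */
    /* 300 = 1000/2 - 200 in the example above; adjust for your data */
    if count le 300 then bucket = 1;
    else bucket = 2;
  end;
  drop count;
run;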
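If you want two data sets:
data bucket1 bucket2;
  set sorted;
  if French then output bucket1;        /* French speakers always go to Bucket 1 */
  else do;
    count + 1;                          /* running count of non-French customers */
    /* 300 = 1000/2 - 200 in the example above; adjust for your data */
    if count le 300 then output bucket1;
    else output bucket2;
  end;
  drop count;
run;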
Q1: Sorry, I should have mentioned that the remaining customers do need to be selected randomly.
Q2: I should add a variable rather than have 2 separate data sets.
I'm new to coding but I follow what you've done so far. How would it change if the remaining customers have to be selected randomly?
Thanks again!
proc surveyselect data=have (where=(French='NO')) /* or whatever condition excludes the French-speaking records */
  out=want sampsize=500 outall;
run;
This will flag 500 (sampsize) of the non-French records as selected: a variable named Selected is added, and a value of 1 means the record is in the sample. The remaining 300 will have Selected=0.
Then add the French back:
data finalwant;
  set want
      have (where=(French='YES') in=infrench)
  ;
  if infrench then selected = 0;
run;
Selected=0 would then be the French speakers plus the rest of their bucket (Bucket 1), and Selected=1 the other bucket. You could rename Selected to Bucket if desired.
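Since Selected holds 0/1 rather than 1/2, a plain rename would leave the values off by one from the bucket numbers. A rough sketch of deriving a Bucket variable instead, assuming the finalwant data set built above (final_buckets is a made-up output name):

```sas
/* Map the Selected flag to bucket numbers. */
data final_buckets;                   /* hypothetical output name */
  set finalwant;
  if selected = 1 then bucket = 2;    /* the randomly sampled non-French customers */
  else bucket = 1;                    /* French speakers plus the unsampled remainder */
run;

/* Quick check that the two buckets are the same size. */
proc freq data=final_buckets;
  tables bucket;
run;
```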
Thanks for the help!
I ended up using a hybrid of the stuff I found here to get my answer. It's probably not as efficient as what you guys did but I know I can at least explain what I'm doing.
Cheers!
You could try something like this:
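data temp;
  call streaminit(167353);              /* fixed seed so the split is reproducible */
  set myCustomers;
  if French then rnd = -1;              /* French speakers sort ahead of everyone else */
  else rnd = rand("UNIFORM");           /* random order for the remaining customers */
run;
proc sort data=temp; by rnd; run;
%let totalCust=1000;
data want;
  set temp;
  if 2*_n_ <= &totalCust. then do;      /* first half of the sorted rows */
    bucket = 1;
    output;
  end;
  else if _n_ <= &totalCust. then do;   /* second half */
    bucket = 2;
    output;
  end;
  drop rnd;
run;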
(untested)
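The hard-coded %let totalCust=1000; could instead be derived from the data, so the split still works when the customer count changes. A sketch, assuming the same myCustomers data set and a SAS release that supports the TRIMMED option (9.3 or later); it would run in place of the %let statement, before the final data step:

```sas
/* Count the rows in myCustomers and store the result in the macro variable totalCust. */
proc sql noprint;
  select count(*) into :totalCust trimmed
  from myCustomers;
quit;

%put NOTE: totalCust=&totalCust.;       /* write the value to the log as a check */
```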
PG