BookmarkSubscribeRSS Feed
ejc09
Calcite | Level 5

Hey everyone,

I have a data set where I am required to select french speaking customers and place them in Bucket 1. I am then required split the remainer of the customers between Bucket 1 and Bucket 2 so that both buckets have the same number of customers.

E.g.

Total Customers: 1000

     French spk: 200 (designated to Bucket 1)

Therefore, Bucket 1 gets 300 and Buckt 2 gets 500 of the remaining customers.

Thanks for all the help.

5 REPLIES 5
ballardw
Super User

Do the remaining customers have to be selected randomly?

Are you supposed to have 2 datasets as a result or add a variable?

If randomness isn't a serious concern sort the data so that the French speakers are first.

To add a bucket variable:

data want;

     set sorted;

     if French then bucket=1;

     else do;

          count+1;

          if count le 100 then bucket = 1;

          else bucket = 2;

     end;

     drop count;

run;

if two data sets

data bucket1 bucket2;

     set sorted;

     if French then output bucket1;

     else do;

          count+1;

          if count le 100 then output bucket1;

          else output bucket2;

     end;

     drop count;

run;

ejc09
Calcite | Level 5

Q1: Sorry, I should have mentioned that the remaining customers do need to be selected randomly.

Q2: I should add a variable rather than have 2 separate data sets.

I'm new to coding but I follow what you've done so far - how would it change by conducting random selection of the remaining customers?

Thanks again!

ballardw
Super User

proc surveyselect data=have (where=(French='NO')) /* what ever would be needed to say none of the French speaking records*/

    out=want sampsize=500 outall;

run;

Will indicate 300 (sampsize) records as selected (a variable named Selected is added and values of 1 mean in that group). The remaining will be Selected=0;

Then add  the French back:

Data finalwant;

     set want

          have (where=(French='YES') in=infrench)

     ;

     if infrench then selected=0;

run;

the Selected =0 would be the French + bucket, 1 the other. You could rename Selected to Bucket if desired.

ejc09
Calcite | Level 5

Thanks for the help!

I ended up using a hybrid of the stuff I found here to get my answer. It's probably not as efficient as what you guys did but I know I can at least explain what I'm doing.

Cheers!

PGStats
Opal | Level 21

You could try something like this:

data temp;

call streaminit(167353);

set myCustomers;

if French then rnd = -1;

else rnd = rand("UNIFORM");

run;

proc sort data=temp; by rnd; run;

%let totalCust=1000;

data want;

set temp;

if 2*_n_ <= &totalCust. then do;

  bucket = 1;

  output;

  end;

else if _n_ <=  &totalCust. then do;

  bucket = 2;

  output;

  end;

drop rnd;

run;

(untested)

PG

Message was edited by: PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 479 views
  • 6 likes
  • 3 in conversation