Solved: Re: Retaining Randomly Assigned Variables

yawenyu929 · Posted 02-19-2020 04:30 PM

Hello,

I have a dataset to start that looks like this:

NAME

Adam

Anne

Alex

Allen

and so on. The number of obs doesn't really matter as it is a growing dataset.

For reporting purposes, I would like to randomly assign either 1 or 2 to a name with a 40-60 split. 40% of name will be assigned 1 and 60% will be assigned 2:

NAME

Adam 1

Anne 2

Alex 2

Allen 1

and so on.

This part is pretty simple, but the tricky part is having each name retain the same assigned number even when new names are added to the dataset. What I mean by this is if next week, if there are two more names added, the old names should not be assigned a new number, but should keep the old number. The new names will be randomly assigned a new number:

NAME

Adam 1

Anne 2

Alex 2

Allen 1

Alfred 2

Albert 2

and so on.

I've tried proc surveyselect, and it does the first part well, but I am having trouble getting the previous names to not get randomly assigned a number again. If anyone has any suggestions, that would be great. I am not set on using proc surveyselect, any method will do as long as it can produce the correct dataset. Thank you!

Reeza · Posted 02-19-2020 10:51 PM

I'm going to say that your process is wrong.
You can store your data from the previous month/day somewhere. When you get your new data, assign it the new random group and then append it to your master data set that maintains the old groupings.

View solution in original post

PeterClemmensen · Posted 02-19-2020 04:37 PM

@yawenyu929 Hi and welcome to the SAS Community 🙂

This would depend on how you add new names to your SAS data set?

See if you can use this as a template

data have;
input Name $ 1-6 Group;
infile datalines missover;
datalines;
Adam   1
Anne   2
Alex   2
Allen  1
Alfred  
Albert  
;

data want;
   set have;
   group = ifn(group=., rand('table', .6), group);
run;

The DATA to DATA Step Macro
Blog: SASnrd

Astounding · Posted 02-19-2020 04:38 PM

A DATA step can address this. For example:

data want;
   set have;
   if group = . then group = 1 + (ranuni(12345) > .4);
run;

yawenyu929 · Posted 02-19-2020 04:51 PM

Hi @PeterClemmensen and @Astounding! Thank you for your quick reply. I see what your codes are trying to do, but I would be starting with just the Name variable, nothing else. so just:

NAME

Adam

Anne

Alex

Allen

and so on.

I am hoping to combine the first and second part into a more efficient program. Please let me know if I can clarify anything! 😄

PeterClemmensen · Posted 02-19-2020 04:53 PM

Would work even with just the Name variable as well?

data have;
input Name $ 1-6;
infile datalines missover;
datalines;
Adam  
Anne  
Alex  
Allen 
Alfred
Albert
;

data want;
   set have;
   group = ifn(group=., rand('table', .6), group);
run;

The DATA to DATA Step Macro
Blog: SASnrd

yawenyu929 · Posted 02-19-2020 05:01 PM

Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?

novinosrin · Posted 02-19-2020 05:17 PM

HI @yawenyu929 The issue to my mind is the WANT dataset for the 1st run is fine as it assigns new random numbers for the very first run i.e your start.

For the subsequent week, you want process the exisitng want with EXCEPT and additional HAVE with SET operation.

The EXCEPT piece of want shall retain the same values while the additional names in HAVE shall have the newly computed random number.

Tom · Posted 02-19-2020 08:05 PM

@yawenyu929 wrote:

Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?

Where are the new names coming from? If you added them to the dataset with the group variable then why did you remove the values of the group variable that were already there?

Are you saying you are getting a new file that some old and new names? Just merge the two and then the same IF based logic will work.

data want;
  merge old new;
  by name;
  if group=. then ....
run;

Patrick · Posted 02-20-2020 12:17 AM

Below might give you some guidance.

/* day 0 */
data master;
  stop;
  length name $8 group 8;
  call missing(of _all_);
run;

/* day 1 */
data names;
  set sashelp.class(keep=name obs=10);
run;

proc sql;
  create table _newNames as
  select name, ifn(rand("uniform")<=0.4,1,2) as group
  from
    (
      select name
      from names
      except
      select name 
      from master
    )
  ;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;

/* day 2 */
data names;
  set sashelp.class(keep=name);
run;

proc sql;
  create table _newNames as
  select name, ifn(rand("uniform")<=0.4,1,2) as group
  from
    (
      select name
      from names
      except
      select name 
      from master
    )
  ;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;


proc print data=master;
run;

Reeza · Posted 02-19-2020 10:51 PM

I'm going to say that your process is wrong.
You can store your data from the previous month/day somewhere. When you get your new data, assign it the new random group and then append it to your master data set that maintains the old groupings.

yawenyu929 · Posted 02-20-2020 01:27 PM

So I'm understanding the problem better, and did as draycut suggested. I am working around the issue of new names getting added by outputting with _[date] in the name so that the previous is not overwritten, then using that as a key for outputting only new names in the updated dataset, randomly assigning again, and merging the two datasets back together. I will have to manually change the name of the dataset I'm reading in as [file]_[date], but I can't think of a better way at the moment... Thanks everyone who took the time to answer!

Ready to join fellow brilliant minds for the SAS Hackathon?