BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yawenyu929
Fluorite | Level 6

Hello,

 

I have a dataset to start that looks like this:

 

NAME

Adam

Anne

Alex

Allen 

and so on. The number of obs doesn't really matter as it is a growing dataset. 

 

For reporting purposes, I would like to randomly assign either 1 or 2 to a name with a 40-60 split. 40% of name will be assigned 1 and 60% will be assigned 2:

 

NAME

Adam   1

Anne    2

Alex     2 

Allen    1

and so on.

 

This part is pretty simple, but the tricky part is having each name retain the same assigned number even when new names are added to the dataset. What I mean by this is if next week, if there are two more names added, the old names should not be assigned a new number, but should keep the old number. The new names will be randomly assigned a new number:

 

NAME

Adam   1

Anne    2

Alex     2 

Allen    1

Alfred   2

Albert   2

and so on.

 

I've tried proc surveyselect, and it does the first part well, but I am having trouble getting the previous names to not get randomly assigned a number again. If anyone has any suggestions, that would be great. I am not set on using proc surveyselect, any method will do as long as it can produce the correct dataset. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
I'm going to say that your process is wrong.
You can store your data from the previous month/day somewhere. When you get your new data, assign it the new random group and then append it to your master data set that maintains the old groupings.

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

@yawenyu929 Hi and welcome to the SAS Community 🙂

 

This would depend on how you add new names to your SAS data set?

 

See if you can use this as a template

 

data have;
input Name $ 1-6 Group;
infile datalines missover;
datalines;
Adam   1
Anne   2
Alex   2
Allen  1
Alfred  
Albert  
;

data want;
   set have;
   group = ifn(group=., rand('table', .6), group);
run;
Astounding
PROC Star

A DATA step can address this.  For example:

 

data want;
   set have;
   if group = . then group = 1 + (ranuni(12345) > .4);
run;
yawenyu929
Fluorite | Level 6

Hi @PeterClemmensen and @Astounding! Thank you for your quick reply. I see what your codes are trying to do, but I would be starting with just the Name variable, nothing else. so just:

 

NAME

Adam

Anne

Alex

Allen 

and so on. 

 

I am hoping to combine the first and second part into a more efficient program. Please let me know if I can clarify anything! 😄 

PeterClemmensen
Tourmaline | Level 20

Would work even with just the Name variable as well?

 

data have;
input Name $ 1-6;
infile datalines missover;
datalines;
Adam  
Anne  
Alex  
Allen 
Alfred
Albert
;

data want;
   set have;
   group = ifn(group=., rand('table', .6), group);
run;
yawenyu929
Fluorite | Level 6

Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?

novinosrin
Tourmaline | Level 20

HI @yawenyu929  The issue to my mind is the WANT dataset for the 1st run is fine as it assigns new random numbers for the very first run i.e your start.

For the subsequent week, you want process the exisitng want with EXCEPT and additional HAVE with SET operation.

 

The EXCEPT piece of want shall retain the same values while the additional names in  HAVE shall have the newly computed random number. 

Tom
Super User Tom
Super User

@yawenyu929 wrote:

Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?


Where are the new names coming from?  If you added them to the dataset with the group variable then why did you remove the values of the group variable that were already there?

Are you saying you are getting a new file that some old and new names?  Just merge the two and then the same IF based logic will work.

data want;
  merge old new;
  by name;
  if group=. then ....
run;
Patrick
Opal | Level 21

Below might give you some guidance.

/* day 0 */
data master;
  stop;
  length name $8 group 8;
  call missing(of _all_);
run;

/* day 1 */
data names;
  set sashelp.class(keep=name obs=10);
run;

proc sql;
  create table _newNames as
  select name, ifn(rand("uniform")<=0.4,1,2) as group
  from
    (
      select name
      from names
      except
      select name 
      from master
    )
  ;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;

/* day 2 */
data names;
  set sashelp.class(keep=name);
run;

proc sql;
  create table _newNames as
  select name, ifn(rand("uniform")<=0.4,1,2) as group
  from
    (
      select name
      from names
      except
      select name 
      from master
    )
  ;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;


proc print data=master;
run;
Reeza
Super User
I'm going to say that your process is wrong.
You can store your data from the previous month/day somewhere. When you get your new data, assign it the new random group and then append it to your master data set that maintains the old groupings.
yawenyu929
Fluorite | Level 6

So I'm understanding the problem better, and did as draycut suggested. I am working around the issue of new names getting added by outputting with _[date] in the name so that the previous is not overwritten, then using that as a key for outputting only new names in the updated dataset, randomly assigning again, and merging the two datasets back together. I will have to manually change the name of the dataset I'm reading in as [file]_[date], but I can't think of a better way at the moment... Thanks everyone who took the time to answer! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 895 views
  • 5 likes
  • 7 in conversation