Hello,
I have a dataset to start that looks like this:
NAME
Adam
Anne
Alex
Allen
and so on. The number of obs doesn't really matter as it is a growing dataset.
For reporting purposes, I would like to randomly assign either 1 or 2 to a name with a 40-60 split. 40% of name will be assigned 1 and 60% will be assigned 2:
NAME
Adam 1
Anne 2
Alex 2
Allen 1
and so on.
This part is pretty simple, but the tricky part is having each name retain the same assigned number even when new names are added to the dataset. What I mean by this is if next week, if there are two more names added, the old names should not be assigned a new number, but should keep the old number. The new names will be randomly assigned a new number:
NAME
Adam 1
Anne 2
Alex 2
Allen 1
Alfred 2
Albert 2
and so on.
I've tried proc surveyselect, and it does the first part well, but I am having trouble getting the previous names to not get randomly assigned a number again. If anyone has any suggestions, that would be great. I am not set on using proc surveyselect, any method will do as long as it can produce the correct dataset. Thank you!
@yawenyu929 Hi and welcome to the SAS Community 🙂
This would depend on how you add new names to your SAS data set?
See if you can use this as a template
data have;
input Name $ 1-6 Group;
infile datalines missover;
datalines;
Adam 1
Anne 2
Alex 2
Allen 1
Alfred
Albert
;
data want;
set have;
group = ifn(group=., rand('table', .6), group);
run;
A DATA step can address this. For example:
data want;
set have;
if group = . then group = 1 + (ranuni(12345) > .4);
run;
Hi @PeterClemmensen and @Astounding! Thank you for your quick reply. I see what your codes are trying to do, but I would be starting with just the Name variable, nothing else. so just:
NAME
Adam
Anne
Alex
Allen
and so on.
I am hoping to combine the first and second part into a more efficient program. Please let me know if I can clarify anything! 😄
Would work even with just the Name variable as well?
data have;
input Name $ 1-6;
infile datalines missover;
datalines;
Adam
Anne
Alex
Allen
Alfred
Albert
;
data want;
set have;
group = ifn(group=., rand('table', .6), group);
run;
Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?
HI @yawenyu929 The issue to my mind is the WANT dataset for the 1st run is fine as it assigns new random numbers for the very first run i.e your start.
For the subsequent week, you want process the exisitng want with EXCEPT and additional HAVE with SET operation.
The EXCEPT piece of want shall retain the same values while the additional names in HAVE shall have the newly computed random number.
@yawenyu929 wrote:
Hi, draycut! Thanks for the clarification. Your code works perfectly to assign the numbers at 40-60 split, but it doesn't retain the same assigned numbers if new names are added. So if Adam, Anne, Alex, and Allen are assigned 1, 2, 2, 1 this week, then next week after adding new names, they should still be assigned 1, 2, 2, 1. Does that help to clarify?
Where are the new names coming from? If you added them to the dataset with the group variable then why did you remove the values of the group variable that were already there?
Are you saying you are getting a new file that some old and new names? Just merge the two and then the same IF based logic will work.
data want;
merge old new;
by name;
if group=. then ....
run;
Below might give you some guidance.
/* day 0 */
data master;
stop;
length name $8 group 8;
call missing(of _all_);
run;
/* day 1 */
data names;
set sashelp.class(keep=name obs=10);
run;
proc sql;
create table _newNames as
select name, ifn(rand("uniform")<=0.4,1,2) as group
from
(
select name
from names
except
select name
from master
)
;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;
/* day 2 */
data names;
set sashelp.class(keep=name);
run;
proc sql;
create table _newNames as
select name, ifn(rand("uniform")<=0.4,1,2) as group
from
(
select name
from names
except
select name
from master
)
;
quit;
proc append base=master data=_newNames;
run;
proc delete data=_newnames;
run;quit;
proc print data=master;
run;
So I'm understanding the problem better, and did as draycut suggested. I am working around the issue of new names getting added by outputting with _[date] in the name so that the previous is not overwritten, then using that as a key for outputting only new names in the updated dataset, randomly assigning again, and merging the two datasets back together. I will have to manually change the name of the dataset I'm reading in as [file]_[date], but I can't think of a better way at the moment... Thanks everyone who took the time to answer!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.