Hi All, I've created the following macro to generate random IDs/passwords. It essentially constructs a data step with a bunch of output statements. It allows the user to specify the allowed characters, number of passwords and password length. Also included is a check to ensure duplicate IDs/passwords are not included.
My question: Is there a more efficient way to accomplish this? Tweaks to my code or a fundamentally different approach? This macro bogs down when asked to generate 10000 or more IDs/passwords. I originally tried to generate a DATALINES statement with the macro (not allowed) and also tried PROC SQL INSERT, which works but isn't any faster.
*characters to place in the ID, space delimited;
%let allowed = 1 2 3 4 5 6 7 8 9 a b c d e f h i j k m n p r s t u v w x y z;
%let numIDs = 100; *number of IDs to create;
%let idLen = 6; *number of characters per ID;
%macro ran6;
/*delimeters for concatenating macro variables*/
%let delim_nospace = %str();
%let delim_space = %str( );
%let numChars = %length(%sysfunc(compress(&allowed,' '))); *num of total characters available;
*%put numchars: &numChars;
%let used = ; *generate IDs go here to check for duplicates;
*output dataset;
data id_output;
call streaminit(23); *set random seed;
length userID $ &idLen.;
*this loop controls the number of IDs to generate;
%do i = 1 %to &numIDs;
*this loop controls the length of an ID;
%do j = 1 %to &idLen;
*follow http://blogs.sas.com/content/iml/2011/08/24/how-to-generate-random-numbers-in-sas/;
%let u = %sysfunc( rand(UNIFORM) ); *u is random decimal;
/*sysevalf works because https://communities.sas.com/message/51523#51523*/
%let x = %sysevalf(1 + (&numChars - 1 ) * &u.);
*set random into with upper bound eq to number of characters;
%let k = %sysevalf(&numChars * &u , ceil) ;
*pull a letter fro the allowed list;
%let IDPart = %scan(&allowed.,&k, ' ');
/*%put u: &u x: &x k: &k IDPart: &IDPart;*/
%if &j = 1 %then %let userID = &IDPart;
/*concatenate macro variables, see https://communities.sas.com/message/180272 */
%else %let userID = %sysfunc(catx(&delim_nospace.,%nrstr(%superq(userID)),%nrstr(%superq(IDPart))));
%end;
/*if duplicate found, do not increment i*/
%if %index(&used, &userID) %then %let i = %eval(&i -1);
/*no duplicat found, write output statement*/
%else %do;
userID = "&userID";
output;
%end;
/*add mewest ID to macro variable for duplicate checking*/
%let used = %sysfunc(catx(&delim_space,%nrstr(%superq(used)),%nrstr(%superq(userID))));
/*%put used: &used;*/
/*%put UserID: &userID;*/
%end;
run;
%mend;
%ran6;
PROC PLAN has some features that make this pretty easy.
Hi,
Its possible your over complicating things with the macro code:
data temp;
allowed="123456789abcdefghijklmnopqrstuvwxyz";
num_ids=100;
idlen=6;
length my_id $6;
num_allowed=length(strip(allowed));
do i=1 to num_ids;
do j=1 to 6;
temp=ceil((num_allowed * rand("uniform")));
substr(my_id,j,1)=substr(allowed,temp,1);
end;
output;
end;
run;
Hi RW9,
Much more elegant solution. The one item missing though is a guard against duplicate IDs. I realize one could generate extra IDs then dedup and reduce to the number needed but I'm trying to avoid that.
Well, if its a hundred then you could apply them into an array rather than outputting, then keep looping over until the generated text is not in the array. Am leaving for the day, but can update the code tomorrow to this effect e.g. array the_ids{num_ids}...
Just to finish off from where I was yesterday, this should give unique results:
data temp (keep=id_out);
allowed="123456789abcdefghijklmnopqrstuvwxyz";
num_ids=100;
idlen=6;
array done{100} $200.;
length my_id $6;
num_allowed=length(strip(allowed));
i=1;
do until (i=num_ids+1);
do j=1 to 6;
temp=ceil((num_allowed * rand("uniform")));
substr(my_id,j,1)=substr(allowed,temp,1);
end;
if whichc(my_id,of done{*})=0 then do;
done{i}=my_id;
i=i+1;
id_out=my_id;
output;
end;
end;
run;
Your macro bogs down because you are generating thousands of lines of macro code. You have basically created a program that does this:
data want;
assign value; output;
assign value; output;
assign value; output;
assign value; output;
/* repeat*/
run;
where each of those assigns calls a lot of functions
A hint: if you are using that many calls to %sysfunc you might be looking at more of a datastep solution:
%macro ran6(numids= 10, idlen=6, allowed=123456789abcdefhijkmnprstuvwxyz);
data id_output;
call streaminit(23); *set random seed;
length userID $ &idLen.;
numchars=length("&allowed");
*this loop controls the number of IDs to generate;
do i = 1 to &numIDs;
*this loop controls the length of an ID;
do j = 1 to &idLen;
u = rand('UNIFORM');
x = 1 + (numChars - 1 ) * u;
*set random into with upper bound eq to number of characters;
k = ceil(numChars * u) ;
*pull a letter fro the allowed list;
IDPart = substr("&allowed",k,1);
if j = 1 then userID = IDPart;
/*concatenate macro variables, see https://communities.sas.com/message/180272 */
else userID = cats(userid,idpart);
end;/*j*/
output;
end; /*i*/
run;
/*remove duplicates */
proc sql noprint;
create table UserIds as
select distinct userid
from id_output;
quit;
%mend;
%ran6;
Change the values after the parameters to customize output for a run of the macro:
%ran6(numid=20); to use the default character list and length.
Also, when writing macros it is not a good idea to use the
*text ; style of comment as they are generated in the macro results. Repeatedly.
Use either %* text ; or /* text */
And for the worry about duplicates, generate more than you "need" as the time spent trying to write extra code probably isn't worth much. And you need to deduplicate across multiple uses of the macro anyway. Hint: sql except.
PROC PLAN has some features that make this pretty easy.
kbk do not make it too complicated when needing performance. A datastep executes fast
As it is more easy to solve the deduplication afterwards you gains in performance are better.
I you know hashing you could use that also. The overhead will be that check (in memory) and the memory constraint.
Check UUIDGEN()
%let id=%sysfunc(uuidgen());
%put &id ;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.