Re: Getting Seed value used by SAS

SukumarBalusamy · Posted 04-22-2022 09:13 AM

Hi All,

I have a query, can someone help me?

When creating a randomization schedule with SAS PROC PLAN and letting SAS use a dynamic seed value, is there any way to find out which seed value SAS used to create the schedule?

Part of SAS code:

Data _null_;
  call symputx ('Seedval', tranwrd(put(ranuni(0), best9.), '0.' , ' '));
run;

proc plan seed=&Seedval. ;
  factors BlockID= 50 ordered Treatment=4 random /noprint ;
  output out=RandList BlockID nvals=(1 to 50)
         Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B' );
quit;

In my code I forgot to output the macro value &Seedval.

Thanks in advance,

Sukumar

Ksharp · Posted 04-22-2022 09:30 AM

No, you are not able to do it.

ranuni(0) uses the system time as a seed, so the seed changes every time you run the code.

SukumarBalusamy · Posted 04-22-2022 09:38 AM

Hello, thank you for your quick reply. I know the time when I executed the code. Is there any chance of getting the seed from the system time?

-Regards,

Sukumar

FreelanceReinh · Posted 04-22-2022 11:01 AM

Hello @SukumarBalusamy and welcome to the SAS Support Communities!

PROC PLAN writes the initial seed (and the final seed) to the log. In addition, you can retrieve these two seeds from automatic macro variables SYSRANDOM and SYSRANEND, respectively, as shown in the log of your code (plus two %PUT statements) below:

568   Data _null_;
569   call symputx ('Seedval', tranwrd(put(ranuni(0), best9.), '0.' , ' '));
570   run;

NOTE: DATA statement used (Total process time):
      real time           0.05 seconds
      cpu time            0.06 seconds


571
572   proc plan seed=&Seedval. ;
NOTE: At the start of processing, random number seed=9261594.
573      factors BlockID= 50 ordered Treatment=4 random /noprint  ;
574      output out=RandList BlockID nvals=(1 to 50)
575      Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B' );
576   quit;

NOTE: The data set WORK.RANDLIST has 200 observations and 2 variables.
NOTE: At the  end  of processing, random number seed=1219503538.
NOTE: PROCEDURE PLAN used (Total process time):
      real time           0.10 seconds
      cpu time            0.10 seconds


577
578   %put &=sysrandom;
SYSRANDOM=9261594
579   %put &=sysranend;
SYSRANEND=1219503538
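To keep the seeds together with the schedule in future runs, the two macro variables can be stored in a small dataset right after the PROC PLAN step. The following is only a sketch: the seed 12345 and the dataset name SeedLog are made-up placeholders, not part of the original code.

```sas
/* Sketch: the seed 12345 and the dataset name SeedLog are hypothetical */
proc plan seed=12345;
   factors BlockID=50 ordered Treatment=4 random / noprint;
   output out=RandList BlockID nvals=(1 to 50)
          Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B');
quit;

/* Read the automatic macro variables before another procedure uses a seed */
data SeedLog;
   initial_seed = &sysrandom;   /* seed at the start of PROC PLAN */
   final_seed   = &sysranend;   /* seed at the end of PROC PLAN   */
   run_datetime = datetime();   /* when the seeds were recorded   */
   format run_datetime datetime20.;
run;

%put &=sysrandom &=sysranend;
```

Saving SeedLog next to RandList means the schedule can later be regenerated with seed=initial_seed.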

SukumarBalusamy · Posted 04-22-2022 12:09 PM

Thank you for the information about %put &=sysrandom; and %put &=sysranend;. I will definitely use these features in my future code. My current situation is that I have already created a randomization schedule, but I don't know the seed value that SAS used. I know the approximate time at which I executed the code. Is there any way to recover the seed?

Reeza · Posted 04-22-2022 11:28 AM

Are you trying to get this after you've run the code and need to figure out what the seed was to recreate the plan or document the process?

SukumarBalusamy · Posted 04-22-2022 12:14 PM

Hello, yes, you are correct. I executed the code a couple of months ago [08-Feb-2022 14:43] and forgot to include code to capture the seed value. I need to document it, and the question is how to reproduce the list.

Reeza · Posted 04-22-2022 12:34 PM

I guess you could try running it with the seed of that time, but you'll likely need to try a few minutes after as well, in steps of seconds, which means a couple of hundred iterations to see what matches yours. You could write a macro to loop through the times and use PROC COMPARE to compare the output you have stored with the newly created output. Make sure to delete the datasets in between so that you don't clog up your disk space. Not sure if it would work, but it's something you could try.
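A rough sketch of such a loop follows. The macro name try_seeds, its parameters and the work dataset _trial are hypothetical, and which seed values to loop over remains the open question. The sketch relies on PROC COMPARE setting the automatic macro variable SYSINFO to 0 when it finds no differences.

```sas
/* Sketch only: try_seeds, first=, last= and _trial are made-up names */
%macro try_seeds(first=, last=);
   %local s;
   %do s = &first %to &last;
      /* Re-create the schedule with the candidate seed (overwrites _trial) */
      proc plan seed=&s;
         factors BlockID=50 ordered Treatment=4 random / noprint;
         output out=_trial BlockID nvals=(1 to 50)
                Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B');
      quit;

      /* Compare with the stored schedule; SYSINFO=0 means no differences */
      proc compare base=RandList compare=_trial noprint;
      run;

      %if &sysinfo = 0 %then %do;
         %put NOTE: Seed &s reproduces RandList.;
         %return;
      %end;
   %end;
   %put NOTE: No seed between &first and &last reproduces RandList.;
%mend try_seeds;

%try_seeds(first=1, last=1000)
```

Because _trial is overwritten in every iteration, no extra cleanup between iterations is needed in this variant.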

FreelanceReinh · Posted 04-22-2022 02:54 PM

This is an interesting, but difficult task. I don't know how exactly SAS determines the seed from "the time of day" (RANUNI documentation), but I've just figured out how you can recover the actual seed from a ranuni(0) random number.

SAS Log:

1065  data _null_; t=time(); x=ranuni(0); nextseed=x*(2**31-1); put (t x nextseed)(=best16.); run;

t=71854.3729999065 x=0.03466819368055 nextseed=74449379
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds


1066  data _null_; t=time(); x=ranuni(0); nextseed=x*(2**31-1); put (t x nextseed)(=best16.); run;

t=71854.4820001125 x=0.4605221792406 nextseed=988963849
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


1067  data _null_; x=ranuni( 195590984); put x=best16.; run;

x=0.03466819368055
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds


1068  data _null_; x=ranuni(1764153903); put x=best16.; run;

x=0.4605221792406

The mathematical formula for the seed appears to be: seed = mod(58743242*nextseed, 2**31-1). (The "magic" number 58743242 is the multiplicative inverse of the "multiplier" 397204094 (mentioned in the RANUNI documentation) modulo 2**31-1.)

For the first of the two examples above you can use this formula directly in a data step:

1080  data _null_;
1081  nextseed=74449379;
1082  seed = mod(58743242*nextseed, 2**31-1);
1083  put seed;
1084  run;
195590984

For the second example you need more sophisticated code (or just use the Windows calculator calc.exe) to obtain the seed 1764153903, because

58743242*988963849 = 58094942711058458 > constant('exactint')

on a Windows system so that the precision is insufficient.
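One possible way around the precision limit inside a DATA step is to split nextseed into a high part and a low 16-bit part, so that every intermediate product stays far below constant('exactint'). This is a sketch of a general technique rather than anything taken from the SAS documentation; the variable names a, d, hi and lo are chosen here for illustration.

```sas
data _null_;
   a = 58743242;        /* inverse of 397204094 modulo 2**31-1 (from the formula above) */
   d = 2**31 - 1;
   nextseed = 988963849;

   /* nextseed = hi*2**16 + lo, with hi < 2**15 and lo < 2**16 */
   hi = int(nextseed / 2**16);
   lo = mod(nextseed, 2**16);

   /* a*hi and a*lo are below 2**42, and mod(a*hi, d)*2**16 is below 2**47, */
   /* so all intermediate results are exactly representable                 */
   seed = mod(mod(mod(a*hi, d) * 2**16, d) + mod(a*lo, d), d);
   put seed=;
run;
```

Working the arithmetic through by hand gives 1764153903, the seed quoted above for the second example.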

Now the difficult question is how to obtain the seed value from the time() value (or is datetime() used?), e.g.:

195590984 from 71854.3729999065 and
1764153903 from 71854.4820001125.

Obviously, you must not ignore fractions of a second when you "loop through the times" ...

FreelanceReinh · Posted 04-23-2022 07:33 AM

@FreelanceReinh wrote:

Now the difficult question is how to obtain the seed value from the time() value (or is datetime() used?), e.g.:

195590984 from 71854.3729999065 and

1764153903 from 71854.4820001125.

Obviously, you must not ignore fractions of a second when you "loop through the times" ...

Addendum:

It has turned out that it's even more complicated than that. In a macro loop I created (time(), ranuni(0)) pairs and calculated the corresponding (internal) seed value. Several times successive time() values were exactly identical, yet the internal seed values (and hence the random numbers) were totally different. The first ten observations are shown below.

Obs              time()           ranuni(0)       seed

  1    36776.9690001011    0.35282075235286    1799776034
  2    36776.9839999676    0.87105420598344      36942890
  3    36776.9839999676    0.48406173311363    1140815067
  4    36777.0000000000    0.81978151706037      94069556
  5    36777.0000000000    0.51275829529053    1348101720
  6    36777.0150001049    0.78849600851000    1812341011
  7    36777.0150001049    0.76414637489437     910195803
  8    36777.0309998989    0.98833351442047    1748722162
  9    36777.0309998989    0.44593092074894    2132102851
 10    36777.0469999313    0.18930490696305    2064843663

So, the internal seed is definitely not derived from time() or datetime() alone.

FreelanceReinh · Posted 04-25-2022 01:34 PM

@SukumarBalusamy wrote:
Hello, yes, you are correct. I executed the code a couple of months ago [08-Feb-2022 14:43] and forgot to include code to capture the seed value. I need to document it, and the question is how to reproduce the list.

Hello @SukumarBalusamy,

Sadly, I think from the results shown so far it's fairly obvious that the approximate time stamp "08-Feb-2022 14:43" is not going to give us a useful clue about the unknown seed value:

- Even identical values of the TIME() function did not imply similar internal seed values used in a ranuni(0) call.
- We don't know the exact form of the time stamp used internally by the RANUNI function when called with argument 0 and we don't know the function used to transform the time stamp into a valid seed value between 1 and 2**31-2. (Not sure if the SAS developers would be willing to share these secrets.)

But there is hope! Your DATA _NULL_ step can only generate up to 10 million different seeds, which is a relatively small subset (<0.5%) of the set of all 2.1 billion possible seeds. Preliminary investigations that I have done indicate that there is a chance to reproduce the results from PROC PLAN in a DATA step. Moreover, far fewer than your 50 randomized blocks should be sufficient to characterize a seed uniquely: each block of A, A, B, B has 4!/(2!*2!) = 6 distinct orderings, so with only 9 blocks there are already 6**9=10,077,696 different possible treatment combinations, i.e. more combinations than seeds. This means that the known result of, e.g., the first 9 (or possibly 7 or 8) blocks in dataset RandList is likely to reduce the number of "candidate" seeds from 10 million to such a small value that it's feasible to test all of these with the complete PROC PLAN step.

So, this might be a promising strategy:

1. Write a DATA step that runs through the 10 million potential seeds and creates the first 7 to 9 blocks in the same way PROC PLAN would do it (result: a dataset with 10 million observations).
2. Match the resulting dataset with dataset RandList (or a dataset derived from it) to select a small number n of candidate seeds.
3. If n>1, run PROC PLAN in a macro loop with n iterations, until RandList is replicated exactly.

Obviously, item 1 is not an easy task ... Alternatively, you could go for a brute-force approach and replace item 1 with a macro loop running 10 million PROC PLAN steps (creating only 7 to 9 blocks each, to save time and disk space). However, I don't know how long this would take to run on your hardware, even if 5 million iterations might be enough to get a hit (if you're lucky). Maybe try with, say, 10,000 iterations and then extrapolate the time.
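The timing test could look roughly like the sketch below. The macro name time_plan, its parameter n and the dataset _trial are hypothetical, and the extrapolation simply assumes that the run time grows linearly with the number of steps.

```sas
/* Sketch only: time_plan, n= and _trial are made-up names */
%macro time_plan(n=10000);
   %local s t0 t1;
   options nonotes;                      /* keep the log small */
   %let t0 = %sysfunc(datetime());
   %do s = 1 %to &n;
      /* Only 9 blocks per step to save time and disk space */
      proc plan seed=&s;
         factors BlockID=9 ordered Treatment=4 random / noprint;
         output out=_trial BlockID nvals=(1 to 9)
                Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B');
      quit;
   %end;
   %let t1 = %sysfunc(datetime());
   options notes;
   %put NOTE: &n PROC PLAN steps took %sysevalf(&t1 - &t0) seconds.;
   %put NOTE: Estimate for 10 million steps: %sysevalf((&t1 - &t0) * 10000000 / &n / 3600) hours.;
%mend time_plan;

%time_plan(n=10000)
```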

Basically, the question is how important it really is for you to find the unknown seed value.

FreelanceReinh · Posted 04-26-2022 09:00 AM

Hello @SukumarBalusamy,

Good news! You will be able to recover your lost seed!

Let me first note that, with a small probability, your DATA _NULL_ step will produce an invalid seed: The BEST9. format will use scientific notation (e.g. 4.657E-10) if ranuni(0) happens to be very small. PROC PLAN would then error out with a log message like

ERROR: The value SEED = 4.657E-10 is not an integer.

But apparently this unlikely case did not happen when you created your RandList dataset.
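For future runs, the scientific-notation problem can be avoided by drawing an integer seed directly instead of formatting the fraction. The second part of the sketch below is an assumed alternative, not the original code: it restricts the seed to 1 ... 9999999, and it writes the seed to the log so that it is documented.

```sas
data _null_;
   /* What BEST9. does with a very small random number */
   small = put(4.657E-10, best9.);
   put small=;

   /* Assumed alternative: an integer seed between 1 and 9999999 */
   seedval = 1 + floor(ranuni(0) * 9999999);
   call symputx('Seedval', seedval);
run;

%put &=Seedval;   /* document the seed in the log */
```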

Further, note that leading zeros in macro variable Seedval will be ignored by PROC PLAN. This means that if we recover a seed of, say, 1234, the true "historical" value of Seedval might have been 1234 or 01234 or 001234 or 0001234, but this is unimportant for PROC PLAN. The probability that two significantly different seeds produce the same PROC PLAN output (with 200 observations) is extremely small, if not zero, and the DATA step below can find all of them, if needed (see comment about the STOP statement).

It turned out that the DATA step that I outlined in item 1 of my previous post is so fast that we don't have to limit it to "7 to 9 blocks." Instead, we let it generate treatments as long as they match the treatments in dataset RandList. Thus we can omit items 2 and 3 of the strategy and just write the recovered seed to the dataset or to the log. On my workstation it took only a few seconds to recover a seed, the longest time (about 15 seconds) if the seed was 9999999, i.e., the last seed checked. Nevertheless, the DATA step also allows for partial matches with RandList (just use a smaller value for macro variable nt and comment out the STOP statement immediately following the OUTPUT statement), which may produce more than one "candidate" seed value.

/* Create treatment format */

proc format;
value trtf
1, 2 = 'A'
3, 4 = 'B';
run;

%let c=397204094; /* "multiplier" found in RANUNI documentation */
%let d=%sysevalf(2**31-1); /* =2147483647 */
%let nt=200; /* number of treatments to be generated: 50 blocks with 4 treatments each */

/* Find "candidate" seed value(s) for PROC PLAN to reproduce (parts of) dataset RandList */

data candseeds(keep=seed);
length t $10;
/* Preparation: compute 10**k * 397204094 modulo 2**31-1, k=1, ..., 9 */
array n[9] _temporary_;
n[1]=mod(10*&c, &d);
do k=2 to 9;
  n[k]=mod(10*n[k-1], &d);
end;
/* Prepare assignment of 2nd and 3rd treatment in a block */
array t2[4,3]   _temporary_ (2 3 4 1 3 4 2 1 4 2 3 1);
array t3[4,4,2] _temporary_ (. . 3 4 2 4 3 2
                             3 4 . . 1 4 3 1 
                             2 4 1 4 . . 1 2
                             3 2 3 1 2 1 . .);
array ppt[&nt] $1 _temporary_; /* for treatments A, A, B, B generated by PROC PLAN */
array trt[&nt]    _temporary_; /* for treatments 1, 2, 3, 4 generated by the DATA step */
/* Read the first &nt treatments from RandList into array ppt */
do p=1 to &nt;
  set RandList(keep=treatment) point=p;
  ppt[p]=char(treatment,5); /* shorten "Arm A" to "A", etc. */
end;
/* Run through all possible seeds>0 */
do seed=1 to 9999999;
  m=seed;
  /* Generate the first &nt treatments like PROC PLAN would do it */
  do i=1 to &nt;
    /* Emulate RANUNI(seed) */
    t=put(m,10.);
    s=input(char(t,10), 1.)*&c;
    do k=1 to length(left(t))-1;
      s+input(char(t,10-k), 1.)*n[k];
    end;
    m=mod(s,&d);
    r=m/&d; /* random number between 0 and 1, computed with the RANUNI algorithm */
    /* Assign treatments based on the random number and the previous treatments in the block */
    select(mod(i,4));
      when(1) trt[i]=ceil(4*r);
      when(2) trt[i]=t2[trt[i-1],ceil(3*r)];
      when(3) trt[i]=t3[trt[i-2],trt[i-1],ceil(2*r)];
      otherwise trt[i]=10-trt[i-3]-trt[i-2]-trt[i-1]; /* = the only remaining treatment */
    end;
    if put(trt[i],trtf.) ne ppt[i] then leave; /* move on to next seed if a discrepancy was found */
  end;
  if i>&nt then do; /* i.e., no discrepancies were found */
    output; /* write "candidate" seed to dataset CANDSEEDS */
    stop; /* Remove this STOP statement if you expect more than one "candidate" seed, */
  end;    /* e.g., if &nt is less than the number of observations in RandList.        */
end;
stop; /* necessary because of SET statement with POINT= option */
run;

Please run the code above using your RandList dataset. The expected outcome is a single observation in dataset CANDSEEDS containing the seed value with which PROC PLAN should reproduce dataset RandList.
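Once CANDSEEDS contains an observation, the recovered seed can be checked by re-running the original PROC PLAN step and comparing the result with RandList. This is a sketch in which the macro variable recovered and the dataset Check are hypothetical names; if CANDSEEDS is empty, the macro variable is not created and the PROC PLAN step will fail.

```sas
/* Put the recovered seed into a macro variable (hypothetical name) */
data _null_;
   set candseeds(obs=1);
   call symputx('recovered', seed);
run;

/* Re-create the full schedule with the recovered seed */
proc plan seed=&recovered;
   factors BlockID=50 ordered Treatment=4 random / noprint;
   output out=Check BlockID nvals=(1 to 50)
          Treatment cvals=('Arm A' 'Arm A' 'Arm B' 'Arm B');
quit;

/* The comparison report should show no unequal values */
proc compare base=RandList compare=Check;
run;
```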

Reeza · Posted 04-26-2022 11:23 AM

Impressive 😄

FreelanceReinh · Posted 04-26-2022 12:30 PM

Thanks, @Reeza. 🙂 It was an exciting challenge.

Luckily, the treatment assignments by PROC PLAN based on uniformly distributed random numbers in (0, 1) were pretty much straightforward (see CEIL function calls), except for some unexpected permutations, which I handled with the arrays t2 and t3. Most importantly, the algorithm was the same for all treatment blocks.

Also, without the simplicity of the random number generator implemented in PROC PLAN (same as in the RANUNI function, which had to be emulated in the data step) the task would have been much more difficult.
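The generator can also be emulated for a single seed and checked against the RANUNI function itself. The sketch below assumes the recurrence nextseed = mod(397204094*seed, 2**31-1) implied earlier in the thread and, unlike the digit-by-digit approach in the DATA step above, splits the seed into two parts to keep all products exact. The variable names are illustrative.

```sas
data _null_;
   c = 397204094;        /* multiplier from the RANUNI documentation */
   d = 2**31 - 1;
   seed = 195590984;     /* first example seed from earlier in the thread */

   /* seed = hi*2**16 + lo keeps every product below 2**53 */
   hi = int(seed / 2**16);
   lo = mod(seed, 2**16);
   nextseed = mod(mod(mod(c*hi, d) * 2**16, d) + mod(c*lo, d), d);

   r = nextseed / d;     /* emulated random number */
   x = ranuni(seed);     /* random number from SAS */
   put nextseed= r=best16. x=best16.;
run;
```

Going by the log shown earlier, ranuni(195590984) returns 0.03466819368055 with nextseed=74449379, so r and x are expected to agree.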
