I found a basic explanation of the CALL STREAMINIT routine, which specifies a seed value for subsequent random number generation by the RAND function. However, I am trying to understand this concept better, ideally with a specific use case. I came across the following code, which creates a new variable, "rand_num", based on the customer invoice number variable, "invoice_nm".
Could anyone explain what the DATA step that creates the dataset "final" does? I do not have any additional information about the big picture of this program; all I know is that it is used for filling in missing data, that is, for data imputation. Any advice would be greatly appreciated.
data simple_data;
input invoice_nm;
datalines;
13011725
48125891
58509176
82478132
44424919
10605934
31859010
12825087
37489700
18547980
29057182
36961654
38808747
41324767
87265097
79995217
76234154
58838431
85097443
34942259
96916424
;
run;
data final;
set simple_data;
call streaminit(invoice_nm);
rand_num = round(rand('uniform'),0.001) ;
run;
As Kurt mentioned, CALL STREAMINIT() is an unusual routine: only the first call to STREAMINIT() in a step has an effect, and subsequent calls are ignored. When I see this code:
data final;
set simple_data;
call streaminit(invoice_nm);
rand_num = round(rand('uniform'),0.001) ;
run;
It makes me think that the author might have believed there was value in calling STREAMINIT once for each record in SIMPLE_DATA, and in using a different seed value (the value of INVOICE_NM) on each call. In fact, the code is equivalent to:
data final;
set simple_data;
if _n_=1 then call streaminit(invoice_nm) ;
rand_num = round(rand('uniform'),0.001) ;
run;
To me, the _n_=1 version makes it clear that the author knows STREAMINIT is executed only once.
For argument's sake, some might say that my _n_=1 version is less efficient, because it has to test IF _N_=1 on every iteration of the DATA step.
So while I can't say that the original code is wrong, if I saw it, it would raise a warning flag for me.
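One way to check the claimed equivalence is to run both versions and compare the results. This is only a sketch: it assumes SIMPLE_DATA from the question already exists, and the dataset names FINAL_EVERY_ROW and FINAL_FIRST_ROW are hypothetical. If the documented behavior holds (only the first STREAMINIT call in the step takes effect), PROC COMPARE should report no unequal values.

```sas
/* Original style: STREAMINIT is called on every row (later calls are ignored) */
data final_every_row;
  set simple_data;
  call streaminit(invoice_nm);
  rand_num = round(rand('uniform'), 0.001);
run;

/* Explicit style: STREAMINIT is called only on the first row */
data final_first_row;
  set simple_data;
  if _n_ = 1 then call streaminit(invoice_nm);
  rand_num = round(rand('uniform'), 0.001);
run;

/* Compare the two outputs observation by observation */
proc compare base=final_every_row compare=final_first_row;
run;
```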
But others may disagree. The docs have examples like:
data Class;
set Sashelp.Class(where=(sex='M'));
call streaminit( 'mt64', 27182818284590 );
Random = rand( 'uniform' );
run;
which I don't love for the same reason. But at least the examples all pass literal values to STREAMINIT, and not variables.
From the documentation of the CALL STREAMINIT routine:
All the streams in the DATA step are initialized by using the seed value in the first call to STREAMINIT. Subsequent calls to STREAMINIT are ignored.
So only the first value of invoice_nm in the dataset (13011725) is used as the seed.
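A practical consequence is that the seed depends on the row order of SIMPLE_DATA. The sketch below assumes SIMPLE_DATA from the question exists; the dataset names are hypothetical. It sorts the data by invoice number, so the first row becomes 10605934, and then checks that the per-row STREAMINIT version matches a version seeded once with that literal value. If only the first STREAMINIT call counts, the two outputs should be identical, and the rand_num attached to a given invoice will usually differ from the unsorted run.

```sas
/* Reorder the input so a different invoice number comes first */
proc sort data=simple_data out=sorted_data;
  by invoice_nm;
run;

/* Per-row STREAMINIT, as in the original code */
data sorted_every_row;
  set sorted_data;
  call streaminit(invoice_nm);
  rand_num = round(rand('uniform'), 0.001);
run;

/* One STREAMINIT call, seeded with the smallest invoice number as a literal */
data sorted_literal_seed;
  set sorted_data;
  if _n_ = 1 then call streaminit(10605934);
  rand_num = round(rand('uniform'), 0.001);
run;

/* Expect no unequal values if only the first STREAMINIT call is honored */
proc compare base=sorted_every_row compare=sorted_literal_seed;
run;
```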
It sounds like you inherited this code. It's scary to inherit code like this: it probably works, but the way it is written suggests that the author does not understand CALL STREAMINIT. That suggests you should take extra care when reviewing other code from the same source.