I found a basic explanation of the CALL STREAMINIT routine, which specifies a seed value to use for subsequent random number generation by the RAND function. However, I am trying to understand the concept better, ideally with a specific use case. I came across the following code, which creates a new variable "rand_num" based on the customer invoice number variable "invoice_nm".
Could anyone explain what the DATA step that creates the dataset "final" does? I do not have any additional information about the bigger picture of this program; all I know is that it is for filling in missing data, i.e., for data imputation. Any advice would be greatly appreciated.
data simple_data;
input invoice_nm;
datalines;
13011725
48125891
58509176
82478132
44424919
10605934
31859010
12825087
37489700
18547980
29057182
36961654
38808747
41324767
87265097
79995217
76234154
58838431
85097443
34942259
96916424
;
run;
data final;
set simple_data;
call streaminit(invoice_nm);
rand_num = round(rand('uniform'),0.001) ;
run;
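To illustrate what a seed does in general, here is a minimal SAS sketch. The data set names `demo`, `run1` and `run2` and the seed 12345 are arbitrary, hypothetical choices. Running the same DATA step twice with the same seed should produce the same RAND_NUM values, which PROC COMPARE can confirm. Without a CALL STREAMINIT, SAS derives the seed from the system clock, so repeated runs will generally produce different values.

```sas
/* Small hypothetical input: the first three invoice numbers from SIMPLE_DATA */
data demo;
  input invoice_nm;
  datalines;
13011725
48125891
58509176
;
run;

/* Run 1: seed the random number stream once, with a fixed seed */
data run1;
  set demo;
  if _n_ = 1 then call streaminit(12345);  /* 12345 is an arbitrary, hypothetical seed */
  rand_num = round(rand('uniform'), 0.001);
run;

/* Run 2: the identical step with the identical seed */
data run2;
  set demo;
  if _n_ = 1 then call streaminit(12345);
  rand_num = round(rand('uniform'), 0.001);
run;

/* Same seed, same stream: the two data sets should match exactly */
proc compare base=run1 compare=run2;
run;
```

This reproducibility is the usual reason for setting a seed in imputation code: the same input gives the same imputed values every time the program is rerun.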
Accepted Solutions
As Kurt mentioned, CALL STREAMINIT() is an unusual routine: only the first call to STREAMINIT() in a step has an effect, and subsequent calls are ignored. When I see this code:
data final;
set simple_data;
call streaminit(invoice_nm);
rand_num = round(rand('uniform'),0.001) ;
run;
It makes me think that the author might have thought there was value in calling STREAMINIT once for each record in SIMPLE_DATA, and in using a different seed value (the value of INVOICE_NM) on each call. In fact, the code is equivalent to:
data final;
set simple_data;
if _n_=1 then call streaminit(invoice_nm) ;
rand_num = round(rand('uniform'),0.001) ;
run;
To me, the second version makes it clear that the author knows STREAMINIT is executed only once.
For argument's sake, some might say that my second version is less efficient, because it has to test IF _N_=1 on every iteration of the DATA step.
So while I can't say that the original code is wrong, if I saw it, it would raise a warning flag for me.
But others may disagree. The docs have examples like:
data Class;
set Sashelp.Class(where=(sex='M'));
call streaminit( 'mt64', 27182818284590 );
Random = rand( 'uniform' );
run;
which I don't love for the same reason. But at least the examples all pass literal values to STREAMINIT, and not variables.
Next up: SAS Trivia Quiz hosted by SAS on Wednesday May 21.
Register now at https://www.basug.org/events.
From the documentation of the CALL STREAMINIT routine:
All the streams in the DATA step are initialized by using the seed value in the first call to STREAMINIT. Subsequent calls to STREAMINIT are ignored.
So only the first invoice number in the dataset (13011725) is used as the seed; the values passed on later rows have no effect.
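To check this on the data itself, here is a minimal sketch, assuming the documented behavior quoted above. It compares the original step with a version that seeds once, explicitly, with the first invoice number. The data set names `demo_data`, `final_orig` and `final_fixed` are hypothetical (chosen so as not to overwrite SIMPLE_DATA), and only the first three invoice numbers are used to keep it short. If only the first STREAMINIT call matters, PROC COMPARE should report no differences.

```sas
/* Hypothetical copy of the first three rows of SIMPLE_DATA */
data demo_data;
  input invoice_nm;
  datalines;
13011725
48125891
58509176
;
run;

/* The original logic: STREAMINIT is called on every row,
   but per the documentation only the first call (seed 13011725) takes effect */
data final_orig;
  set demo_data;
  call streaminit(invoice_nm);
  rand_num = round(rand('uniform'), 0.001);
run;

/* The same logic with the seed stated once, explicitly */
data final_fixed;
  set demo_data;
  if _n_ = 1 then call streaminit(13011725);
  rand_num = round(rand('uniform'), 0.001);
run;

/* Expect no differences if the later STREAMINIT calls are ignored */
proc compare base=final_orig compare=final_fixed;
run;
```

PROC COMPARE also sets the automatic macro variable SYSINFO to 0 when the two data sets match, which is convenient for checking this in a batch job.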
It sounds like you inherited this code. It's scary to inherit code like this: it probably works, but the way it is written suggests that the author does not understand CALL STREAMINIT. That is a reason to take extra care when reviewing other code from the same source.