☑ This topic is solved.
LL5
Pyrite | Level 9

I found a basic explanation of the CALL STREAMINIT routine, which specifies a seed value for subsequent random number generation by the RAND function. However, I would like to understand the concept better, ideally with a specific use case. I came across the following code, which creates a new variable named "rand_num" based on the customer invoice number variable "invoice_nm".

I wonder if anyone could guide me or explain what the DATA step that creates the data set "final" does. I do not have any additional information about the big picture of this program; all I know is that it is for filling in missing data, i.e., for data imputation. Any advice would be greatly appreciated.

data simple_data;
    input invoice_nm;
    datalines;
13011725
48125891
58509176
82478132
44424919
10605934
31859010
12825087
37489700
18547980
29057182
36961654
38808747
41324767
87265097
79995217
76234154
58838431
85097443
34942259
96916424
;
run;

data final;
	set simple_data;
	call streaminit(invoice_nm);
	rand_num = round(rand('uniform'),0.001) ;
run;



8 replies
LL5
Pyrite | Level 9
Thanks @JosvanderVelden for sharing these links and resources.
Kurt_Bremser
Super User

From the documentation of the CALL STREAMINIT routine:

All the streams in the DATA step are initialized by using the seed value in the first call to STREAMINIT. Subsequent calls to STREAMINIT are ignored.

So only the value of INVOICE_NM from the first observation is used as the seed.
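A minimal sketch to check this, assuming a SAS 9.4 or SAS Viya session. The data set names DEMO, PER_ROW and SEEDED_ONCE are hypothetical, and DEMO holds only the first six invoice numbers from the question. If only the first call takes effect, seeding on every row with INVOICE_NM should give exactly the same RAND_NUM values as seeding once with the literal first invoice number, 13011725.

```sas
/* Hypothetical demo data: the first six invoice numbers from SIMPLE_DATA. */
data demo;
    input invoice_nm;
    datalines;
13011725
48125891
58509176
82478132
44424919
10605934
;
run;

/* Original pattern: CALL STREAMINIT runs on every iteration, */
/* but only the first call (seed 13011725) has any effect.    */
data per_row;
    set demo;
    call streaminit(invoice_nm);
    rand_num = round(rand('uniform'), 0.001);
run;

/* Explicit pattern: seed once, using the first invoice number as a literal. */
data seeded_once;
    if _n_ = 1 then call streaminit(13011725);
    set demo;
    rand_num = round(rand('uniform'), 0.001);
run;

/* PROC COMPARE sets the automatic macro variable SYSINFO to 0 */
/* when the two data sets match exactly.                        */
proc compare base=per_row compare=seeded_once noprint;
run;

%put NOTE: PROC COMPARE SYSINFO=&sysinfo (0 means the data sets are identical);
```

If the log shows SYSINFO=0, the per-row calls after the first one changed nothing.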

LL5
Pyrite | Level 9
Thanks @Kurt_Bremser for sharing the documentation link (the page I viewed previously had less information) and for highlighting the key summary of what it does.
Quentin
Super User

It sounds like you inherited this code.  It's scary to inherit code like this, where it probably works, but the manner in which it is written suggests that the author does not understand CALL STREAMINIT.  It suggests you should take extra caution when reviewing other code from the same source.

BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LL5
Pyrite | Level 9
Thanks @Quentin for this advice. Would you mind sharing your thoughts on why the way it is written suggests that the original author does not understand CALL STREAMINIT?
Quentin
Super User

As Kurt mentioned, CALL STREAMINIT() is an unusual routine: only the first call to STREAMINIT() in a step has an effect, and subsequent calls are ignored. When I see this code:

data final;
	set simple_data;
	call streaminit(invoice_nm);
	rand_num = round(rand('uniform'),0.001) ;
run;

It makes me think that the author might have thought there was value in calling STREAMINIT once for each record in SIMPLE_DATA, and that there was value in using a different seed value (the value of INVOICE_NM) on each call.  In fact the code is equivalent to:

data final;
	set simple_data;
	if _n_=1 then call streaminit(invoice_nm) ;
	rand_num = round(rand('uniform'),0.001) ;
run;

To me, the second version makes it clear that the author knows STREAMINIT is only executed once.

For argument's sake, some might say that my second version is less efficient, because on every iteration of the DATA step it needs to test IF _N_=1.

So while I can't say that the original code is wrong, if I saw it, it would raise a warning flag for me.

But others may disagree.  The docs have examples like:

data Class;
   set Sashelp.Class(where=(sex='M'));
   call streaminit( 'mt64', 27182818284590 );
   Random = rand( 'uniform' );
run;

which I don't love for the same reason.  But at least the examples all pass literal values to STREAMINIT, and not variables.
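If you want to follow the documentation's style of passing a literal while still deriving the seed from the data, one option is to look up the first INVOICE_NM in a separate step and hand it to STREAMINIT through a macro variable. This is a sketch, not the original author's method; it assumes the input is not empty, and the names DEMO, FINAL_LITERAL and the macro variable SEED are hypothetical. Because only the first invoice number was ever used as the seed, RAND_NUM should match what the original step produces.

```sas
/* Hypothetical demo data: the first six invoice numbers from SIMPLE_DATA. */
data demo;
    input invoice_nm;
    datalines;
13011725
48125891
58509176
82478132
44424919
10605934
;
run;

/* Step 1: store the first INVOICE_NM in a macro variable.       */
/* SYMPUTX converts the number to text and trims the blanks.    */
data _null_;
    set demo(obs=1);
    call symputx('seed', invoice_nm);
run;

%put NOTE: Seeding the random number stream with &seed;

/* Step 2: STREAMINIT now receives a literal (13011725 for DEMO), */
/* and the _N_=1 test documents that the call only matters once.  */
data final_literal;
    if _n_ = 1 then call streaminit(&seed);
    set demo;
    rand_num = round(rand('uniform'), 0.001);
run;
```

The trade-off is an extra pass over one observation, in exchange for a DATA step whose intent is visible at a glance.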

 

LL5
Pyrite | Level 9
Thanks @Quentin for explaining this thoroughly; it really helps a lot. I now assume this code is probably meant to call STREAMINIT with the first value of invoice_nm, so that the rand_num column does not change when the program is re-run. Either way, your advice is very insightful and thoughtful. I am accepting this as the solution.
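One caveat to that assumption: because only the first row's INVOICE_NM becomes the seed, a given invoice keeps its RAND_NUM on a re-run only if the input rows stay in the same order. A minimal sketch to illustrate, assuming a SAS 9.4 or SAS Viya session (DEMO, FINAL_A, FINAL_B and the sorted copies are hypothetical names): running the same DATA step on the data sorted by INVOICE_NM seeds the stream with 10605934 instead of 13011725, so the values attached to each invoice will almost certainly differ.

```sas
/* Hypothetical demo data: the first six invoice numbers from SIMPLE_DATA. */
data demo;
    input invoice_nm;
    datalines;
13011725
48125891
58509176
82478132
44424919
10605934
;
run;

/* Original DATA step on the rows in their original order (seed 13011725). */
data final_a;
    set demo;
    call streaminit(invoice_nm);
    rand_num = round(rand('uniform'), 0.001);
run;

/* Same step on a copy sorted by invoice number:           */
/* the first row, and therefore the seed, is now 10605934. */
proc sort data=demo out=demo_sorted;
    by invoice_nm;
run;

data final_b;
    set demo_sorted;
    call streaminit(invoice_nm);
    rand_num = round(rand('uniform'), 0.001);
run;

/* Line up both results by invoice number and compare RAND_NUM. */
proc sort data=final_a out=final_a_sorted;
    by invoice_nm;
run;

proc compare base=final_a_sorted compare=final_b;
    id invoice_nm;
    var rand_num;
run;
```

A nonzero SYSINFO (and the differences listed by PROC COMPARE) shows that the seed depends on row order, not on each invoice's own number.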
