According to the Prep Guide:
When PROC IMPORT reads raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each cycle of execution, with these exceptions:
Variables that are named in a RETAIN statement
variables that are created in a sum statement
Automatic variables
In contrast, when reading variables from a SAS data set, SAS sets the values to missing only before the first cycle of execution of the DATA step. Therefore, the variables retain their values until new values become available (for example, through an assignment statement or through the next execution of a SET or MERGE statement). Variables that are created with options in a SET or MERGE statement also retain their values from one cycle of execution to the next.
I don't understand this part of the DATA step execution. For me, in the following code, total is set to missing at each new iteration unless I write "retain total;" or "total+var1;":
data dataset.new;
set dataset.old;
total=sum(total,var1);
run;
which is confirmed when I execute the code. However, the bolded part seems to say the opposite, as dataset.old is a SAS data set: the total value shouldn't be reset to missing and var1 would be added to total.
The book probably has no error, and the code runs as I expect, so what am I missing in its explanation?
Can you please run this and check the log to see if it helps your understanding?
data test_data;
do i=1 to 10;
j=i;
output;
end;
run;
data _null_;
put 'before' +2 j= _n_=; /*before*/
set test_data;
j=sum(i,j);
put 'after' +2 j= _n_=; /*after*/
run;
Compare the above with the following:
/*Now testing with new assignment var jj*/
data _null_;
put 'before' +2 jj= _n_=; /*before*/
set test_data;
jj=sum(i,j);
put 'after' +2 jj= _n_=; /*after*/
run;
@Masande wrote:
If I understand it correctly, the variables i and j are coming from a different data set and their values are retained. sum(i,j) is newly created, hence its value is set to missing.
Is that what I'm supposed to see?
I and J are variables being read from a dataset, so they are retained. But in a normal, simple data step where the SET statement is the first thing that executes, it really doesn't matter whether they are "retained" or not, since whatever value they had is immediately replaced when the SET statement executes.
SUM(I,J) is a function call and so has nothing to do with the point of the question. The difference between the two steps is that in one the value is being assigned to a variable that comes from an input dataset (J), and in the other it is being assigned to a variable that does NOT come from an input dataset (JJ). So in the first, the value BEFORE the SET statement reflects the value at the end of the previous iteration. In the second, the value is missing until the assignment statement gives it a value.
In the simple data step generated by PROC IMPORT, none of the variables come from an input dataset, so none of them are "retained". Also, since each iteration of the data step includes an INPUT statement that sets the values of the variables, it doesn't really matter whether the variables are retained. That is why it seems strange to mention this issue in the context of PROC IMPORT.
What does PROC IMPORT have to do with this question about data steps?
@Masande wrote:
According to the Prep Guide, PROC IMPORT runs a DATA step to read the data. Hence the relation with the first post (which is a quote from the book).
PROC IMPORT will generate and run a DATA step when used to read a delimited text file. But it does not generate a data step to read from structured data, like Excel files.
Who wrote that guide?
total is NOT read from the dataset; it is a newly created variable and is therefore set to missing at the start of each data step iteration.
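To make the difference concrete, here is a minimal sketch comparing the plain assignment from the question with the two ways of keeping a running total. The dataset work.old and its three values are made up for illustration and stand in for dataset.old; the output dataset names are also hypothetical.

data work.old;                 /* hypothetical sample data */
  input var1;
  datalines;
10
20
30
;
run;

/* Plain assignment: total is not read from work.old, so it is   */
/* reset to missing at the start of each iteration.              */
data work.reset;
  set work.old;
  total = sum(total, var1);    /* total equals var1 on every row */
run;

/* Sum statement: total is retained automatically and starts at 0. */
data work.running;
  set work.old;
  total + var1;                /* running total: 10, 30, 60 */
run;

/* Explicit RETAIN: total keeps its value across iterations.      */
/* It starts as missing, and SUM() ignores the missing argument.  */
data work.running2;
  set work.old;
  retain total;
  total = sum(total, var1);    /* running total: 10, 30, 60 */
run;

Printing the three output datasets side by side should show total equal to var1 in work.reset and a cumulative sum in the other two.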