Solved: Re: Variables values in PDV

Masande · Posted 01-05-2019 11:19 PM

According to the Prep Guide:

When PROC IMPORT reads raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each cycle of execution, with these exceptions:

Variables that are named in a RETAIN statement
Variables that are created in a sum statement
Automatic variables

In contrast, when reading variables from a SAS data set, SAS sets the values to missing only before the first cycle of execution of the DATA step. Therefore, the variables retain their values until new values become available (for example, through an assignment statement or through the next execution of a SET or MERGE statement). Variables that are created with options in a SET or MERGE statement also retain their values from one cycle of execution to the next.

I don't understand this part of the DATA step execution. As I see it, in the following code total is reset to missing at each new iteration, unless I write "retain total;" or "total+var1;":

data dataset.new;
	set dataset.old;
	total=sum(total,var1);
run;

which is confirmed when I execute the code. However, the bolded part seems to say the opposite, as dataset.old is a SAS data set: the total value shouldn't be reset to missing and var1 would be added to total.

The book is probably not wrong, and the code executes as I expect, so what am I missing about its explanation?

novinosrin · Posted 01-05-2019 11:51 PM

Can you please run this and check the log to see if it helps your understanding?

data test_data;
  do i=1 to 10;
    j=i;
    output;
  end;
run;

data _null_;
  put 'before' +2 j= _n_=;  /* before SET: j as left by the previous iteration */
  set test_data;
  j=sum(i,j);
  put 'after' +2 j= _n_=;   /* after the assignment */
run;

Compare the above with the below

/* Now testing with a newly assigned variable, jj */
data _null_;
  put 'before' +2 jj= _n_=;  /* before SET: jj is not in test_data */
  set test_data;
  jj=sum(i,j);
  put 'after' +2 jj= _n_=;   /* after the assignment */
run;

Masande · Posted 01-06-2019 09:29 AM

If I understand it correctly, the variables i and j are coming from a different data set and their values are retained. sum(i,j) is newly created, hence its value is set to missing.

Is that what I'm supposed to see?

Tom · Posted 01-06-2019 10:23 AM

@Masande wrote:
If I understand it correctly, the variables i and j are coming from a different data set and their values are retained. sum(i,j) is newly created, hence its value is set to missing.

Is that what I'm supposed to see?

I and J are variables being read from a dataset, so they are retained. But in a normal simple data step, where the SET statement is the first thing that executes, it really doesn't matter whether they are "retained" or not, since whatever value they had is immediately replaced by executing the SET statement.
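One case where the retention does become visible, as a minimal sketch: if a SET statement executes only on the first iteration, the value it read stays in the PDV for every later iteration. The data set names lookup and three_rows and the variable rate are made up for this illustration.

```sas
/* Hypothetical one-row data set holding a constant */
data lookup;
  rate = 0.5;
run;

/* Hypothetical driver data set with three observations */
data three_rows;
  do i = 1 to 3;
    output;
  end;
run;

data _null_;
  if _n_ = 1 then set lookup;  /* executed on the first iteration only */
  set three_rows;              /* executed on every iteration */
  put _n_= i= rate=;           /* rate comes from a data set, so it is not reset */
run;
```

The log should show rate=0.5 on all three iterations, even though lookup is read only once.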

SUM(I,J) is a function call and so has nothing to do with the point of the question. The difference between the two steps is that in one the value is being assigned to a variable that is coming from an input dataset, and in the other it is being assigned to a variable that is NOT coming from an input dataset. So in the first, the value BEFORE the SET statement reflects the value at the end of the previous iteration. In the second, the value is missing until the assignment statement gives it a value.

In the simple data step generated by PROC IMPORT, none of the variables are coming from an input dataset, so none of them are "retained". Also, since each iteration of the data step includes an INPUT statement that sets the values of the variables, it doesn't really matter whether the variables are retained. That is why it seems strange to mention this issue in the context of PROC IMPORT.
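For comparison, here is a small sketch of the raw-data case. It uses in-stream DATALINES rather than an external file, which is an assumption made only to keep the example self-contained. Neither x nor total comes from an input dataset, so both should print as missing on every 'before' line in the log.

```sas
data _null_;
  put 'before' +2 x= total= _n_=;  /* both are reset to missing at the top of each iteration */
  input x;                         /* x is filled from the raw data line */
  total = sum(total, x);           /* total was missing, so this just equals x */
  put 'after' +2 x= total= _n_=;
  datalines;
1
2
3
;
run;
```

Because total is reset each time, the 'after' lines never show a running sum; total simply equals x.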

Tom · Posted 01-06-2019 12:25 AM

What does PROC IMPORT have to do with this question about data steps?

Masande · Posted 01-06-2019 09:25 AM

According to the Prep Guide, PROC IMPORT runs a DATA step to read the data. Hence the relation with the first post (which is a quote from the book).

Tom · Posted 01-06-2019 10:17 AM

@Masande wrote:
According to the Prep Guide, PROC IMPORT runs a DATA step to read the data. Hence the relation with the first post (which is a quote from the book).

PROC IMPORT will generate and run a DATA step when used to read a delimited text file. But it does not generate a data step to read from structured data, like Excel files.
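A rough way to see this for yourself, as a sketch: write a tiny comma-delimited file to a temporary fileref, import it, and read the log, where the generated DATA step (with its INFILE and INPUT statements) is printed. The fileref demo, the column names, and the output name work.example are invented for this illustration.

```sas
/* Hypothetical temporary file with a header row and two records */
filename demo temp;

data _null_;
  file demo;
  put 'id,value';
  put '1,10';
  put '2,20';
run;

/* For delimited text, PROC IMPORT writes a DATA step and runs it */
proc import datafile=demo
            out=work.example
            dbms=csv
            replace;
  getnames=yes;
run;
```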

Who wrote that guide?

Kurt_Bremser · Posted 01-06-2019 04:19 AM

total is NOT read from the dataset; it is a newly created variable, and is therefore set to missing at the start of each data step iteration.
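A minimal sketch of the difference, using a made-up three-row data set named old in place of dataset.old:

```sas
/* Hypothetical stand-in for dataset.old */
data old;
  input var1;
  datalines;
10
20
30
;
run;

/* total is not in OLD, so it is missing at the top of every iteration */
data new_reset;
  set old;
  total = sum(total, var1);  /* gives 10, 20, 30 */
run;

/* The sum statement implicitly retains total and initializes it to 0 */
data new_running;
  set old;
  total + var1;              /* gives 10, 30, 60 */
run;
```

Adding "retain total;" to the first step would give the same running total as the second.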
