Hi,
I am reading through the SAS Specialist Guide book and I am a bit confused about whether or not SAS resets the values in the PDV for each new iteration:
Earlier in the book, it says the variable values in the PDV are reset to missing for each iteration.
Sample code used in book:
data work.update;
set cert.invent;
Total=instock+backord;
SalePrice=(CostPerUnit*0.65)+CostPerUnit;
format CostPerUnit SalePrice dollar6.2;
run;
The variables InStock BackOrd CostPerUnit Total SalePrice are shown in a table with initialized missing values in the book for each starting iteration.
Then the NEXT section in the book has:
When PROC IMPORT reads raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each cycle of execution, with these exceptions:
In contrast, when reading variables from a SAS data set, SAS sets the values to missing only before the first cycle of execution of the DATA step. Therefore, the variables retain their values until new values become available (for example, through an assignment statement or through the next execution of a SET or MERGE statement). Variables that are created with options in a SET or MERGE statement also retain their values from one cycle of execution to the next.
My questions:
1. In the NEXT section, is it referring to the situation for PROC IMPORT? If so, why would you use PROC IMPORT on a SAS data set? If not, then it contradicts the earlier section in the book.
2. The book mentions "Variables that are created with options in a SET or MERGE statement". What variables can you create with options other than the in= variables?
3. Can I use PUTLOG before and after the SET statement to test whether the variables are initialized to missing? Unfortunately, I don't have access to SAS. Is there a student version anywhere for people learning SAS?
Thanks!
Yes, there is a free SAS version for learning called SAS OnDemand for Academics (SODA).
Look at the options of the SET statement, such as END= or INDSNAME=. These options create variables.
Hi @cosmid,
to answer your remaining questions:
@cosmid wrote:
1. In the NEXT section, is it referring to the situation for PROC IMPORT?
No, that book section is about reading a SAS data set using a SET, MERGE, UPDATE or MODIFY statement. As you say correctly, PROC IMPORT does not read SAS data sets.
3. Can I use PUTLOG before and after the SET statement to test whether the variables are initialized to missing?
Yes, either a PUTLOG statement or a plain PUT statement (when writing to the log) is suitable for such tests.
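A minimal sketch of such a test, assuming the SASHELP.CLASS sample data set is available (it ships with most SAS installations). The message texts are just illustrative labels:

```sas
data _null_;
  /* Written before SET executes: on the first iteration the data set
     variables are missing; on later iterations they should still hold
     the values read in the previous iteration. */
  putlog 'Before SET: ' _all_;
  set sashelp.class(obs=3);
  /* Written after SET has loaded the next observation into the PDV. */
  putlog 'After SET:  ' _all_;
run;
```

Because the step only stops when SET tries to read past the last observation, the "Before SET" line should appear one more time than the "After SET" line.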
Thank you for the answers. I am still unclear about question 1. The code used in the book is also a SET statement, why didn't that retain the value of the variables?
@cosmid wrote:
I am still unclear about question 1. The code used in the book is also a SET statement, why didn't that retain the value of the variables?
I don't have the book, but variables read by a SET statement are automatically retained until their values are overwritten by the next execution of the SET statement (or any other statement changing those values).
Here is an example (log) using a PUT statement to show variable values in each iteration of a DATA step, before assignment statements and a SET statement are executed:
118  data _null_;
119  put _all_;
120  retvar='Ret';
121  newvar=123;
122  set sashelp.class(obs=3);
123  retain retvar;
124  run;

retvar= newvar=. Name= Sex= Age=. Height=. Weight=. _ERROR_=0 _N_=1
retvar=Ret newvar=. Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 _ERROR_=0 _N_=2
retvar=Ret newvar=. Name=Alice Sex=F Age=13 Height=56.5 Weight=84 _ERROR_=0 _N_=3
retvar=Ret newvar=. Name=Barbara Sex=F Age=13 Height=65.3 Weight=98 _ERROR_=0 _N_=4
As you can see, in the second, third and fourth iteration of the DATA step (_N_=2, 3, 4) the variables from SASHELP.CLASS still contain the values read by the SET statement in the previous iteration of the DATA step. Similarly, starting with the second iteration, the value of the explicitly retained variable RETVAR is available before the assignment statement refreshing it. Variable NEWVAR, however, is not retained, hence reset to missing when the DATA step iterates and these newly created missing values are written to the log for _N_=2, 3, 4. Initially (_N_=1), the values of all variables (except the automatic variables _ERROR_ and _N_) were missing.
I suspect they are trying to simplify to just the issue for the particular type of data step they are talking about in each situation.
To your first question:
1. In the NEXT section, is it referring to the situation for PROC IMPORT? If so, why would you use PROC IMPORT on a SAS data set? If not, then it contradicts the earlier section in the book.
When you use PROC IMPORT on a delimited text file it just serves as a code generator to generate a DATA step. That is why the PDV has importance, because of the DATA step that is being run, not because of the PROC step that created the DATA step.
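For comparison, here is a hand-written sketch of a DATA step that reads raw data with INPUT. It is not the code PROC IMPORT actually generates, and the variable names and data lines are made up for illustration. In this kind of step the variables read by INPUT should show as missing at the top of every iteration:

```sas
data _null_;
  /* Top of each iteration: variables read with INPUT are expected
     to have been reset to missing. */
  putlog 'Before INPUT: ' _all_;
  input name $ qty;
  /* After INPUT the values from the current raw record are in the PDV. */
  putlog 'After INPUT:  ' _all_;
  datalines;
pen 3
ink 5
;
run;
```

Running this next to a step that uses SET makes the difference between the two book sections visible in the log.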
To your second question:
2. The book mentions "Variables that are created with options in a SET or MERGE statement". What variables can you create with options other than the in= variables?
Note that the IN= option is a DATASET OPTION, not a SET statement option. You can check the documentation for the SET statement. The two that I use most often are END= and NOBS=.
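A small sketch of those two SET statement options, assuming SASHELP.CLASS is available. The variable names is_last and n_total are arbitrary choices for this example:

```sas
data _null_;
  /* Top of each iteration: the option variables are not reset to missing. */
  putlog 'Top of iteration: ' _n_= is_last= n_total=;
  /* END= names a flag that is set to 1 when the last observation is read;
     NOBS= names a variable holding the number of observations in the data set. */
  set sashelp.class(obs=3) end=is_last nobs=n_total;
  putlog 'After SET:        ' _n_= Name= is_last= n_total=;
run;
```

Neither variable is written to an output data set, so they are only useful inside the step, for example in conditions such as IF is_last THEN ...;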
There are also variables you can create with the INFILE statement. Check out options like END=, LENGTH=, COLUMN= and FILENAME=. The variables they create are not set to missing at the start of an iteration, but they are not constants either: their values change as you execute INPUT statements.
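A sketch of two of these INFILE options, using a temporary file so it does not depend on any existing raw data. The fileref demo and the variable names is_last and line_len are made up for this example:

```sas
filename demo temp;

/* Write a small raw file with lines of different lengths. */
data _null_;
  file demo;
  put 'a';
  put 'bbb';
  put 'ccccc';
run;

data _null_;
  infile demo end=is_last length=line_len;
  /* Before INPUT: shows whatever the option variables hold at this point. */
  putlog 'Before INPUT: ' _n_= is_last= line_len=;
  /* A null INPUT statement loads the next record into the input buffer. */
  input;
  /* After INPUT: line_len reflects the record just read. */
  putlog 'After INPUT:  ' _n_= is_last= line_len=;
run;
```

Comparing the "Before INPUT" and "After INPUT" lines in the log shows when each variable actually changes.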