data _NULL_; length out $5000.; set default_mth; retain out; out=Trim(out)||'staging.mthbal_'||default_mth; call symput("list",out); run; data mthbal; set &list; run;
what I dont understand is the line 'set &list', I thought &list is a macro variable and not a dataset...
why is it used this way?
also in the able, what is 'retain out' doing? there is no output dataset as _null_ is used, does 'retain out' means saving it for later use?
The macro variable just contains text it does NOT contain a dataset. The macro processor (in general) is use to generate SAS code. When the macro processor finishes resolving all of the & and % macro triggers the resulting text is passed onto SAS to interpret as code.
So if you assign the value SASHELP.CLASS to the macro variable LIST and then run this statement:
set &list ;
The macro processor will replace &list with the value of list and SAS itself will see this code:
set SASHELP.CLASS ;
So in this code the macro variable LIST can contain anything that makes the generated line a valid SET statement.
Based on the name they used for the macro variable I suspect that the goal was to make space delimited list of dataset names. So the resulting code was something like:
set staging.mthbal_1 staging.mthbal_2 staging.mthbal_3;
Which tells SAS to read all three datasets in order.
Note that in this case PROC SQL is probably a little easier to understand.
proc sql noprint;
select cats('staging.mthbal_',default_mth)
into :list separated by ' '
from default_mth
;
quit;
data mthbal;
set &list;
run;
Tom, thanks a lot.
one more questions
the line:
retain out
what is out retaining as initial value, I dont see any value following or being specified in the 'retain out'
also what is the significance of Trim(out) as there is nothing before 'staging.mthbal_', is it to make sure no space is added in front of staging.mthbal?
Yes that data step is a mess.
The RETAIN statement does allow you to specify an OPTIONAL initial value. That is not used (or needed) in this code as the default of all blanks is fine.
The TRIM() is needed because the code is using simple catenation operator, || and not using modern code tools like the CATS() or CATX() function to concatenate. It must use the TRIM() because otherwise the complete value of OUT is used, including all of the trailing spaces that were inserted to fill out the full 5000 byte length of the variable. If you then tried to append some other characters without first removing the trailing spaces the result would be a string longer than the 5000 bytes that the OUT can store. So without the TRIM() assigning the result back to OUT variable will cause the adding strings to disappear since there is not room for more than 5000 bytes in the variable.
Another major issues with that code.
It is not adding any space before the value of 'staging.mthbal_'. That means it can only work if the input dataset only has ONE observation. But in that case there is no need to RETAIN the value of OUT or append values to the end of it. For this code to really work there needs to be at least one space before the S in the STAGING in that string literal: ' staging.mthbal_'
Without that extra space the result when there are multiple observations will not be usable as a "list" of dataset names since it will be one long string with no spaces between the names.
So this data step will only work when there is one observation in the input. Which means why bother with the RETAIN and appending the old value?
Other issues:
It is running CALL SYMPUT() to set the value of the macro variable LIST on every observation. Only the value generated by the call on the LAST observation will be available.
It is using the ancient CALL SYMPUT() function instead of the modern CALL SYMPUTX() function. That means that the value of LIST will be 5000 characrters long. Most of them the trailing spaces used to fill out the variable OUT. The CALL SYMPUTX() function automatically removes trailing (and leading) spaces from the string value passed when it creates the macro variable.
Hi,
Tom has already given you excellent answers.
As you are learning the DATA step, and tricky concepts like retain, it's helpful to play with it. In this case, you could change the second step from a data _NULL_ step to a step that writes a dataset, then you could PROC PRINT the output data step to see the impact of removing the remain. You could play with code like:
data default_mth ;
input default_mth $3.;
cards ;
Jan
Jul
Nov
;
run ;
data want ;
length out $5000.;
set default_mth;
retain out;
out=Trim(out)||'staging.mthbal_'||default_mth;
call symput("list",out);
run;
%put >>&=list<< ;
proc print data=want ;
run ;
data want ;
length out $5000.;
set default_mth;
*retain out; *Comment out the retain ;
out=Trim(out)||'staging.mthbal_'||default_mth;
call symput("list",out);
run;
%put >>&=list<< ;
proc print data=want ;
run ;
Other ways to investigate would be to add PUT statements to the code, or use the data step debugger.
Note neither of the above works correctly, as Tom pointed out. So you could keep playing with them, to add improvements until they work. (I agree with Tom's point that PROC SQL is more typical and clearer, but as a DATA step learning exercise, it's worth working this through).
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.