Hello
I saw two ways to create a macro variable whose value is the number of observations in a data set.
My question: what is the advantage of Way 1 over Way 2?
I see that both ways work, but Way 2 is simpler code.
What does "if 0" mean?
/***Way1***/
data _null_;
if 0 then set sashelp.class nobs=n;
call symput('num_obs',n);
stop;
run;
%put &num_obs;
/***Way2***/
data _null_;
set sashelp.class nobs=n;
call symput('nr_obs',n);
stop;
run;
%put &nr_obs;
Without the STOP statement, "Way 2" would pull every observation and repeatedly assign the same value to the macro variable. If the data set has millions of observations, that could take quite a while and waste a lot of CPU. With the STOP statement in place, it reads only the first observation. "Way 1" will take about the same amount of time regardless of the number of observations.
There is little practical advantage of Way 1 vs Way 2, other than clarity of code / purpose.
In this step you don't have to read any data from sashelp.class. When the SET statement compiles, it will assign the value to n. So the SET statement does not need to execute.
With Way 1, the code:
if 0 then set ... ;
is a way to prevent the SET statement from executing. It's an ordinary IF-THEN statement: the condition 0 is always false, so the SET statement never executes.
It's common to use IF 0 THEN SET when you don't want to read data from a dataset, but you do want the metadata (e.g. variable names, or nobs in this case) that becomes available when the DATA step compiles.
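As a rough illustration of the metadata use case, the sketch below copies only the variable definitions of sashelp.class into a new, empty dataset. The output name WORK.CLASS_SHELL is just a made-up name for this example.

```sas
/* WORK.CLASS_SHELL is a hypothetical name used only for illustration */
data work.class_shell;
  /* SET is compiled, so all variables of sashelp.class are defined,
     but the condition is never true, so no observation is read */
  if 0 then set sashelp.class;
  /* STOP ends the step before the implicit OUTPUT, giving 0 observations */
  stop;
run;
```

The result should be a dataset with the same variables as sashelp.class and no rows.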
In practice, Way 1 will not read any data from sashelp.class. Way 2 will read just one record from sashelp.class, because the STOP statement ends execution of the DATA step before the second record is read. So you won't see a noticeable difference in execution time between the two methods.
Still, Way 1 is generally preferable, as IF 0 THEN SET is a familiar, recognizable coding pattern that makes it easier for others to read your code.
Way 1 will not read an observation, way 2 will read the first. Since that obs is contained in the dataset header page, which has to be read physically anyway, the performance difference will be negligible.
So it's up to what is easier to understand for the next coder who comes across the program.
But for this, I'd prefer
proc sql noprint;
select nobs into :num_obs
from dictionary.tables
where libname = "SASHELP" and memname = "CLASS";
quit;
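Use SYMPUTX, not SYMPUT. SYMPUTX is designed to handle numeric-to-character conversion automatically: it strips leading and trailing blanks from the value and does not write a conversion note to the log.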
Since you have a STOP statement you don't need the IF/THEN. Just move the SET after the STOP.
data _null_;
call symputx('num_obs',n);
stop;
set sashelp.class nobs=n;
run;
%put &num_obs;
PS Do not use the ancient CALL SYMPUT() method unless you actually have a need to generate macro variables that contain leading and/or trailing spaces. It was replaced by CALL SYMPUTX() decades ago.
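A small side-by-side sketch of the difference, using the same sashelp.class dataset. The macro variable names n_symput and n_symputx are made up for this example.

```sas
data _null_;
  if 0 then set sashelp.class nobs=n;
  /* numeric n is converted implicitly (right-aligned), with a log note */
  call symput('n_symput', n);
  /* numeric n is converted and stripped of leading/trailing blanks */
  call symputx('n_symputx', n);
  stop;
run;

/* the brackets make any padding visible in the log */
%put [&n_symput] [&n_symputx];
```

The first bracket should show the number padded with blanks, the second the bare number.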
The first method (or the even simpler code shown by @Tom ) is the best. The second method will not execute the SYMPUT statement if the number of observations is 0, which may cause problems later.
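To see the zero-observation case, here is a minimal sketch. WORK.EMPTY is a hypothetical dataset created only for the test.

```sas
/* WORK.EMPTY (hypothetical): same variables as sashelp.class, 0 observations */
data work.empty;
  stop;
  set sashelp.class;
run;

/* Way 2: SET executes first and hits end-of-file immediately,
   so the step ends before CALL SYMPUTX runs and nr_obs is not assigned */
data _null_;
  set work.empty nobs=n;
  call symputx('nr_obs', n);
  stop;
run;

/* Way 1: SET is compiled but never executed, so CALL SYMPUTX always runs */
data _null_;
  if 0 then set work.empty nobs=n;
  call symputx('num_obs', n);
  stop;
run;

%put num_obs=&num_obs;
```

Only num_obs is printed here, because referencing nr_obs would produce a warning if it was never created earlier in the session.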
And yes, as others have already suggested, use SYMPUTX, not SYMPUT. In this example, the difference is that the SYMPUT-ed variable will contain leading blanks (the implicit numeric-to-character conversion right-aligns the number), while the SYMPUTX one will not.
One additional thing to consider: the DATA step methods will fail with an ERROR if the dataset does not exist, whereas the SQL query will simply return no rows and leave the macro variable unassigned. Depending on context, the SQL method might be an easier way to deal gracefully with a missing dataset (particularly important for batch jobs which should continue executing).
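One possible way to make use of that behaviour is to give the macro variable a default before the query. This is only a sketch: the table name NO_SUCH_TABLE is hypothetical and assumed not to exist, and the default of 0 is an arbitrary choice.

```sas
/* default used when the table is not found (arbitrary choice) */
%let num_obs = 0;

proc sql noprint;
  /* NO_SUCH_TABLE is a hypothetical name, assumed not to exist */
  select nobs into :num_obs
    from dictionary.tables
    where libname = "WORK" and memname = "NO_SUCH_TABLE";
quit;

/* no row was selected, so num_obs keeps its default and no ERROR is raised */
%put num_obs=&num_obs sqlobs=&sqlobs;
```

A later step could then test num_obs (or the automatic SQLOBS macro variable) to decide whether to continue.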