Hello
May anyone explain the code
avg_weight=symget(cats('avg_weight_',sex));
Why are we using cats here?
I don't see any concatenating here so why using cats?
IS it practical code or just in books?
PROC SQL noprint;
Select avg(case when sex='F' then weight else . end ),
avg(case when sex='M' then weight else . end )
into : avg_weight_m trimmed,
: avg_weight_f trimmed
from sashelp.class;
quit;
data class_way1;
set sashelp.class;
avg_weight=symget(cats('avg_weight_',sex));
Run;
data class_way2;
set sashelp.class;
IF sex='M' then avg_weight=&avg_weight_m.;
else IF sex='F' then avg_weight=&avg_weight_f.;
Run;
Of course there's concatenating done, the string 'avg_weight_' is concatenated with the contents of variable sex. CATS will automatically strip all arguments of leading and trailing blanks.
The first code shows how to replace a IF/THEN/ELSE with functions.
My code would be
proc sql;
create table class as
select
cl.*,
avg(weight) as average_weight
from sashelp.class cl
group by sex
order by name
;
quit;
Such code does not need to make any assumptions about the possible contents of sex.
Of course there's concatenating done, the string 'avg_weight_' is concatenated with the contents of variable sex. CATS will automatically strip all arguments of leading and trailing blanks.
The first code shows how to replace a IF/THEN/ELSE with functions.
My code would be
proc sql;
create table class as
select
cl.*,
avg(weight) as average_weight
from sashelp.class cl
group by sex
order by name
;
quit;
Such code does not need to make any assumptions about the possible contents of sex.
The first bit computes the average weight of each sex and stores the printable representation in a macro variable whose name contains the sex value. So you have
%put &=avg_weight_m;
%put &=avg_weight_f;
---- LOG ----
AVG_WEIGHT_M=90.11111
AVG_WEIGHT_F=108.95
Storing computation results in the macro symbol table is generally not the best approach.
Way 1 is just bad coding. Bad awful coding.
The result of
cats('avg_weight_', sex)
will be one of
'avg_weight_M'
'avg_weight_F'
which are the names of macro variables.
avg_weight = symget(cats('avg_weight_',sex));
moves the value of a macro variable to the value of a DATA step variable. However, avg_weight will be $200 because SYMGET returns a character value.
Way 2 is not much better.
IF sex='M' then avg_weight=&avg_weight_m.; else
IF sex='F' then avg_weight=&avg_weight_f.;
After macro resolution becomes
IF sex='M' then avg_weight=90.11111; else
IF sex='F' then avg_weight=108.95;
In this case the resolved macro variables are interpreted in the context of DATA Step source code, so the assigned variable ave_weight will take on a numeric type.
Typically you want the computation to be made and saved directly (per @Kurt_Bremser).
The function call method is more versatile (no literal coding of values needed), but will probably be less performant because of the repeated function calls to retrieve macro variable values, which might come into play when large datasets are processed.
My SQL is the simplest and easiest to understand code, but might be outperformed by a PROC MEANS and following DATA step with a hash object built from the means. This method avoids any conventional sorting and reads the datasets in sequential order.
Your example is poor because you stored numbers into macro variables and because the two methods actually produce different results (a character variable versus a numeric variable) so some respondents are concentrating on that.
Let just assume the macro variables already exist and concentrate on the final data step that is using the values of those macro variables. The reason the first method using SYMGET() is better than the second is because you do not need to know the number of possible values of the variable being used to generate the macro variable name (SEX in your example) when writing the code. To make it even more robust you could use the SYMEXIST() function to make sure the macro variable actually exists.
%let gender_f = Female;
data class_way1;
set sashelp.class;
length mvar $32 gender $6 ;
mvar=cats('gender_',sex);
if symexist(mvar) then gender=symget(mvar);
run;
proc freq ;
tables sex*gender / list missing;
run;
Cumulative Cumulative Sex gender Frequency Percent Frequency Percent ------------------------------------------------------------------ F Female 9 47.37 9 47.37 M 10 52.63 19 100.00
Usually, the only reason to create a data set with the mean by group as a variable is so you can subtract the mean from the actual value to get the difference between the actual value and the mean.
If that's the real reason for doing this, there's no need to create macro variables at all. PROC STDIZE does this subtraction in one step, and would be the simplest to code.
I agree with @RichardDeVen, the two solutions presented are very poor coding.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.