Help using Base SAS procedures




Hi All,

Could you help me understand how the variables pctdept and pcttot are created in this step?

data new;
set dusty;

Kind Regards,

Re: Symget

It is very hard to answer without knowing how the macro variable &S_TOT is created and whether the dataset WORK.DUSTY has a variable called DEPT. Your program seems to be a small piece of a bigger process. The macro facility is involved, and you have two forms of macro variable referencing. The &S_TOT reference will be resolved at code compile time; its value is probably something like the grand total of all the salaries across all the departments.

Let's say that WORK.DUSTY has values for DEPT like ACCT or MIS. That would mean the macro variables &SACCT or &SMIS are resolved at data step execution time, because of the SYMGET, based on the current value of DEPT being read. Whatever value was stored in the macro variable &SACCT or &SMIS would get used in the division. I would guess that these macro variables store some number that is appropriate at the department level. Without knowing more about the data stored in the SALARY variable on every obs, it is hard to comment on whether the result of the division will be the percent you want. Again, we come back to what is in WORK.DUSTY, how it was created, how the macro variables were created, and what the purpose and desired output of this program are.
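To make the timing difference concrete, here is a minimal sketch of what such a step might look like. It is only an assumption about the shape of the program: the sample rows in DUSTY, the hand-typed %LET totals, and the exact assignment statements are all hypothetical, since the real data and the rest of the original step are not shown in this thread.

```sas
/* Hypothetical sample data; the real WORK.DUSTY is not shown */
data dusty;
  input dept $ salary;
  datalines;
ACCT 1000
ACCT 3000
MIS 2000
MIS 4000
;
run;

/* Hypothetical totals typed in by hand for illustration;     */
/* in the real process an earlier step presumably sets these. */
%let sACCT = 4000;
%let sMIS  = 6000;
%let s_tot = 10000;

data new;
  set dusty;
  /* SYMGET is called for every observation at execution time.   */
  /* CATS builds the name (sACCT, sMIS) from the current DEPT    */
  /* value, and INPUT converts the returned text to a number.    */
  pctdept = salary / input(symget(cats('s', dept)), best32.);
  /* &s_tot is replaced by its text before the step is compiled, */
  /* so the same constant is used for every observation.         */
  pcttot = salary / &s_tot;
run;

proc print data=new;
run;
```

In this sketch both results are fractions; multiply by 100 if you want them on a percent scale.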

One thing you can do to figure out what's happening is to turn on all the possible debugging techniques for revealing macro processing:
options mprint symbolgen mlogic;

and run the program, look in the log and see if you can trace back from this program to the place or program where the macro variables are created. The documentation on SYMGET is here:

These papers provide a good introduction to the Macro facility:


Re: Symget

This looks like part of a process that works out what percentage an observation's "salary" is of its "department" total and of the overall total.
This step would be preceded by a PROC SUMMARY to collect the department and overall totals:
proc summary data= dusty missing ;
class dept ;
var salary ;
output sum= ;
run ;
Then a data step generates macro variables with names based on the value in DEPT and holding that dept total for salary.
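data _null_ ;
set ; /* no name given: reads the most recently created data set, i.e. the PROC SUMMARY output */
if _type_ then call symputx( 's' !! dept, salary ); /* _type_=1: one total per DEPT */
else call symputx( 's_tot', salary ) ; /* _type_=0: grand total */
run ;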
Of course, this three-stage exercise is hard work compared with PROC TABULATE, PROC REPORT or even PROC SQL!
Not only does it take three steps, but the third step also interacts with the macro environment through the SYMGET() function at each observation in DUSTY, and that interaction carries a performance penalty as DUSTY grows.
And I don't think the model scales up to more than one CLASS variable.
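For comparison, here is a rough sketch of how the same two ratios could be computed in a single PROC SQL step, with no macro variables at all. The sample rows and the column names dept_total and grand_total are made up for illustration, and I am assuming that pctdept and pcttot are meant to be salary divided by the department total and by the grand total.

```sas
/* Hypothetical sample data standing in for WORK.DUSTY */
data dusty;
  input dept $ salary;
  datalines;
ACCT 1000
ACCT 3000
MIS 2000
MIS 4000
;
run;

proc sql;
  create table new as
  select a.dept,
         a.salary,
         a.salary / d.dept_total  as pctdept,  /* share of the department total */
         a.salary / t.grand_total as pcttot    /* share of the overall total    */
  from dusty as a,
       /* in-line view: one row per department with its salary total */
       (select dept, sum(salary) as dept_total
          from dusty
          group by dept) as d,
       /* in-line view: a single row holding the grand total */
       (select sum(salary) as grand_total
          from dusty) as t
  where a.dept = d.dept;
quit;
```

Adding a second grouping variable here would only mean extending the GROUP BY and the WHERE clause, which is where the macro variable approach gets awkward.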

Of course, this is only my guesswork!

Would you like to explain the purpose of this step?