Hi,
I have one exposure (exp), 4 dependent variables (dep) and 9 independent variables (indp), and I want to run PROC GENMOD. The exposure should appear together with each independent variable, one at a time, to model each dependent variable.
For example, modeling the first dependent variable:
proc genmod data=have;
model dep1 = exp indp1 /d=nb;run;
proc genmod data=have;
model dep1 = exp indp2 /d=nb;run;
.
.
.
proc genmod data=have;
model dep1 = exp indp9 /d=nb;run;
Then the same for the other dependent variables.
I believe using a macro will save time.
Any help would be appreciated.
Hi @samnan,
Personally, I prefer analysis datasets in wide format (like your HAVE dataset) for statistical modelling procedures. All variables occurring in one MODEL statement must be available anyway in the input dataset.
With your existing dataset HAVE, your "macro" can be implemented using CALL EXECUTE:
data _null_;
  array dep dep1 dep2 ...; /* list your dependent variables here */
  array indp indp1 indp2 ...; /* list your independent variables here */
  do i=1 to dim(dep);
    do j=1 to dim(indp);
      call execute( 'proc genmod data=have; '
        || 'model ' || vname(dep[i]) || ' = exp ' || vname(indp[j]) || ' /d=nb; run;');
    end;
  end;
run;
This assumes that all your independent variables are numeric (which is plausible as you didn't mention a CLASS statement). Please note that the arrays above do not contain variables from dataset HAVE, but just "dummy variables" with the same names.
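Since the question asked for a macro, here is a minimal sketch of the same double loop written as one. The macro name run_models and its parameters deps= and indps= are made up for illustration; the variable names in the call are the placeholders used above and need to be replaced by your full lists.

```sas
/* Sketch only: run_models, deps= and indps= are hypothetical names. */
%macro run_models(deps=, indps=);
  %local i j dep indp;
  /* outer loop: one pass per dependent variable */
  %do i = 1 %to %sysfunc(countw(&deps, %str( )));
    %let dep = %scan(&deps, &i, %str( ));
    /* inner loop: one model per independent variable */
    %do j = 1 %to %sysfunc(countw(&indps, %str( )));
      %let indp = %scan(&indps, &j, %str( ));
      title "Model: &dep = exp &indp";
      proc genmod data=have;
        model &dep = exp &indp / d=nb;
      run;
    %end;
  %end;
  title;
%mend run_models;

/* Assumed variable names; extend both lists as needed */
%run_models(deps=dep1 dep2, indps=indp1 indp2)
```

Both approaches generate the same PROC GENMOD steps; the macro just takes the variable lists as parameters instead of ARRAY statements.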
I'm not too familiar with that procedure; however, most procedures allow BY-group processing, which would be both easier to maintain and less resource hungry. Change your input data structure slightly (you can use PROC TRANSPOSE), as normalised data is easier to work with:
You have:
... INDP1 INDP2 INDP3 ...
xyz xyz xyz
Change this to:
... INDP RES
... 1 xyz
... 2 xyz
Then you can do one genmod:
proc genmod data=have; by indp; model dep1 = exp res /d=nb; run;
Sorry, not sure how this affects the model, can only advise on structure and syntax.
Thanks, updated.
Dear @RW9, thanks for your valuable help. I do not have a (res) variable, but the code provided by @FreelanceReinh works well.
RES is just a variable name I created. At the moment your values run across the table, i.e. each one is a separate variable, so your data looks like this:
... INDP1 INDP2 INDP3 ...
... xyz def abc ...
...
When the data is normalised it would look like this:
... INDP RES (call these columns what you want)...
... 1 xyz...
... 2 def...
... 3 abc...
The second structure - exactly the same data, just in a different layout - is easier to program with and uses the core Base SAS functionality of BY-group processing, which is quicker and generally better in all respects. It is useful to note that a slight restructure of your data can make your programming more efficient and easier to read and maintain.
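As a rough sketch of that restructure, assuming the independent variables really are the numeric variables indp1-indp9 and that HAVE does not already contain variables called INDP or RES (LONG is just a dataset name picked for this example):

```sas
/* Sketch only: assumes numeric indp1-indp9 and no existing INDP or RES in HAVE. */
data long;
  set have;
  array x {*} indp1-indp9;
  do indp = 1 to dim(x);   /* INDP = index of the original variable */
    res = x[indp];         /* RES  = its value on this observation  */
    output;                /* one output row per independent variable */
  end;
  drop indp1-indp9;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=long;
  by indp;
run;

proc genmod data=long;
  by indp;
  model dep1 = exp res / d=nb;
run;
```

Each BY group then fits the same model as one of the separate steps in the question, with RES standing in for the corresponding INDPn variable.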
Dear @RW9, thanks for your explanation. I will try it.
Dear @FreelanceReinh, thanks for your much-appreciated help.
Can you take it one step further and let the macro keep only the significant (indp) variables, similar to the SELECTION= option in logistic regression modeling?
So, you'd run the 4*9 PROC GENMOD steps (via CALL EXECUTE) and for each of the four dependent variables you would like to have a list of those independent variables which had p-values <0.05 in table "Analysis Of Maximum Likelihood Parameter Estimates," excluding variable EXP, which is always in the model?
Yes, this is possible. You could write the parameter estimates (incl. p-values) to datasets EST_DEP1, EST_DEP2, ... (where "DEPi" would be replaced by the name of the i-th dependent variable) and then, for example, select the names of the independent variables of interest via PROC SQL into macro variables INDPLIST_DEP1, INDPLIST_DEP2, ...
Here is draft code for this:
data _null_;
  array dep dep1 dep2 ...; /* list your dependent variables here */
  array indp indp1 indp2 ...; /* list your independent variables here */
  do i=1 to dim(dep);
    call execute('ods output ParameterEstimates(persist=proc)=est_' || vname(dep[i]) ||';');
    do j=1 to dim(indp);
      call execute( 'proc genmod data=have;'
        || 'model ' || vname(dep[i]) || ' = exp ' || vname(indp[j]) || ' /d=nb; run;');
    end;
    call execute('ods output close;');
    call execute('proc sql noprint; select parameter into :indplist_' || vname(dep[i])
      || ' separated by " " from est_' || vname(dep[i])
      || ' where upcase(parameter) not in ("INTERCEPT", "EXP", "DISPERSION") & .<ProbChiSq<0.05; quit;');
  end;
run;
%put &=indplist_dep1; /* replace dep1 by the name of the first dep. variable */
%put &=indplist_dep2; /* replace dep2 by the name of the second dep. variable */
...
You could use the variable lists &indplist_depi in MODEL statements of subsequent PROC GENMOD calls.
As I said, this is draft code. If for a particular dependent variable none of the 9 independent variables (excl. EXP) turned out to be significant, the corresponding macro variable would not be created (hence, the corresponding %PUT statement would cause a WARNING in the log).
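One way to guard against that case, as a sketch only: check with %SYMEXIST whether the list was created before using it. The macro name refit is made up for this example, and dep1 in the call stands for the name of your first dependent variable.

```sas
/* Sketch only: refit is a hypothetical helper macro. */
%macro refit(dep);
  %if %symexist(indplist_&dep) %then %do;
    /* list exists: refit with EXP plus all significant variables */
    proc genmod data=have;
      model &dep = exp &&indplist_&dep / d=nb;
    run;
  %end;
  %else %put NOTE: No macro variable INDPLIST_&dep found, so no model was refitted.;
%mend refit;

%refit(dep1)
```

Note that a list left over from an earlier run in the same SAS session would still pass this check, so it may be worth deleting old INDPLIST_ macro variables (e.g. with %SYMDEL) before rerunning the selection step.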
I am not sure about the last part (%put &=indplist_dep1; /* replace dep1 by the name of the first dep. variable */).
I renamed my dependent variables to (dep, dep1 ... dep7) and the independent variables to (indp, indp1 .... indp16) just to apply the code. When I ran it this time, it showed the same results as the old one.
The %PUT statements are just optional to demonstrate that the variable lists have been created.
Please note that my comments "list your (in)dependent variables here" referred to the lists dep1 dep2 ... and indp1 indp2 ..., respectively. The names dep and indp are the array names and must not be replaced. So, for instance, if your independent variables were AGE, HEIGHT, WEIGHT, the second array statement would read:
array indp age height weight;
and similarly for the first array.
If the first dependent variable was XYZ and only AGE and WEIGHT were significant for that, the suggested code would create a macro variable INDPLIST_XYZ containing age weight (possibly in upper or mixed case), selected from a WORK dataset named EST_XYZ.
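To compare the two, you could look at the stored estimates next to the macro variable. This is only a sketch using the hypothetical names XYZ and EST_XYZ from the example above; the column names are those of the ParameterEstimates table that PROC GENMOD writes via ODS OUTPUT.

```sas
/* Sketch only: EST_XYZ and INDPLIST_XYZ are the hypothetical names from the example. */

/* All stored estimates for the candidate variables, significant or not */
proc print data=est_xyz noobs;
  where upcase(parameter) not in ("INTERCEPT", "EXP", "DISPERSION");
  var parameter estimate stderr probchisq;
run;

/* Only the names that passed the p<0.05 filter */
%put &=indplist_xyz;
```

The dataset holds the estimates from every model fitted for that dependent variable, whereas the macro variable should contain only the names selected by the WHERE condition in the PROC SQL step.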
Dear @FreelanceReinh, you are doing very good coding. Now I get it and I like it, although the list contains all variables and their values either significant or insignificant.
What do you mean by "the list contains all variables and their values either significant or insignificant"? Are you saying that some or all of the macro variables INDPLIST_depi do not contain the correct variable lists? If anything does not work or is unclear, we can continue the discussion tomorrow (Central European Time).