samnan
Quartz | Level 8

Hi,

I have one exposure variable (exp), 4 dependent variables (dep) and 9 independent variables (indp), and I want to run PROC GENMOD. The exposure should appear together with each independent variable, one at a time, to model each dependent variable.

For example, modeling the first dependent variable:

proc genmod data=have;
  model dep1 = exp indp1 / d=nb;
run;

proc genmod data=have;
  model dep1 = exp indp2 / d=nb;
run;
.
.
.
proc genmod data=have;
  model dep1 = exp indp9 / d=nb;
run;

Then the same for the other dependent variables.

I believe using a macro would save time.

Any help would be appreciated.

 

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @samnan,

 

Personally, I prefer analysis datasets in wide format (like your HAVE dataset) for statistical modelling procedures. All variables occurring in one MODEL statement must be available anyway in the input dataset.


With your existing dataset HAVE, your "macro" can be implemented using CALL EXECUTE:

data _null_;
array dep dep1 dep2 ...; /* list your dependent variables here */
array indp indp1 indp2 ...; /* list your independent variables here */
do i=1 to dim(dep);
  do j=1 to dim(indp);
    call execute( 'proc genmod data=have; '
               || 'model ' || vname(dep[i]) || ' = exp ' || vname(indp[j]) || ' /d=nb; run;');
  end;
end;
run;

This assumes that all your independent variables are numeric (which is plausible as you didn't mention a CLASS statement). Please note that the arrays above do not contain variables from dataset HAVE, but just "dummy variables" with the same names.
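
Since the question asked for a macro, the same 4*9 loop could also be sketched with a macro %DO loop instead of CALL EXECUTE. This is only a minimal sketch: the macro name run_models and its parameters data=, deps= and indps= are hypothetical, and the variable names in the call are placeholders for your own.

```sas
/* Hypothetical macro: one PROC GENMOD step per (dependent, independent) pair. */
%macro run_models(data=have, deps=, indps=);
  %local i j dep indp;
  %do i = 1 %to %sysfunc(countw(&deps, %str( )));
    %let dep = %scan(&deps, &i, %str( ));
    %do j = 1 %to %sysfunc(countw(&indps, %str( )));
      %let indp = %scan(&indps, &j, %str( ));
      /* EXP is always in the model; only the second covariate changes */
      proc genmod data=&data;
        model &dep = exp &indp / d=nb;
      run;
    %end;
  %end;
%mend run_models;

/* Placeholder variable lists: replace with your own names */
%run_models(data=have, deps=dep1 dep2 dep3 dep4, indps=indp1 indp2 indp3)
```

Both approaches generate the same PROC GENMOD steps; the macro version takes the variable names as space-separated lists rather than as array elements.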

21 REPLIES
RW9
Diamond | Level 26

I'm not too familiar with that procedure. However, most procedures, PROC GENMOD included, allow BY-group processing, which would be both easier to maintain and less resource-hungry. Change your input data structure slightly (you can use PROC TRANSPOSE); normalised data is easier to work with.

You have:

... INDP1  INDP2  INDP3 ...
    xyz    xyz    xyz

Change this to:

... INDP  RES
...  1    xyz
...  2    xyz

Then you can do one GENMOD:

proc genmod data=have;
  by indp;
  model dep1 = exp res /d=nb;
run;

Sorry, not sure how this affects the model, can only advise on structure and syntax.

FreelanceReinh
Jade | Level 19

Hi @RW9,

 

In your MODEL statement, indp should read res to make sense.

RW9
Diamond | Level 26

Thanks, updated.

samnan
Quartz | Level 8

Dear @RW9, thanks for your valuable help. I do not have a (res) variable, but the code provided by @FreelanceReinh works well.

 

RW9
Diamond | Level 26

RES is just a variable name I made up. At the moment the independent variables run across the table, i.e. each one is its own variable, so your data looks like this:

... INDP1   INDP2   INDP3 ...
... xyz     def     abc   ...
...

When the data is normalised it would look like this:

...  INDP  RES (call these columns what you want) ...
...  1     xyz ...
...  2     def ...
...  3     abc ...

The second structure (exactly the same data, just in a different layout) is easier to program with and uses the core Base SAS functionality of BY-group processing, which is quicker and just generally better in all respects. It is useful to note that a slight restructure of your data can make your programming more efficient and easier to read and maintain.
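
The restructuring described above could be sketched roughly as follows. This is an illustration under assumptions, not tested against the real data: it assumes the variables are literally named dep1-dep4, exp and indp1-indp9 and that all independent variables are numeric. The dataset names have_id and long and the helper variable rowid are hypothetical.

```sas
/* Add a row identifier so each original row forms its own BY group */
data have_id;
  set have;
  rowid = _n_;
run;

/* Wide to long: one output row per (original row, independent variable).
   _NAME_ holds the original variable name, COL1 its value. */
proc transpose data=have_id out=long(rename=(_name_=indp col1=res));
  by rowid dep1-dep4 exp;
  var indp1-indp9;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=long;
  by indp;
run;

/* One PROC GENMOD step now fits a separate model per independent variable */
proc genmod data=long;
  by indp;
  model dep1 = exp res / d=nb;
run;
```

In this layout INDP contains the original variable names (indp1, indp2, ...) rather than the numbers 1, 2, ..., which keeps the BY-group headings in the output readable.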

 

samnan
Quartz | Level 8

Dear @RW9, thanks for your explanation. I will try it.

samnan
Quartz | Level 8

Dear @FreelanceReinh, thanks for your much appreciated help.

samnan
Quartz | Level 8

Can you take it one step further and let the macro keep only the significant (indp) variables, similar to the SELECTION= option in logistic regression modeling?

FreelanceReinh
Jade | Level 19

So, you'd run the 4*9 PROC GENMOD steps (via CALL EXECUTE) and for each of the four dependent variables you would like to have a list of those independent variables which had p-values <0.05 in table "Analysis Of Maximum Likelihood Parameter Estimates," excluding variable EXP, which is always in the model?

 

Yes, this is possible. You could write the parameter estimates (incl. p-values) to datasets EST_DEP1, EST_DEP2, ... (where "DEPi" would be replaced by the name of the i-th dependent variable) and then, for example, select the names of the independent variables of interest via PROC SQL into macro variables INDPLIST_DEP1, INDPLIST_DEP2, ...

 

Here is draft code for this:

data _null_;
array dep dep1 dep2 ...; /* list your dependent variables here */
array indp indp1 indp2 ...; /* list your independent variables here */
do i=1 to dim(dep);
  call execute('ods output ParameterEstimates(persist=proc)=est_' || vname(dep[i]) ||';');
  do j=1 to dim(indp);
    call execute( 'proc genmod data=have;'
               || 'model ' || vname(dep[i]) || ' = exp ' || vname(indp[j]) || ' /d=nb; run;');
  end;
  call execute('ods output close;');
  call execute('proc sql noprint; select parameter into :indplist_' || vname(dep[i])
            || ' separated by " " from est_' || vname(dep[i])
            || ' where upcase(parameter) not in ("INTERCEPT", "EXP", "DISPERSION") & .<ProbChiSq<0.05; quit;');
end;
run;

%put &=indplist_dep1; /* replace dep1 by the name of the first dep. variable */
%put &=indplist_dep2; /* replace dep2 by the name of the second dep. variable */
...

You could use the variable lists &indplist_depi in MODEL statements of subsequent PROC GENMOD calls.

 

As I said, this is draft code. If for a particular dependent variable none of the 9 independent variables (excl. EXP) turned out to be significant, the corresponding macro variable would not be created (hence, the corresponding %PUT statement would cause a WARNING in the log).
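
One way to guard against the missing macro variable mentioned above is to check for it with %SYMEXIST before using the list in a follow-up model. The sketch below is only an illustration: the macro name fit_selected is hypothetical, and it assumes the lists were created as INDPLIST_<dependent variable> by the draft code.

```sas
/* Hypothetical helper: refit one dependent variable using only the
   independent variables that were selected as significant. */
%macro fit_selected(dep);
  %if %symexist(indplist_&dep) %then %do;
    proc genmod data=have;
      /* &&indplist_&dep resolves to the value of e.g. &indplist_dep1 */
      model &dep = exp &&indplist_&dep / d=nb;
    run;
  %end;
  %else %put NOTE: No independent variable was significant for &dep..;
%mend fit_selected;

%fit_selected(dep1) /* replace dep1 by the name of a dependent variable */
```

Note that a list left over from an earlier run in the same SAS session would also pass the %SYMEXIST check, so it may be worth deleting old lists with %SYMDEL before rerunning the selection step.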

samnan
Quartz | Level 8

I am not sure about the last part (%put &=indplist_dep1; /* replace dep1 by the name of the first dep. variable */).

I renamed my dependent variables to (dep, dep1 ... dep7) and the independent variables to (indp, indp1 .... indp16) just to apply the code. When I ran it this time, it showed the same results as the old one.

FreelanceReinh
Jade | Level 19

The %PUT statements are just optional to demonstrate that the variable lists have been created.

 

Please note that my comments "list your (in)dependent variables here" referred to the lists dep1 dep2 ... and indp1 indp2 ..., respectively. The names dep and indp are the array names and must not be replaced. So, for instance, if your independent variables were AGE, HEIGHT, WEIGHT, the second array statement would read:

array indp age height weight;

and similarly for the first array.

 

If the first dependent variable was XYZ and only AGE and WEIGHT were significant for that, the suggested code would create a macro variable INDPLIST_XYZ containing age weight (possibly in upper or mixed case), selected from a WORK dataset named EST_XYZ.

samnan
Quartz | Level 8

Dear @FreelanceReinh, you are doing very good coding. Now I get it and I like it, although the list contains all variables and their values, whether significant or insignificant.

FreelanceReinh
Jade | Level 19

What do you mean by "the list contains all variables and their values, whether significant or insignificant"? Are you saying that some or all of the macro variables INDPLIST_depi do not contain the correct variable lists? If anything does not work or is unclear, we can continue the discussion tomorrow (Central European Time).
