Hi all!
I'm confused about how to run PROC GLM with a macro.
I have data like this:
A | B1 | B2 | B3 | B4 |
0.1 | 0 | 2 | 1 | 1 |
0.9 | 1 | 1 | 0 | 2 |
0.78 | 2 | 1 | 1 | 1 |
0.64 | 0 | 2 | 0 | 0 |
and I need to run a separate regression of A on each variable, for example A ~ B1, A ~ B2, A ~ B3, and so on (I have about 300 variables).
Afterwards, I need the p-value from each association to evaluate whether any of them is significant.
But I don't have a clue how to do it...
I really appreciate your help, as always!
Thanks in advance!
All explained here: https://blogs.sas.com/content/iml/2017/02/13/run-1000-regressions.html
No macros needed.
Also, in my opinion, it is not a good idea to run regressions like this, on hundreds or thousands of variables individually.
Maybe I didn't explain it very well, but my X variables are categorical (only 3 values are possible) and my Y variable is continuous.
So I think it's not possible to use that code, or am I wrong?
Use PROC GLM instead of PROC REG.
Show us the code you tried.
I'm trying with something simple like this:
proc glm data=exp;
class b1;
model a=b1;
run;
but I need to incorporate a macro because I have B1-B300, and I also need to modify it to retain just the p-value for each association.
You don't need a macro. The link I gave explains how to do this without a macro. Scroll down to the section entitled "The BY way for many models". Give that a try. If you get stuck, show us your code.
I modified the code, but the problem now is that it doesn't give the p-value for each association. What is labeled "Value" in the PROC PRINT output is not the p-value of the association.
data try1;
set exp; /* <== specify data set name HERE */
array new_SNP [*] new_SNP1 - new_SNP280; /* <== specify explanatory variables HERE */
do varNum = 1 to dim(new_SNP);
VarName = vname(new_SNP[varNum]); /* variable name in char var */
Value = new_SNP[varNum]; /* value for each variable for each obs */
output;
end;
drop new_SNP:;
run;
/* 2. Sort by BY-group variable */
proc sort data=try1; by VarName; run;
/* 3. Call PROC GLM and use BY statement to compute all regressions */
proc glm data=try1 noprint outest=PE;
by VarName;
model south_ref = Value;
quit;
/* Look at the results */
proc print data=PE(obs=5);
var VarName Intercept Value;
run;
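I think you need to add a CLASS statement to your PROC GLM, since your variable named Value is categorical (CLASS), not continuous.
If you want the overall p-value for the model, you also need the ODS OUTPUT statement like this:
ods exclude all; /* suppress printed output; NOPRINT would also suppress the ODS OUTPUT table */
ods output OverallANOVA=pe; /* the p-value is ProbF on the Source='Model' row */
proc glm data=try1;
by VarName;
class Value;
model south_ref = Value;
run;
quit;
ods exclude none; /* restore printed output */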
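To get one p-value per variable out of that output, something like the following might work. It is only a sketch: it assumes the ODS OUTPUT step actually created a data set named pe, and that pe has the usual OverallANOVA columns (Source, DF, FValue, ProbF) plus the BY variable VarName. The data set name pvalues is just a placeholder.

```sas
/* Sketch: keep one row per BY group holding the overall model p-value.  */
/* Assumes pe was created by ODS OUTPUT OverallANOVA=pe in the step above. */
data pvalues;                      /* pvalues is a hypothetical name */
   set pe;
   where Source = 'Model';         /* the Model row carries the overall F test */
   keep VarName DF FValue ProbF;   /* ProbF is the p-value of that F test */
run;

/* Smallest p-values first */
proc sort data=pvalues;
   by ProbF;
run;

proc print data=pvalues(obs=10);
run;
```

With a few hundred tests, some small p-values are expected by chance alone, so the sorted list is better treated as a screening step than as proof of any single association.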