I am running a logistic regression on 1714 variables (PheWAS). I followed this guide (https://blogs.sas.com/content/iml/2017/02/13/run-1000-regressions.html) to run the regression the "by way."
In my final table, I would like to have the number of cases for each predictor (the predictor/exposure is a SNP (genetic variant) yes/no). In my final logistic table I have removed the reference row. Each row is one logistic regression and unique on varname.
The table that I get:
Varname | p-value | odds ratio |
_001 | 0.002 | 10.2 |
_002 | 0.6 | 1.0 |
The table that I want:
Varname | p-value | odds ratio | cases_SNP_yes | cases_SNP_no |
_001 | 0.002 | 10.2 | 100 | 5 |
_002 | 0.6 | 1.0 | 30 | 30 |
The way I currently get cases is to run a PROC MEANS step on the input data set (one row per patient (obs=264,000), one column per variable, and a column that indicates exposure) and then merge the result with the logistic output by varname. I then repeat the step to get the number of cases for the other predictor level. However, this takes a long time, and I would think there is a better way to do this. I am wondering whether PROC LOGISTIC has an option that produces these counts.
Sample code is below
* code for how I get my logistic table;
proc logistic data=have alpha=0.00002927;
by VarName; * this is the "by way";
class SNP; * note: CLASS defaults to effect coding, add / param=ref if ExpEst should be the odds ratio;
model value = SNP / rsq expb;
ods output ParameterEstimates=model;
run;
data model_formated;
set model (rename=(expest=odds_ratio));
where variable = 'SNP'; * keep the row that contains the p-value;
run;
* count the cases (sum of value) among patients with SNP=1;
proc means data=have sum;
by varname;
where SNP=1;
var value;
output out=cases
sum=count;
run;
* add the case count to the logistic results;
data logistic_with_counts;
merge model_formated cases(keep=varname count);
by varname;
run;
In my final table, I would like to have the number of cases for each class (exposure is one of two drugs - studyrx is the column name). In my final logistic table, I have removed the reference row. Each row is one logistic regression and unique on varname.
Varname | p-value | odds ratio |
_001 | .002 | 10.2 |
_002 | .6 | 1 |
I would like some clarification here. What do you mean by "number of cases"? What do you mean by "each class"? Can you show us the table you would like, even if the numbers are fake, and explain where the real numbers come from?
As for the overall problem that this takes too long: what are you going to do with these 1714 logistic regression results once you have them? There may be smarter ways to do this than speeding up the time it takes to run 1714 regressions.
I am trying to run a PheWAS (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666492/) in SAS; for various reasons, I can't run it in R. Therefore, running this many separate regressions is the required procedure.
Each variable has a response 1=yes and 0=no. The number of cases is the number of "yeses". I want to know the number of "yeses" broken down by each predictor. I don't need to know this, but displaying this information is the standard.
Number of cases can be computed via PROC FREQ and then added into the PROC LOGISTIC output.
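A minimal sketch of that idea, assuming value and SNP are both numeric and coded 1=yes and 0=no, and that have is already sorted by VarName (the BY-group logistic step needs that anyway). The data set names case_counts and case_wide are made up for this illustration; model_formated is the filtered logistic output from the question. One PROC FREQ pass gives the case counts for both SNP levels, so the counting step does not have to be repeated.

```sas
/* Count cases (value = 1) per VarName and SNP level in a single pass. */
/* Assumption: have is sorted by VarName; value and SNP are numeric 0/1. */
proc freq data=have noprint;
   by VarName;
   where value = 1;                          /* keep only the "yes" responses */
   tables SNP / out=case_counts(keep=VarName SNP count);
run;

/* Collapse to one row per VarName with one count column per SNP level. */
data case_wide;
   set case_counts;
   by VarName;
   retain cases_SNP_yes cases_SNP_no;
   if first.VarName then do;
      cases_SNP_yes = 0;
      cases_SNP_no  = 0;
   end;
   if SNP = 1 then cases_SNP_yes = count;
   else if SNP = 0 then cases_SNP_no = count;
   if last.VarName then output;
   keep VarName cases_SNP_yes cases_SNP_no;
run;

/* Attach the counts to the logistic results, keeping every model row. */
data logistic_with_counts;
   merge model_formated(in=in_model) case_wide;
   by VarName;
   if in_model;
run;
```

A VarName with no cases at all will not appear in case_wide, so its two count columns come out missing rather than 0 after the merge.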
With >1700 variables, the logistic regressions will take a while, and I am not aware of a method to speed this up, as you are already using the fastest method I know of.
That is exactly what I was looking for. Thank you.