lauldani
Fluorite | Level 6

I am running a logistic regression on each of 1714 variables (PheWAS). I followed this guide (https://blogs.sas.com/content/iml/2017/02/13/run-1000-regressions.html) to run the regressions with BY-group processing (the "by way").

In my final table, I would like to have the number of cases for each predictor (the predictor/exposure is a SNP (genetic variant) yes/no). In my final logistic table I have removed the reference row.  Each row is one logistic regression and unique on varname.

Table that I get

Varname   p-value   odds ratio
_001      .002      10.2
_002      .6        1

The table that I want

Varname   p-value   odds ratio   cases_SNP_yes   cases_SNP_no
_001      0.002     10.2         100             5
_002      0.6       1.0          30              30

The way I currently get the case counts is to run a PROC MEANS step on the input data set (one row per patient (obs=264,000), one column per variable, and a column that indicates exposure) and then merge the result with the logistic output by varname. I then repeat the step to get the number of cases for the other predictor level. However, this takes a long time, and I would think there is a better way to do it. I am wondering whether there is an option in PROC LOGISTIC that produces these counts.

 

Sample code is below

* code for how I get my logistic table;
* assumes have is sorted by VarName;
proc logistic data=have alpha=0.00002927;
	by VarName; * this is the "by way";
	* reference coding with SNP=0 as reference, so that exp(estimate) is the odds ratio;
	class SNP (ref=first) / param=ref;
	model value = SNP / rsq expb;
	ods output ParameterEstimates=model;
run;

data model_formated;
	set model (rename=(expest=odds_ratio));
	where variable = 'SNP'; * keep the row that contains the p-value;
run;

* number of cases (sum of value) among SNP=1, per VarName;
proc means data=have sum noprint;
	by VarName;
	where SNP=1;
	var value;
	output out=cases sum=count;
run;

data logistic_with_counts;
	merge model_formated cases(keep=VarName count);
	by VarName;
run;

PaigeMiller
Diamond | Level 26

In my final table, I would like to have the number of cases for each class (exposure is one of two drugs - studyrx is the column name). In my final logistic table, I have removed the reference row.  Each row is one logistic regression and unique on varname.

 

Varname p-value odds ratio
_001 .002 10.2
_002 .6 1

 

I ask for clarification here. What do you mean by "number of cases"? What do you mean by "each class"? Can you show us the table you would like, even if the numbers are fake, and explain where the real numbers come from?

 

As far as the overall problem that it takes too long is concerned, please tell me: what are you going to do with these 1714 logistic regression results once you have them? There may be smarter ways to do this, rather than ways to speed up the time it takes to do 1714 regressions.

--
Paige Miller
lauldani
Fluorite | Level 6

I am trying to run a PheWAS (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666492/) in SAS; for various reasons, I can't run it in R. Running many separate regressions is therefore the procedure.

 

Each variable has a response of 1=yes or 0=no. The number of cases is the number of "yeses". I want to know the number of "yeses" broken down by each level of the predictor. I don't need this for the analysis itself, but displaying it is standard practice.

PaigeMiller
Diamond | Level 26

Number of cases can be computed via PROC FREQ and then added into the PROC LOGISTIC output.
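
A minimal sketch of the PROC FREQ route, which gets both case counts in a single pass over the data instead of one PROC MEANS step per SNP level. It assumes that `have` is sorted by `VarName`, that `SNP` and `value` are numeric 0/1 columns, and that `model_formated` is the filtered PROC LOGISTIC output from the question. The names `case_counts`, `cases_wide`, `cases_SNP_yes`, and `cases_SNP_no` are made up for illustration.

```sas
/* Count cases (value = 1) at each SNP level within each VarName */
proc freq data=have noprint;
    by VarName;
    where value = 1;
    tables SNP / out=case_counts(drop=percent);
run;

/* Collapse to one row per VarName with one column per SNP level */
data cases_wide;
    do until (last.VarName);
        set case_counts;
        by VarName;
        if SNP = 1 then cases_SNP_yes = count;
        else if SNP = 0 then cases_SNP_no = count;
    end;
    /* An SNP level with no cases has no row in case_counts, so fill in 0 */
    cases_SNP_yes = coalesce(cases_SNP_yes, 0);
    cases_SNP_no  = coalesce(cases_SNP_no, 0);
    keep VarName cases_SNP_yes cases_SNP_no;
run;

/* Attach the counts to the regression results */
data logistic_with_counts;
    merge model_formated cases_wide;
    by VarName;
run;
```

A `VarName` with no cases at all will not appear in `cases_wide`, so its count columns come out missing after the merge.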

 

With >1700 variables, the logistic regressions should take a while, and I am not aware of a method to speed this up, as you are using the fastest method I know of.

--
Paige Miller
StatDave
SAS Super FREQ
You should always specify the EVENT= response option to be sure you are modeling the probability of the event level and not the nonevent level. For example: model value(event="Yes") = ... . The number of cases (events) and nonevents is in the Response Profile table that is automatically displayed. You can save it by also saving the ResponseProfile table in your ODS OUTPUT statement.
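
A sketch of what this could look like with the code from the question, assuming `value` is coded 1 = yes (hence `event='1'` rather than `"Yes"`). The data set name `resp_profile` is made up for illustration.

```sas
proc logistic data=have;
    by VarName;
    class SNP;
    model value(event='1') = SNP;      /* '1' assumes value is coded 1 = yes */
    ods output ParameterEstimates=model
               ResponseProfile=resp_profile;
run;

/* Inspect the saved table before relying on its column names */
proc contents data=resp_profile;
run;

proc print data=resp_profile(obs=4);
run;
```

As far as I understand it, the Response Profile reports the total frequency of each response level within each BY group, so it gives the number of events and nonevents per regression rather than a breakdown by SNP level.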
lauldani
Fluorite | Level 6

That is exactly what I was looking for. Thank you.

Reeza
Super User
Consider adding the SIMPLE option to your PROC LOGISTIC statement and then capturing its output in an ODS OUTPUT statement as well.
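
A hedged sketch of that suggestion. The ODS table name `ClassFreq` is an assumption on my part and should be confirmed in the ODS TRACE output in the log; `class_freq` is a made-up data set name, and `event='1'` assumes `value` is coded 1 = yes.

```sas
ods trace on;                          /* log the names of the ODS tables produced */

proc logistic data=have simple;
    by VarName;
    class SNP;
    model value(event='1') = SNP;
    ods output ParameterEstimates=model
               ClassFreq=class_freq;   /* table name assumed; check the trace log */
run;

ods trace off;
```

With 1714 BY groups the trace output will be long, so it is probably worth running this on a small subset of `VarName` values first to find the right table name.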
