Denali
Quartz | Level 8

Hi,

 

I have a matched case-control study with 1 case (lung cancer patient) matched with 2 controls on age, gender and race. I would like to conduct a conditional logistic regression to examine the association between smoking, family history of cancer, COPD and risk of lung cancer, controlling for the matching variables.

I would like to run a univariate conditional logistic regression model for each of the above variables with the outcome of lung cancer (yes/no).

Further, the multivariable conditional logistic regression model will be:  lung cancer (yes/no) = smoking + family history of cancer + COPD + age + gender + race

 

Do I need to create dummy variables for categorical variables such as lung cancer (y/n) or smoking (y/n) in order to conduct a logistic regression? Can anyone please provide an example of SAS code?

 

7 REPLIES 7
Reeza
Super User

You do not need to create dummy variables. List the categorical predictors in the CLASS statement of either PROC LOGISTIC or PROC PHREG, and add the PARAM=REF option if you want reference (dummy) coding.

 

class <list of categorical variables> / param=REF;

Example 76.2 Logistic Modeling with Categorical Predictors

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm

Example 89.5 Conditional Logistic Regression for m:n Matching

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_phreg_examples05.htm
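As a rough sketch of how the CLASS statement fits into a conditional model: PROC LOGISTIC fits a conditional logistic regression when a STRATA statement names the matched-set identifier. Everything below is hypothetical naming, not taken from your data: the data set test, the 0/1 variables lung_cancer, smoking, family_history and copd, and the matched-set identifier set_id. The matching variables (age, gender, race) are left out of the MODEL statement on the assumption that they are constant within each matched set, in which case their effects cannot be estimated by a conditional model.

```sas
/* Sketch only: test, lung_cancer, smoking, family_history, copd and  */
/* set_id are assumed names; set_id must identify each 1:2 matched set */

/* Univariate conditional model for one exposure */
proc logistic data=test;
   class smoking (ref='0') / param=ref;
   strata set_id;                       /* one stratum per matched set */
   model lung_cancer(event='1') = smoking;
run;

/* Multivariable conditional model */
proc logistic data=test;
   class smoking (ref='0') family_history (ref='0') copd (ref='0') / param=ref;
   strata set_id;
   model lung_cancer(event='1') = smoking family_history copd;
run;
```

With this route no dummy time variable is needed; the PHREG example linked above is the alternative formulation of the same conditional likelihood.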

 



 

Denali
Quartz | Level 8

@Reeza I was reviewing this article (https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_phreg_examples05.htm) as well, but I don't quite understand why there is a time variable and the author included  a line "time = 2-low".

 

data LBW;
   input id Age Low LWT Smoke HT UI @@;
   Time=2-Low;
   datalines;

 

proc phreg data=LBW;
   model Time*Low(0)= LWT Smoke HT UI / ties=discrete;
   strata Age;
run;

 

I do not have a variable that indicates which 2 controls are matched with the 1 case. Would it be a problem?

 

In my case, should I use the below code? What would be my "time" variable?

 

proc phreg data=test;
   model Time*lung_cancer(0) = smoking family_history copd / ties=discrete;
   strata age gender race;
run;

 

Thank you!

Reeza
Super User

@Denali wrote:

@Reeza I was reviewing this article (https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_phreg_examples05.htm) as well, but I don't quite understand why there is a time variable and the author included  a line "time = 2-low".

 

In my case, should I use the below code? What would be my "time" variable?

 


You create the time variable yourself. It's not really a measure of time, just a dummy value that lets the stratified discrete model in PHREG reproduce the conditional logistic likelihood.

 

The following is also in that example:

 

 The m:n matching refers to the situation in which there is a varying number of cases and controls in the matched sets. You can perform conditional logistic regression with the PHREG procedure by using the discrete logistic model and forming a stratum for each matched set. In addition, you need to create dummy survival times so that all the cases in a matched set have the same event time value, and the corresponding controls are censored at later times.

 

The variable Time is the response, and Low is the censoring variable. Note that the data set is created so that all the cases have the same event time and the controls have later censored times. 
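To see why the dummy times work, here is the likelihood written out for a single 1:2 matched set. The notation is mine, not from the SAS example: subject 1 is the case, subjects 2 and 3 are the controls, and x_i is the covariate vector of subject i.

```latex
% Conditional logistic likelihood contribution of one 1:2 matched set s
% (subject 1 = case, subjects 2 and 3 = controls)
L_s(\beta) =
  \frac{\exp(x_1^\top \beta)}
       {\exp(x_1^\top \beta) + \exp(x_2^\top \beta) + \exp(x_3^\top \beta)}
```

This is also the partial likelihood contribution of a stratum in which subject 1 has an event at time 1 while subjects 2 and 3 are still at risk and are censored at time 2. Any dummy times work as long as every control is censored later than the case in its set.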

 


@Denali wrote:

 

I do not have a variable that indicates which 2 controls are matched with the 1 case. Would it be a problem?

 


Yes, that is a problem - you'll need to add/create that variable to identify the case/control matches. If you used PSMATCH it typically includes that variable. 
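Before rebuilding the identifier, it may help to check how many cases and controls share each combination of the matching variables. The sketch below assumes a data set named test and a 0/1 outcome named lung_cancer; both names are hypothetical.

```sas
/* Sketch only: test and lung_cancer are assumed names.               */
/* Counts cases (1) and controls (0) within every age*gender*race     */
/* combination so you can see whether each combination holds exactly  */
/* one matched set (1 case, 2 controls) or several pooled together.   */
proc freq data=test;
   tables age*gender*race*lung_cancer / list missing;
run;
```

If a combination contains more than one case, `strata age gender race;` would pool those matched sets into a single m:n stratum, which is the situation the linked PHREG example is written for.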

Denali
Quartz | Level 8

@Reeza It seems like that might not be the right code for my project because I have a 1:2 matched design: each case always has exactly 2 matched controls.

 

Please correct me if I am wrong, but I think the other article that you provided was just for the regular (unconditional) logistic regression, not for the matched design - conditional logistic regression.

 

I am not sure which code I can use.

Reeza
Super User
You indicated you wanted to do both univariate and multivariable analyses, which is partly why I included the first link. That example is ordinary (unconditional) logistic regression, so it does not account for the matching. I also know that many users run an analysis that ignores the matching to see how the two models differ, but maybe that's just something we did internally (cancer clinical trials research).

Otherwise the PROC PHREG example should be all you need.
Denali
Quartz | Level 8

@Reeza I see. Thanks for sending me the first link. I just realized that Mantel-Haenszel test (categorical variable) and paired t-test (continuous variable) would be the univariate analyses with consideration of the matching design.  

 

If PHREG is the procedure that I need, what would the time variable be in my case? I am sorry, I did read through the article (and of course the paragraphs you copied and pasted), but I still don't know how to create the dummy time variable for my data.

Reeza
Super User
You manually create a time variable such that the following is true:
cases in a matched set have the same event time value, and the corresponding controls are censored at later times.
So you can set each control to 2 and each case to 1.

Again from the example: The variable Low is used to determine whether the subject is a case (Low=1, low-birth-weight baby) or a control (Low=0, normal-weight baby). The dummy time variable Time takes the value 1 for cases and 2 for controls, which is exactly what Time = 2 - Low computes.
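Applied to your data, a minimal sketch could look like the following. It assumes hypothetical names throughout: a data set test, a numeric outcome lung_cancer coded 1 for cases and 0 for controls, numeric 0/1 predictors smoking, family_history and copd, and a matched-set identifier set_id that you would still need to create.

```sas
/* Sketch only: all data set and variable names are assumed.          */
data test2;
   set test;
   Time = 2 - lung_cancer;   /* cases get Time=1, controls get Time=2 */
run;

proc phreg data=test2;
   /* lung_cancer(0) marks controls as censored observations;         */
   /* RISKLIMITS requests confidence limits for the hazard ratios,    */
   /* which here are read as conditional odds ratios                  */
   model Time*lung_cancer(0) = smoking family_history copd
         / ties=discrete risklimits;
   strata set_id;            /* one stratum per 1:2 matched set       */
run;
```

For the univariate models, keep the same DATA step and STRATA statement and list a single predictor on the right-hand side of the MODEL statement.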
