Creating a reference group and using it in a logistic regression model

ak2011 · Posted 08-07-2020 01:39 PM

Hello,

I would appreciate it if someone could suggest a better approach to the problem below.

My main aims are:

a. Count the number of cases (ca case) and controls (pop cont) not exposed to any of the 4 agents (a1, a2, a3 and a4) in the dataset agents_expt (Table 1) below. Exposed is coded 1 and unexposed is coded 0.

Results: 3 ca cases and 7 pop cont (Table 2). See Step 1 of the SAS code.

b. Define the subjects in Table 2 (i.e., those unexposed to any of the agents) as a reference group (refgroup) for the purpose of comparison. See Step 2 of the SAS code.

c. Find the estimates (odds ratios) for refgroup and the other variables, including income. See Step 3 of the SAS code.

My SAS code seems too long for these aims, so a shorter way to achieve them would be welcome.

*Please note: for the purpose of this test, let us ignore the quasi-separation warning (the test data are too sparse). If the code works for this test data, I am sure it will work for the original dataset too.

My dataset, code and log are found below; results are attached.

Thanks in advance for your expertise.

ak.


/* Logistic test ref group test*/
 data agents_expt;
input id$ a1 a2 a3 a4  lung$ 14-21 income 23-29;
datalines;
os1  1 0 0 1 ca case  45424
os2  1 1 0 0 ca case  52877
os3  0 0 0 0 pop cont 25600 
os4  1 0 0 1 pop cont 14888
os5  0 0 0 0 ca case  41036
os6  0 0 0 0 ca case  20365
os7  1 0 1 1 pop cont 16988
os8  0 0 0 0 ca case  100962
os9 1 0 1 0  pop cont 11230
os10 0 0 1 0 ca case  35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont  24897
os14 0 0 0 0 pop cont  18966
os15 1 0 0 1 ca case  20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont  24897
os18 0 0 0 0 pop cont  17999
os19 0 0 0 0 pop cont  22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case  30580
;
run;
proc print data=agents_expt;
  Title 'Table 1: Exposure of ids to 4 agents';

/*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
  tables lung;
  title 'Table 2:Subjects unexposed to any of the 4 agents';
run;

/*Step 2:Using subjects unexposed to any of agents as a ref. group*/

proc sql;
  create table t as
  select
    id, a1, a2, a3, a4, lung, income,
    sum(a1,a2,a3,a4)=0 as refgroup
  from agents_expt
  ;
quit;

proc print data=t;
  title 'Table 3: original variables and ref group';
run;

proc freq data=t;
  tables lung*refgroup;
  title 'Table 4: freq of ca case and pop cont for ref group';
run;

/*Step 3: Finding odds ratio estimates for variables including ref.group*/

/* LOGISTIC REG. TEST*/
data logtest;
  set t; /*P stands for Pooled*/
  if lung in ('ca case','pop cont');
run;

proc logistic data=logtest;
  /*class cla_scat (param=ref ref ='0');*/
  /*model lung(event='Ca case') = cla_expf age cigcsi;*/
  model lung(event='ca case') = a1 a2 a3 a4 refgroup income;
  Title 'Table 5: Estimates for variables including ref. group';
run;

OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73
74 /* Logistic test ref group test*/
75 data agents_expt;
76 input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
77 datalines;

NOTE: The data set WORK.AGENTS_EXPT has 22 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

100 ;
101 run;
102 proc print data=agents_expt;
103 Title 'Table 1: Exposure of ids to 4 agents';
104
105 /*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/

NOTE: There were 22 observations read from the data set WORK.AGENTS_EXPT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.27 seconds
      cpu time 0.27 seconds

106 proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
107 tables lung;
108 title 'Table 2:Subjects unexposed to any of the 4 agents';
109 run;

NOTE: There were 10 observations read from the data set WORK.AGENTS_EXPT.
      WHERE SUM(a1, a2, a3, a4)=0;
NOTE: PROCEDURE FREQ used (Total process time):
      real time 0.13 seconds
      cpu time 0.11 seconds

110
111 /*Step 2:Using subjects unexposed to any of agents as a ref. group*/
112
113 proc sql;
114 create table t as
115 select
116 id, a1, a2, a3,a4,lung, income,
117 sum(a1,a2,a3,a4)=0 as refgroup
118 from agents_expt
119 ;

NOTE: Table WORK.T created, with 22 rows and 8 columns.

120 quit;

NOTE: PROCEDURE SQL used (Total process time):
      real time 0.01 seconds
      cpu time 0.02 seconds

121
122 proc print data=t;
123 title 'Table 3: original variables and ref group';
124 run;

NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE PRINT used (Total process time):
      real time 0.22 seconds
      cpu time 0.23 seconds

125
126 proc freq data=t;
127 tables lung* refgroup;
128 title 'Table 4: freq of ca case and pop cont for ref group';
129 run;

NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE FREQ used (Total process time):
      real time 0.15 seconds
      cpu time 0.14 seconds

130
131 /*Step 3: Finding odds ratio estimates for variables including ref.group*/
132
133 /* LOGISTIC REG. TEST*/
134 data logtest; set t;/*P stands for Pooled*/
135 if lung in ('ca case','pop cont');
136 run;

NOTE: There were 22 observations read from the data set WORK.T.
NOTE: The data set WORK.LOGTEST has 22 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time 0.01 seconds
      cpu time 0.01 seconds

137
138 proc logistic data=logtest;
139 /*class cla_scat (param=ref ref ='0');*/
140 /*model lung(event='Ca case') = cla_expf age cigcsi;*/
141 model lung(event='ca case') =a1 a2 a3 a4 refgroup income;
142 Title 'Table 5: Estimates for variables including ref. group';
143 run;

NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time 0.44 seconds
      cpu time 0.41 seconds

144
145
146 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
158

SASJedi · Posted 10-10-2020 02:10 PM

Moved to the Statistical Procedures community, where it will likely get seen by folks with more expertise in this area.

SteveDenham · Posted 10-13-2020 09:32 AM

It runs without errors (but with the quasi-separation warning) on my machine. Since you are defining the reference group as sum(a1,a2,a3,a4)=0, you may wish to try one of the following:

1. Remove a1 through a4 as predictors. This would give the OR for the levels of refgroup.

2. Use a BY refgroup statement and remove refgroup as a predictor. This would give the ORs within each level of refgroup separately.

I don't think you are going to be able to get accurate estimates if both refgroup and a1 through a4 are included, primarily because of multicollinearity: refgroup is determined entirely by a1 through a4.
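
For illustration, the two options might be sketched as below. This is a minimal sketch and not tested output: it assumes the agents_expt dataset from the original post is in WORK, and it builds refgroup in a single DATA step (my own shortcut, replacing the PROC SQL step and the separate logtest copy). PROC FREQ options such as NOROW, NOCOL and NOPERCENT are only there to trim the table.

```sas
/* Sketch only: assumes WORK.AGENTS_EXPT from the original post exists. */

/* Build refgroup in one DATA step: 1 = unexposed to all four agents. */
data logtest;
  set agents_expt;
  refgroup = (sum(a1, a2, a3, a4) = 0);
run;

/* Case/control counts by refgroup (covers the Table 2 and Table 4 counts). */
proc freq data=logtest;
  tables lung*refgroup / norow nocol nopercent;
run;

/* Option 1: drop a1-a4 and keep refgroup as the exposure term. */
proc logistic data=logtest;
  model lung(event='ca case') = refgroup income;
run;

/* Option 2: drop refgroup and fit within each level of refgroup. */
/* BY-group processing needs the data sorted by refgroup first.   */
proc sort data=logtest;
  by refgroup;
run;

proc logistic data=logtest;
  by refgroup;
  model lung(event='ca case') = a1 a2 a3 a4 income;
run;
```

With refgroup entered as a 0/1 numeric variable, the odds ratio in Option 1 compares unexposed subjects (refgroup=1) with everyone else. In Option 2, a1 through a4 are all zero in the refgroup=1 BY group, so only income can be estimated there.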

SteveDenham

ak2011 · Posted 10-24-2020 11:35 PM

Thank you Steve.
ak.
