Creating a reference group from unexposed subjects and finding its odds ratio in logistic regression

ak2011 · Posted 08-13-2020 02:49 AM

Hello,

I would appreciate it if someone could suggest a better approach to this problem:

Creating a reference group (refgroup) from unexposed subjects (ids) and finding the odds ratio for the refgroup using a logistic model.

My main aims are to:

a. Count the number of cases (ca case) and controls (pop cont) not exposed to any of the 4 agents (a1, a2, a3 and a4) in the dataset agents_expt (Table 1) below. Exposed is coded 1 and unexposed is coded 0.

Results: 3 ca cases and 7 pop cont obtained (Table 2). - Step 1 of SAS code.

b. Label the subjects in Table 2 (i.e. subjects unexposed to any of the agents) as a refgroup for the purpose of comparison. - Step 2 of SAS code.

Please, I would need help here: SAS created 2 refgroups (0, 1), which I think is incorrect. The refgroup should contain only subjects unexposed to any agent; i.e. only 1 refgroup should be created.

c. Find the estimate (odds ratio) for the refgroup. - Step 3 of SAS code.

Specifically, I would like SAS to find the odds ratio for the refgroup only (i.e. subjects unexposed to any of the agents, a single refgroup) using logistic regression.

Please, I would need the correct SAS code to solve the problem.

My dataset, code and log are found below; results are attached.

Thanks in advance for your expertise.

ak.



/* Logistic test ref group test*/
 data agents_expt;
input id$ a1 a2 a3 a4  lung$ 14-21 income 23-29;
datalines;
os1  1 0 0 1 ca case  45424
os2  1 1 0 0 ca case  52877
os3  0 0 0 0 pop cont 25600 
os4  1 0 0 1 pop cont 14888
os5  0 0 0 0 ca case  41036
os6  0 0 0 0 ca case  20365
os7  1 0 1 1 pop cont 16988
os8  0 0 0 0 ca case  100962
os9 1 0 1 0  pop cont 11230
os10 0 0 1 0 ca case  35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont  24897
os14 0 0 0 0 pop cont  18966
os15 1 0 0 1 ca case  20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont  24897
os18 0 0 0 0 pop cont  17999
os19 0 0 0 0 pop cont  22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case  30580
;
run;


proc print data=agents_expt;
  title 'Table 1: Exposure of ids to 4 agents';
run;

/* Step 1: Finding number of cases and controls unexposed to agents (a1, a2, a3 and a4) */
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
  tables lung;
  title 'Table 2:Subjects unexposed to any of the 4 agents';
run;

/* Step 2: Using subjects unexposed to any of agents as a ref. group */
proc sql;
  create table t as
  select
    id, a1, a2, a3, a4, lung, income,
    /* the comparison returns 1 when all four agents are 0, otherwise 0 */
    sum(a1,a2,a3,a4)=0 as refgroup
  from agents_expt;
quit;

proc print data=t;
  title 'Table 3: original variables and ref group';
run;

proc freq data=t;
  tables lung*refgroup;
  title 'Table 4: freq of ca case and pop cont for ref group';
run;

/* Step 3: Finding odds ratio estimates for variables including ref. group */

/* LOGISTIC REG. TEST */
data logtest;
  set t;
  if lung in ('ca case','pop cont');
run;

proc logistic data=logtest;
  model lung(event='ca case') = refgroup;
  title 'Table 5: Estimates for ref. group';
run;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 /* Logistic test ref group test*/

74 data agents_expt;

75 input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;

76 datalines;

NOTE: The data set WORK.AGENTS_EXPT has 22 observations and 7 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

99 ;

100 run;

101

102

103 proc print data=agents_expt;

104 Title 'Table 1: Exposure of ids to 4 agents';

105

106 /*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/

NOTE: There were 22 observations read from the data set WORK.AGENTS_EXPT.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.48 seconds

cpu time 0.47 seconds

107 proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));

108 tables lung;

109 title 'Table 2:Subjects unexposed to any of the 4 agents';

110 run;

NOTE: There were 10 observations read from the data set WORK.AGENTS_EXPT.

WHERE SUM(a1, a2, a3, a4)=0;

NOTE: PROCEDURE FREQ used (Total process time):

real time 0.18 seconds

cpu time 0.17 seconds

111

112 /*Step 2:Using subjects unexposed to any of agents as a ref. group*/

113

114 proc sql;

115 create table t as

116 select

117 id, a1, a2, a3,a4,lung, income,

118 sum(a1,a2,a3,a4)=0 as refgroup

119 from agents_expt

120 ;

NOTE: Table WORK.T created, with 22 rows and 8 columns.

121 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

122

123 proc print data=t;

124 title 'Table 3: original variables and ref group';

125 run;

NOTE: There were 22 observations read from the data set WORK.T.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.26 seconds

cpu time 0.24 seconds

126

127 proc freq data=t;

128 tables lung* refgroup;

129 title 'Table 4: freq of ca case and pop cont for ref group';

130 run;

NOTE: There were 22 observations read from the data set WORK.T.

NOTE: PROCEDURE FREQ used (Total process time):

real time 0.25 seconds

cpu time 0.23 seconds

131

132 /*Step 3: Finding odds ratio estimates for variables including ref.group*/

133

134 /* LOGISTIC REG. TEST*/

135 data logtest; set t;

136 if lung in ('ca case','pop cont');

137 run;

NOTE: There were 22 observations read from the data set WORK.T.

NOTE: The data set WORK.LOGTEST has 22 observations and 8 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

138

139 proc logistic data=logtest;

140 model lung(event='ca case') =refgroup;

141 Title 'Table 5: Estimates for ref. group';

142 run;

NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.

NOTE: Convergence criterion (GCONV=1E-8) satisfied.

NOTE: There were 22 observations read from the data set WORK.LOGTEST.

NOTE: PROCEDURE LOGISTIC used (Total process time):

real time 0.56 seconds

cpu time 0.51 seconds

143

144 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

156

SteveDenham · Posted 08-13-2020 07:29 AM

When you say "the odds ratio for the refgroup", I don't understand. That is what PROC LOGISTIC is already giving (or at least the log odds ratio, as the parameter estimate). Because refgroup enters the model as a numeric 0/1 variable, it is the ratio of the odds of being a 'ca case' given the observation is in refgroup=1 to the odds of being a 'ca case' given the observation is in refgroup=0. If you restrict the analysis to only the unexposed ids, you can calculate the odds of the response, but there is no other classification to use to calculate an odds ratio. Maybe I am just dense this morning and not catching on to what you want to do.
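
To make the direction of the comparison explicit, one option is to declare refgroup as a CLASS variable and name the unexposed level as the reference. The following is a minimal sketch, assuming the dataset logtest and the 0/1 coding of refgroup from the original post (1 = unexposed to all four agents). The ref= and param= settings are illustrative choices, not something the original code used.

```sas
/* Sketch: treat refgroup as categorical, with the unexposed subjects
   (refgroup = 1) as the reference level. */
proc logistic data=logtest;
  class refgroup (ref='1') / param=ref;
  model lung(event='ca case') = refgroup / clodds=wald;
run;
```

With this coding the reported odds ratio compares refgroup 0 (exposed to at least one agent) with refgroup 1 (unexposed). From the counts in the posted data (5 cases and 7 controls exposed, 3 cases and 7 controls unexposed), that ratio works out to (5/7)/(3/7) = 5/3, or about 1.67.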

SteveDenham

ak2011 · Posted 08-13-2020 02:35 PM

Thank you, Steve, and sorry for the confusion. In fact, I would like to restrict the analysis to only the unexposed ids. Is there a way to do that?
Thank you.
ak.

SteveDenham · Posted 08-13-2020 03:41 PM

Well, then perhaps running the analysis by refgroup can yield something. I have to remove refgroup from the model statement under this scenario. The log then looks like this:

79   proc logistic data=logtest;
80    by refgroup;
81   model lung(event='ca case') = /clodds=both;
82   Title 'Table 5: Estimates for ref. group';
83   run;

NOTE: No explanatory variables have been specified.
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=0
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=1
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
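
One caveat about the BY statement: PROC LOGISTIC expects the input to be sorted by the BY variable, and the steps in the original post do not sort logtest. The sketch below assumes a PROC SORT step is run first (the output name logtest_sorted is made up for illustration). The last step is an alternative that restricts the analysis to the unexposed subjects alone with a WHERE statement.

```sas
/* Assumption: logtest is not yet ordered by refgroup, so sort it first. */
proc sort data=logtest out=logtest_sorted;
  by refgroup;
run;

/* Intercept-only model fitted separately within each refgroup. */
proc logistic data=logtest_sorted;
  by refgroup;
  model lung(event='ca case') = ;
run;

/* Alternative: fit the intercept-only model to the unexposed subjects only
   (refgroup = 1); a WHERE subset does not require sorting. */
proc logistic data=logtest;
  where refgroup = 1;
  model lung(event='ca case') = ;
run;
```

Either way, the only parameter is an intercept, so the output describes the odds of being a case within a group rather than an odds ratio.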

I get an intercept for each of the models. For refgroup 0, it is -0.3365; for refgroup 1, it is -0.8473. Exponentiating these gives 0.71428... for refgroup 0 and 0.42857... for refgroup 1. These are just the ratios of cases to controls within each refgroup (5/7 and 3/7). No need for PROC LOGISTIC in this case and, perhaps just as important, there are no odds ratios within each refgroup. There are odds, not odds ratios. The intercept is on the log odds scale.
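
As a rough illustration of getting the same odds without PROC LOGISTIC, the counts can be tabulated directly. This is only a sketch, assuming the table t and the variables lung and refgroup from the original post; the column names n_cases, n_controls, odds and log_odds are invented here.

```sas
/* Sketch: cases, controls, odds and log odds within each refgroup. */
proc sql;
  select refgroup,
         sum(lung = 'ca case')  as n_cases,     /* comparison is 1 when true, 0 when false */
         sum(lung = 'pop cont') as n_controls,
         calculated n_cases / calculated n_controls      as odds,
         log(calculated n_cases / calculated n_controls) as log_odds
  from t
  group by refgroup;
quit;
```

For the posted data this should show 5 cases and 7 controls for refgroup 0 and 3 cases and 7 controls for refgroup 1, and the log_odds column should agree with the intercepts from the BY-group models.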

However, you have another variable in the dataset - income. You could calculate the odds ratio for some change in income. For this exercise, let's set that change at 1000. Then your code would look something like:

proc logistic data=logtest;
by refgroup;
model lung(event='ca case') = income /clodds=both; 
units income=1000;
run;

If you run this, you will find the OR in both refgroups is really close to 1, meaning that an increase or decrease in income of $1000 has essentially no effect on the odds of being a case within a refgroup.

One last try. Now let's put both refgroup and income, as well as their interaction, in the model. Since one is continuous and the other categorical, there are some changes:

proc logistic data=logtest;
class refgroup;
model lung(event='ca case') =refgroup income refgroup*income ;  
oddsratio refgroup/at (income = 20000 to 200000 by 20000);
run;

And now you see an increase in the OR as income increases. However, the confidence bounds seem to grow even faster, thus reflecting the maximum likelihood tests that found no significant factors in the model.

SteveDenham

ak2011 · Posted 08-14-2020 03:32 PM

Thanks very much Steve for your in-depth explanation.
I have posted another question which looks similar to what you answered.
ak.
