ak2011
Fluorite | Level 6

Hello,

I would appreciate it if someone could suggest a better approach to this problem:

Creating a reference group (refgroup) from unexposed subjects (ids) and finding the odds ratio for the refgroup using a logistic model

 

My main aims are to:

  1. Count the number of cases (ca case) and controls (pop cont) not exposed to any of the 4 agents (a1, a2, a3 and a4) in the dataset agents_expt (Table 1) below. Exposed is 1 and unexposed is 0.

Results: 3 ca cases and 7 pop cont obtained (Table 2). - Step 1 of SAS code

 

  2. Label the subjects counted in Table 2 (i.e. subjects unexposed to any of the agents) as a refgroup for the purpose of comparison. - Step 2 of SAS code

 

This is where I need help: SAS created 2 refgroups (0 and 1), which I think is incorrect. The refgroup should contain only subjects unexposed to any agent, i.e. only 1 refgroup should be created.

 

  3. Find the estimate (odds ratio) for the refgroup. - Step 3 of SAS code

Specifically, I would like SAS to find the odds ratio for that single refgroup (i.e. subjects unexposed to any of the agents) using logistic regression.

 

I would be grateful for the correct SAS code to solve the problem.

 

My dataset, code and log are below; results are attached.

 

Thanks in advance for your expertise.

ak.

 



/* Logistic test ref group test*/
data agents_expt;
input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
datalines;
os1 1 0 0 1 ca case 45424
os2 1 1 0 0 ca case 52877
os3 0 0 0 0 pop cont 25600
os4 1 0 0 1 pop cont 14888
os5 0 0 0 0 ca case 41036
os6 0 0 0 0 ca case 20365
os7 1 0 1 1 pop cont 16988
os8 0 0 0 0 ca case 100962
os9 1 0 1 0 pop cont 11230
os10 0 0 1 0 ca case 35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont 24897
os14 0 0 0 0 pop cont 18966
os15 1 0 0 1 ca case 20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont 24897
os18 0 0 0 0 pop cont 17999
os19 0 0 0 0 pop cont 22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case 30580
;
run;


proc print data=agents_expt;
Title 'Table 1: Exposure of ids to 4 agents';

/*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
tables lung;
title 'Table 2:Subjects unexposed to any of the 4 agents';
run;

/*Step 2:Using subjects unexposed to any of agents as a ref. group*/

proc sql;
create table t as
select
id, a1, a2, a3,a4,lung, income,
sum(a1,a2,a3,a4)=0 as refgroup
from agents_expt
;
quit;

proc print data=t;
title 'Table 3: original variables and ref group';
run;

proc freq data=t;
tables lung* refgroup;
title 'Table 4: freq of ca case and pop cont for ref group';
run;

/*Step 3: Finding odds ratio estimates for variables including ref.group*/

/* LOGISTIC REG. TEST*/
data logtest; set t;
if lung in ('ca case','pop cont');
run;

proc logistic data=logtest;
model lung(event='ca case') =refgroup;
Title 'Table 5: Estimates for ref. group';
run;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 /* Logistic test ref group test*/
74 data agents_expt;
75 input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
76 datalines;
 
NOTE: The data set WORK.AGENTS_EXPT has 22 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
99 ;
100 run;
101
102
103 proc print data=agents_expt;
104 Title 'Table 1: Exposure of ids to 4 agents';
105
106 /*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
 
NOTE: There were 22 observations read from the data set WORK.AGENTS_EXPT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.48 seconds
cpu time 0.47 seconds
 
 
107 proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
108 tables lung;
109 title 'Table 2:Subjects unexposed to any of the 4 agents';
110 run;
 
NOTE: There were 10 observations read from the data set WORK.AGENTS_EXPT.
WHERE SUM(a1, a2, a3, a4)=0;
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.18 seconds
cpu time 0.17 seconds
 
 
111
112 /*Step 2:Using subjects unexposed to any of agents as a ref. group*/
113
114 proc sql;
115 create table t as
116 select
117 id, a1, a2, a3,a4,lung, income,
118 sum(a1,a2,a3,a4)=0 as refgroup
119 from agents_expt
120 ;
NOTE: Table WORK.T created, with 22 rows and 8 columns.
 
121 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
 
122
123 proc print data=t;
124 title 'Table 3: original variables and ref group';
125 run;
 
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.26 seconds
cpu time 0.24 seconds
 
 
126
127 proc freq data=t;
128 tables lung* refgroup;
129 title 'Table 4: freq of ca case and pop cont for ref group';
130 run;
 
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.25 seconds
cpu time 0.23 seconds
 
 
131
132 /*Step 3: Finding odds ratio estimates for variables including ref.group*/
133
134 /* LOGISTIC REG. TEST*/
135 data logtest; set t;
136 if lung in ('ca case','pop cont');
137 run;
 
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: The data set WORK.LOGTEST has 22 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
138
139 proc logistic data=logtest;
140 model lung(event='ca case') =refgroup;
141 Title 'Table 5: Estimates for ref. group';
142 run;
 
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.56 seconds
cpu time 0.51 seconds
 
 
143
144 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
156

 








 

SteveDenham
Jade | Level 19

When you say the odds ratio for the refgroup, I don't understand. That is what PROC LOGISTIC is giving (well, at least the log odds ratio). With refgroup entered as a numeric 0/1 variable, as in your code, it is the ratio of the odds of being a 'ca case' given the observation is in refgroup=1 to the odds of being a 'ca case' given the observation is in refgroup=0. If you restrict the analysis to only the unexposed ids, you can calculate the odds of the response, but there is no other classification to use to calculate an odds ratio. Maybe I am just dense this morning and not catching on to what you want to do.
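
The comparison between the two refgroup levels can also be checked from the 2x2 table of refgroup by lung. This is a minimal sketch, assuming the table t from Step 2 and that lung holds exactly 'ca case' and 'pop cont'; the RELRISK option of PROC FREQ prints the odds ratio for a 2x2 table:

```sas
/* Cross-tabulate exposure group by case status.                 */
/* RELRISK adds the odds ratio (with confidence limits) for a    */
/* 2x2 table.                                                    */
proc freq data=t;
  tables refgroup*lung / relrisk;
  title 'Odds ratio from the 2x2 table of refgroup by lung';
run;
```

PROC FREQ forms the ratio as the odds in the first row over the odds in the second row, so with refgroup ordered 0 then 1 it compares refgroup=0 to refgroup=1; the reciprocal gives the comparison in the other direction.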

 

SteveDenham

ak2011
Fluorite | Level 6
Thank you, Steve, and sorry for the confusion. The fact is, I would like to restrict the analysis to only the unexposed ids. Is there a way I can do that?
Thank you.
ak.
SteveDenham
Jade | Level 19

Well, then perhaps running the analysis by refgroup can yield something.  I have to remove refgroup from the model statement under this scenario. The log then looks like this:

 

79   proc logistic data=logtest;
80    by refgroup;
81   model lung(event='ca case') = /clodds=both;
82   Title 'Table 5: Estimates for ref. group';
83   run;

NOTE: No explanatory variables have been specified.
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=0
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=1
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
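
BY-group processing expects the input to be sorted by the BY variable, so a run like the one above presumably starts from a sorted copy of logtest. Here is a sketch of that step, plus a WHERE alternative that fits only the unexposed group; the dataset name logtest_sorted is made up for illustration:

```sas
/* Sort by refgroup so that BY-group processing works */
proc sort data=logtest out=logtest_sorted;
  by refgroup;
run;

/* Intercept-only model fitted separately within each refgroup */
proc logistic data=logtest_sorted;
  by refgroup;
  model lung(event='ca case') = ;
run;

/* Alternative: keep only the unexposed subjects (refgroup=1) */
proc logistic data=logtest;
  where refgroup = 1;
  model lung(event='ca case') = ;
run;
```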

I get an intercept for each of the models. For refgroup 0, it is -0.3365; for refgroup 1, it is -0.8473. Exponentiating these gives 0.7143... for refgroup 0 and 0.42857... for refgroup 1. These are just the ratios of cases to controls within each refgroup (5/7 and 3/7). No need for PROC LOGISTIC in this case, and perhaps just as important, there are no odds ratios within each refgroup. There are odds, not odds ratios. The intercept is on the log odds scale.
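
Since these are just case-to-control ratios, they can be computed directly without a model. This is a sketch, assuming the logtest dataset from Step 3 with lung coded exactly 'ca case' and 'pop cont'; the column names n_cases, n_controls, odds and log_odds are illustrative:

```sas
/* Odds of being a case within each refgroup, plus the log odds  */
/* that correspond to the intercept of an intercept-only model.  */
proc sql;
  select refgroup,
         sum(lung = 'ca case')  as n_cases,     /* comparison is 1 when true */
         sum(lung = 'pop cont') as n_controls,
         calculated n_cases / calculated n_controls as odds,
         log(calculated odds) as log_odds
  from logtest
  group by refgroup;
quit;
```

With the 22 rows posted above, this should give 5 cases to 7 controls for refgroup 0 and 3 cases to 7 controls for refgroup 1, matching the exponentiated intercepts.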

 

However, you have another variable in the dataset - income.  You could calculate the odds ratio for some change in income.  For this exercise, let's set that change at 1000.  Then your code would look something like:

 

proc logistic data=logtest;
by refgroup;
model lung(event='ca case') = income /clodds=both;
units income=1000;
run;

If you run this, you will find the OR in both refgroups is really close to 1, meaning that an increase or decrease in income of $1000 has essentially no effect on the odds of being a case within a refgroup.

 

One last try. Now let's put both refgroup and income, as well as their interaction, in the model. Since one is continuous and the other categorical, there are some changes:

 

proc logistic data=logtest;
class refgroup;
model lung(event='ca case') =refgroup income refgroup*income ;  
oddsratio refgroup/at (income = 20000 to 200000 by 20000);
run;

And now you see an increase in the OR as income increases. However, the confidence bounds seem to grow even faster, which is consistent with the maximum likelihood tests finding no significant effects in the model.

 

SteveDenham 

 

ak2011
Fluorite | Level 6
Thanks very much Steve for your in-depth explanation.
I have posted another question which looks similar to what you answered.
ak.
