Hello,
I would appreciate it if someone could suggest a better approach to this problem:
Creating a reference group (refgroup) from unexposed subjects (ids) and finding the odds ratio for the refgroup using a logistic model.
My main aims are to:
a. Find the number of cases and controls unexposed to any of the agents (a1, a2, a3 and a4). Results: 3 ca cases and 7 pop cont obtained (Table 2). -Step 1 of SAS code
b. Create/name the results obtained in Table 2 (i.e. subjects unexposed to any of the agents) as a refgroup for the purpose of comparison. -Step 2 of SAS code
This is where I need help: SAS created 2 refgroups (0, 1), which I think is incorrect. The refgroup should contain only the subjects unexposed to any agent; i.e. only 1 refgroup should be created.
Specifically, I would like SAS to find the odds ratio for that single refgroup (i.e. subjects unexposed to any of the agents) using logistic regression.
Please, I need the correct SAS code to solve the problem.
My dataset, code and log are found below; results are attached.
Thanks in advance for your expertise.
ak.
/* Logistic test ref group test*/
data agents_expt;
input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
datalines;
os1  1 0 0 1 ca case  45424
os2  1 1 0 0 ca case  52877
os3  0 0 0 0 pop cont 25600
os4  1 0 0 1 pop cont 14888
os5  0 0 0 0 ca case  41036
os6  0 0 0 0 ca case  20365
os7  1 0 1 1 pop cont 16988
os8  0 0 0 0 ca case  100962
os9  1 0 1 0 pop cont 11230
os10 0 0 1 0 ca case  35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont 24897
os14 0 0 0 0 pop cont 18966
os15 1 0 0 1 ca case  20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont 24897
os18 0 0 0 0 pop cont 17999
os19 0 0 0 0 pop cont 22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case  30580
;
run;
proc print data=agents_expt;
Title 'Table 1: Exposure of ids to 4 agents';
run;
/*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
tables lung;
title 'Table 2:Subjects unexposed to any of the 4 agents';
run;
/*Step 2:Using subjects unexposed to any of agents as a ref. group*/
proc sql;
create table t as
select
id, a1, a2, a3,a4,lung, income,
sum(a1,a2,a3,a4)=0 as refgroup
from agents_expt
;
quit;
proc print data=t;
title 'Table 3: original variables and ref group';
run;
proc freq data=t;
tables lung* refgroup;
title 'Table 4: freq of ca case and pop cont for ref group';
run;
/*Step 3: Finding odds ratio estimates for variables including ref.group*/
/* LOGISTIC REG. TEST*/
data logtest; set t;
if lung in ('ca case','pop cont');
run;
proc logistic data=logtest;
model lung(event='ca case') =refgroup;
Title 'Table 5: Estimates for ref. group';
run;
When you say the odds ratio for the refgroup, I don't understand. That is what PROC LOGISTIC is giving (well, at least the log odds ratio). Because refgroup is a numeric 0/1 variable that is not on a CLASS statement, the estimate is the ratio of the odds of being a 'ca case' given the observation is in refgroup=1 to the odds of being a 'ca case' given the observation is in refgroup=0. If you restrict the analysis to only the unexposed ids, you can calculate the odds of the response, but there is no other classification to use to calculate an odds ratio. Maybe I am just dense this morning and not catching on to what you want to do.
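To make the comparison explicit, here is a minimal sketch, assuming the logtest data set from Step 3 and assuming the comparison of interest is exposed subjects (refgroup=0) against the unexposed reference group (refgroup=1). The CLASS statement with ref='1' is my suggestion, not part of the original code. From the counts in the data (5 cases and 7 controls exposed, 3 cases and 7 controls unexposed), both steps should report an odds ratio of about (5/7)/(3/7) = 1.667.

```sas
/* 2x2 table of refgroup by lung; RELRISK requests the odds ratio.   */
/* Rows are refgroup (0, 1), columns are lung ('ca case', 'pop cont') */
proc freq data=logtest;
  tables refgroup*lung / relrisk;
run;

/* Same comparison in PROC LOGISTIC, with the unexposed group        */
/* (refgroup=1) declared as the reference level                      */
proc logistic data=logtest;
  class refgroup (ref='1') / param=ref;
  model lung(event='ca case') = refgroup;
run;
```

There is still only one variable with two levels here: the unexposed subjects are the level that the other level is compared against, so no odds ratio is reported "for" the reference level itself.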
SteveDenham
Well, then perhaps running the analysis by refgroup can yield something. I have to remove refgroup from the model statement under this scenario. The log then looks like this:
79   proc logistic data=logtest;
80   by refgroup;
81   model lung(event='ca case') = /clodds=both;
82   Title 'Table 5: Estimates for ref. group';
83   run;

NOTE: No explanatory variables have been specified.
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=0
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: The above message was for the following BY group:
      refgroup=1
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
I get an intercept for each of the models. For refgroup 0, it is -0.3365; for refgroup 1, it is -0.8473. Exponentiating these gives 0.71428... for refgroup 0 and 0.42857... for refgroup 1. These are just the ratios of cases to controls within each refgroup (5/7 and 3/7). No need for PROC LOGISTIC in this case, and perhaps just as important, there are no odds ratios within each refgroup. There are odds - not odds ratios. The intercept is on the log odds scale.
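As a rough check that needs no model at all, the same odds can be computed directly from the counts. This is only a sketch and assumes the logtest data set created in Step 3; the column names n_case, n_cont, odds and log_odds are made up for illustration.

```sas
/* Odds of 'ca case' within each refgroup, straight from the counts. */
/* A comparison such as lung = 'ca case' evaluates to 1 or 0, so     */
/* summing it counts the matching rows.                               */
proc sql;
  select refgroup,
         sum(lung = 'ca case')  as n_case,
         sum(lung = 'pop cont') as n_cont,
         sum(lung = 'ca case') / sum(lung = 'pop cont')
             as odds format=8.4,
         log(sum(lung = 'ca case') / sum(lung = 'pop cont'))
             as log_odds format=8.4
  from logtest
  group by refgroup;
quit;
```

With the posted data this should give 5 cases and 7 controls for refgroup 0 (odds 0.7143, log odds -0.3365) and 3 cases and 7 controls for refgroup 1 (odds 0.4286, log odds -0.8473), matching the intercepts above.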
However, you have another variable in the dataset - income. You could calculate the odds ratio for some change in income. For this exercise, let's set that change at 1000. Then your code would look something like:
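/* BY-group processing requires the data to be sorted by refgroup */
proc sort data=logtest;
by refgroup;
run;
proc logistic data=logtest;
by refgroup;
model lung(event='ca case') = income /clodds=both;
units income=1000;
run;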
If you run this, you will find the OR in both refgroups is really close to 1, meaning that an increase or decrease in income of $1000 has essentially no effect on the odds of being a case within a refgroup.
One last try. Now let's put both refgroup and income, as well as their interaction, in the model. Since one is continuous and the other categorical, there are some changes:
proc logistic data=logtest;
class refgroup;
model lung(event='ca case') =refgroup income refgroup*income ;
oddsratio refgroup/at (income = 20000 to 200000 by 20000);
run;
And now you see an increase in the OR as income increases. However, the confidence bounds seem to grow even faster, thus reflecting the maximum likelihood tests that found no significant factors in the model.
SteveDenham