Hello,
I would appreciate it if someone could suggest a better approach to the problem below.
My main aims are:
a. Count the number of cases (ca case) and controls (pop cont) not exposed to any of the 4 agents (a1, a2, a3 and a4) in the dataset agents_expt (Table 1) below. Exposed is coded 1 and unexposed is coded 0.
Results: 3 ca case and 7 pop cont subjects (Table 2). This is Step 1 of the SAS code.
b. Define the subjects in Table 2 (i.e. those unexposed to any of the agents) as a reference group (refgroup) for comparison. This is Step 2 of the SAS code.
c. Estimate the odds ratios for refgroup and the other variables, including income. This is Step 3 of the SAS code.
My SAS code seems too long for what these aims require.
*Please note: for this test, let us ignore the quasi-separation warning (the test data are too sparse). If the code works on this test data, I expect it will work on the original dataset too.
My dataset, code and log are found below; results are attached.
Thanks in advance for your expertise.
ak.
/* Logistic test ref group test*/
data agents_expt;
input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
datalines;
os1  1 0 0 1 ca case  45424
os2  1 1 0 0 ca case  52877
os3  0 0 0 0 pop cont 25600
os4  1 0 0 1 pop cont 14888
os5  0 0 0 0 ca case  41036
os6  0 0 0 0 ca case  20365
os7  1 0 1 1 pop cont 16988
os8  0 0 0 0 ca case  100962
os9  1 0 1 0 pop cont 11230
os10 0 0 1 0 ca case  35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont 24897
os14 0 0 0 0 pop cont 18966
os15 1 0 0 1 ca case  20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont 24897
os18 0 0 0 0 pop cont 17999
os19 0 0 0 0 pop cont 22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case  30580
;
run;
proc print data=agents_expt;
Title 'Table 1: Exposure of ids to 4 agents';
/*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
tables lung;
title 'Table 2:Subjects unexposed to any of the 4 agents';
run;
/*Step 2:Using subjects unexposed to any of agents as a ref. group*/
proc sql;
create table t as
select
id, a1, a2, a3,a4,lung, income,
sum(a1,a2,a3,a4)=0 as refgroup
from agents_expt
;
quit;
proc print data=t;
title 'Table 3: original variables and ref group';
run;
proc freq data=t;
tables lung* refgroup;
title 'Table 4: freq of ca case and pop cont for ref group';
run;
/*Step 3: Finding odds ratio estimates for variables including ref.group*/
/* LOGISTIC REG. TEST*/
data logtest; set t;/*P stands for Pooled*/
if lung in ('ca case','pop cont');
run;
proc logistic data=logtest;
/*class cla_scat (param=ref ref ='0');*/
/*model lung(event='Ca case') = cla_expf age cigcsi;*/
model lung(event='ca case') =a1 a2 a3 a4 refgroup income;
Title 'Table 5: Estimates for variables including ref. group';
run;
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73
74 /* Logistic test ref group test*/
75 data agents_expt;
76 input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
77 datalines;
NOTE: The data set WORK.AGENTS_EXPT has 22 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
100 ;
101 run;
102 proc print data=agents_expt;
103 Title 'Table 1: Exposure of ids to 4 agents';
104
105 /*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
NOTE: There were 22 observations read from the data set WORK.AGENTS_EXPT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.27 seconds
cpu time 0.27 seconds
106 proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
107 tables lung;
108 title 'Table 2:Subjects unexposed to any of the 4 agents';
109 run;
NOTE: There were 10 observations read from the data set WORK.AGENTS_EXPT.
WHERE SUM(a1, a2, a3, a4)=0;
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.13 seconds
cpu time 0.11 seconds
110
111 /*Step 2:Using subjects unexposed to any of agents as a ref. group*/
112
113 proc sql;
114 create table t as
115 select
116 id, a1, a2, a3,a4,lung, income,
117 sum(a1,a2,a3,a4)=0 as refgroup
118 from agents_expt
119 ;
NOTE: Table WORK.T created, with 22 rows and 8 columns.
120 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
121
122 proc print data=t;
123 title 'Table 3: original variables and ref group';
124 run;
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.22 seconds
cpu time 0.23 seconds
125
126 proc freq data=t;
127 tables lung* refgroup;
128 title 'Table 4: freq of ca case and pop cont for ref group';
129 run;
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.15 seconds
cpu time 0.14 seconds
130
131 /*Step 3: Finding odds ratio estimates for variables including ref.group*/
132
133 /* LOGISTIC REG. TEST*/
134 data logtest; set t;/*P stands for Pooled*/
135 if lung in ('ca case','pop cont');
136 run;
NOTE: There were 22 observations read from the data set WORK.T.
NOTE: The data set WORK.LOGTEST has 22 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
137
138 proc logistic data=logtest;
139 /*class cla_scat (param=ref ref ='0');*/
140 /*model lung(event='Ca case') = cla_expf age cigcsi;*/
141 model lung(event='ca case') =a1 a2 a3 a4 refgroup income;
142 Title 'Table 5: Estimates for variables including ref. group';
143 run;
NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood
iteration. Validity of the model fit is questionable.
NOTE: There were 22 observations read from the data set WORK.LOGTEST.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.44 seconds
cpu time 0.41 seconds
144
145
146 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
158
It runs without errors (but with the quasi-separation warning) on my machine. Since you are defining the reference group as sum(a1,a2,a3,a4)=0, you may wish to try one of the following:
1. Remove a1 through a4 as predictors. This would give the OR for the levels of refgroup.
2. Use a BY refgroup statement and remove refgroup as a predictor. This would give the ORs for each level of refgroup separately.
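The two options above could look roughly like this. It is only a sketch: it assumes the dataset t from Step 2 (refgroup = 1 for subjects unexposed to all four agents, 0 otherwise), and the name t_sorted is made up for illustration.

```sas
/* Option 1: drop a1-a4 and keep refgroup as the exposure term.  */
/* REF='1' makes the unexposed subjects the reference level, so  */
/* the odds ratio compares any exposure against no exposure.     */
proc logistic data=t;
   class refgroup (param=ref ref='1');
   model lung(event='ca case') = refgroup income;
run;

/* Option 2: fit the agents separately within each refgroup level. */
/* BY-group processing needs the data sorted by refgroup first.    */
proc sort data=t out=t_sorted;   /* t_sorted is a hypothetical name */
   by refgroup;
run;

proc logistic data=t_sorted;
   by refgroup;
   model lung(event='ca case') = a1 a2 a3 a4 income;
run;
```

With option 2, a1 through a4 are all 0 in the refgroup=1 stratum, so only income can be estimated there.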
I don't think you will get accurate estimates if both refgroup and a1 through a4 are included, primarily because of multicollinearity: refgroup is computed entirely from a1 through a4.
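One way to see this overlap, assuming the dataset t from Step 2, is to cross-tabulate refgroup against each agent. This is just an illustrative check, not part of the original code.

```sas
/* Cross-tabulate refgroup against each agent in the dataset t from Step 2. */
/* Every subject with refgroup=1 has a1-a4 all equal to 0 by construction.  */
proc freq data=t;
   tables refgroup*(a1 a2 a3 a4) / norow nocol nopercent;
run;
```

In each of the four tables, the refgroup=1 row should have counts only in the 0 column, which is why refgroup carries no information beyond what a1 through a4 already provide.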
SteveDenham