ak2011
Fluorite | Level 6

Hello,

I would appreciate it if someone could suggest a better approach to this problem.

My main aims are:

a. Count the number of cases (ca case) and controls (pop cont) not exposed to any of the 4 agents (a1, a2, a3 and a4) in the dataset agents_expt (Table 1) below. Exposed is coded 1 and unexposed is coded 0.

Results: 3 ca cases and 7 pop cont (Table 2). This is Step 1 of the SAS code.

b. Define the subjects in Table 2 (i.e. those unexposed to any of the agents) as a reference group (refgroup) for the purpose of comparison. This is Step 2 of the SAS code.

c. Find the estimates (odds ratios) for refgroup and the other variables, including income. This is Step 3 of the SAS code.

My SAS code seems longer than it needs to be to achieve these aims.

 


 

*Please note: for the purpose of this test, let us ignore the warning about quasi-complete separation of data points (the test data are too sparse). If the code works for this test data, I am sure it will work for the original dataset too.

My dataset, code and log are found below; results are attached.

 

Thanks in advance for your expertise.

ak.

 


/* Logistic test ref group test*/
data agents_expt;
input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
datalines;
os1  1 0 0 1 ca case  45424
os2  1 1 0 0 ca case  52877
os3  0 0 0 0 pop cont 25600
os4  1 0 0 1 pop cont 14888
os5  0 0 0 0 ca case  41036
os6  0 0 0 0 ca case  20365
os7  1 0 1 1 pop cont 16988
os8  0 0 0 0 ca case  100962
os9  1 0 1 0 pop cont 11230
os10 0 0 1 0 ca case  35850
os11 0 1 0 0 pop cont 28700
os12 0 0 0 0 pop cont 46320
os13 1 1 1 1 pop cont 24897
os14 0 0 0 0 pop cont 18966
os15 1 0 0 1 ca case  20540
os16 0 0 1 0 pop cont 150600
os17 1 1 1 1 pop cont 24897
os18 0 0 0 0 pop cont 17999
os19 0 0 0 0 pop cont 22540
os20 0 0 0 0 pop cont 158600
os21 0 0 0 0 pop cont 187365
os22 1 0 1 0 ca case  30580
;
run;
proc print data=agents_expt;
Title 'Table 1: Exposure of ids to 4 agents';

/*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/
proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
tables lung;
title 'Table 2:Subjects unexposed to any of the 4 agents';
run;

/*Step 2:Using subjects unexposed to any of agents as a ref. group*/

proc sql;
create table t as
select
id, a1, a2, a3,a4,lung, income,
sum(a1,a2,a3,a4)=0 as refgroup
from agents_expt
;
quit;

proc print data=t;
title 'Table 3: original variables and ref group';
run;

proc freq data=t;
tables lung* refgroup;
title 'Table 4: freq of ca case and pop cont for ref group';
run;

/*Step 3: Finding odds ratio estimates for variables including ref.group*/

/* LOGISTIC REG. TEST*/
data logtest; set t;/*P stands for Pooled*/
if lung in ('ca case','pop cont');
run;

proc logistic data=logtest;
/*class cla_scat (param=ref ref ='0');*/
/*model lung(event='Ca case') = cla_expf age cigcsi;*/
model lung(event='ca case') =a1 a2 a3 a4 refgroup income;
Title 'Table 5: Estimates for variables including ref. group';
run;

 

 

 

 

 

 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 72
 73
 74         /* Logistic test ref group test*/
 75         data agents_expt;
 76         input id$ a1 a2 a3 a4 lung$ 14-21 income 23-29;
 77         datalines;

 NOTE: The data set WORK.AGENTS_EXPT has 22 observations and 7 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.01 seconds
       cpu time            0.01 seconds

 100        ;
 101        run;
 102        proc print data=agents_expt;
 103        Title 'Table 1: Exposure of ids to 4 agents';
 104
 105        /*Step 1: Finding number of cases and controls unexposed to agents(a1,a2,a3 and a4)*/

 NOTE: There were 22 observations read from the data set WORK.AGENTS_EXPT.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.27 seconds
       cpu time            0.27 seconds

 106        proc freq data=agents_expt(where=(sum(a1,a2,a3,a4)=0));
 107        tables lung;
 108        title 'Table 2:Subjects unexposed to any of the 4 agents';
 109        run;

 NOTE: There were 10 observations read from the data set WORK.AGENTS_EXPT.
       WHERE SUM(a1, a2, a3, a4)=0;
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.13 seconds
       cpu time            0.11 seconds

 110
 111        /*Step 2:Using subjects unexposed to any of agents as a ref. group*/
 112
 113        proc sql;
 114        create table t as
 115        select
 116        id, a1, a2, a3,a4,lung, income,
 117        sum(a1,a2,a3,a4)=0 as refgroup
 118        from agents_expt
 119        ;
 NOTE: Table WORK.T created, with 22 rows and 8 columns.

 120        quit;
 NOTE: PROCEDURE SQL used (Total process time):
       real time           0.01 seconds
       cpu time            0.02 seconds

 121
 122        proc print data=t;
 123        title 'Table 3: original variables and ref group';
 124        run;

 NOTE: There were 22 observations read from the data set WORK.T.
 NOTE: PROCEDURE PRINT used (Total process time):
       real time           0.22 seconds
       cpu time            0.23 seconds

 125
 126        proc freq data=t;
 127        tables lung* refgroup;
 128        title 'Table 4: freq of ca case and pop cont for ref group';
 129        run;

 NOTE: There were 22 observations read from the data set WORK.T.
 NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.15 seconds
       cpu time            0.14 seconds

 130
 131        /*Step 3: Finding odds ratio estimates for variables including ref.group*/
 132
 133        /* LOGISTIC REG. TEST*/
 134        data logtest; set t;/*P stands for Pooled*/
 135        if lung in ('ca case','pop cont');
 136        run;

 NOTE: There were 22 observations read from the data set WORK.T.
 NOTE: The data set WORK.LOGTEST has 22 observations and 8 variables.
 NOTE: DATA statement used (Total process time):
       real time           0.01 seconds
       cpu time            0.01 seconds

 137
 138        proc logistic data=logtest;
 139        /*class cla_scat (param=ref ref ='0');*/
 140        /*model lung(event='Ca case') = cla_expf age cigcsi;*/
 141        model lung(event='ca case') =a1 a2 a3 a4 refgroup income;
 142        Title 'Table 5: Estimates for variables including ref. group';
 143        run;

 NOTE: PROC LOGISTIC is modeling the probability that lung='ca case'.
 WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
 WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood
          iteration. Validity of the model fit is questionable.
 NOTE: There were 22 observations read from the data set WORK.LOGTEST.
 NOTE: PROCEDURE LOGISTIC used (Total process time):
       real time           0.44 seconds
       cpu time            0.41 seconds

 144
 145
 146        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 158

 

 

4 REPLIES
SASJedi
Ammonite | Level 13
Moved to the Statistical Procedures community, where it will likely get seen by folks with more expertise in this area.
SteveDenham
Jade | Level 19

It runs without errors (but with the quasi-separation warning) on my machine. Since you are defining a reference group based on sum(of a1-a4)=0, you may wish to try one of the following:

1. Remove a1 thru a4 as predictors. This would give the OR for the levels of refgroup.

2. Use a BY refgroup statement and remove refgroup as a predictor. This would give the ORs for each refgroup separately.

I don't think you are going to be able to get accurate estimates if both refgroup and a1 thru a4 are included, primarily due to multicollinearity: refgroup is completely determined by a1 thru a4.
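
The two options above could be sketched roughly as follows. This is only an illustration, assuming the dataset t (with the 0/1 variable refgroup) created in Step 2 of the original post; the CLASS statement, the sort step and the dataset name t_sorted are assumptions added here, not part of the original code. With test data this sparse, the separation warning may well remain.

```sas
/* Option 1: drop a1-a4 and keep refgroup (plus income) as predictors */
proc logistic data=t;
  /* assumption: treat the unexposed subjects (refgroup=1) as the reference level */
  class refgroup (param=ref ref='1');
  model lung(event='ca case') = refgroup income;
run;

/* Option 2: drop refgroup as a predictor and fit the model within each level of refgroup */
/* BY-group processing needs the data sorted by the BY variable; t_sorted is a hypothetical name */
proc sort data=t out=t_sorted;
  by refgroup;
run;

proc logistic data=t_sorted;
  by refgroup;
  model lung(event='ca case') = a1 a2 a3 a4 income;
run;
```

Note that within the refgroup=1 BY group, a1 thru a4 are all 0 by construction, so that group's fit cannot estimate their effects and in practice reflects only income.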

 

SteveDenham
