I would appreciate if someone could provide me the SAS code to count % subjects(ids) exposed to chemicals:idchem variable:
The SAS code to merge two datasets (17obs each ) extracted from over 100,000 obs (the larger dataset) is shown below: Output is attached.
The idea is if the SAS code works for this 17 obs, then it can be applied to the larger dataset:
I used proc freq id*idchem and other methods(not shown) but I did not arrive at the right answer shown in the manual calculations below:
I am counting single ids exposed to cla and bio separately with respect to idchem's. From the tables under cla_exp and bio_exp columns, 0=non-exposed and 1 is exposed.
For cla: idchem=990005, For the os1's :os1-1 and os1-2, cla_exp for each = 1(but same id(os1), so 1 subject(id) exposed):
For the os2;s: os2-1(1 exposure) and os2-2(1 exposure) but same id so 1 subject exposed:
For For the os3's: No exposure ie zero (0)
For the os4;s: os4-1(1 exposure) 1 subject exposed:
So %ge id exposure for idchem=99005 is 3/4=75%
Same principle will be applied for bio.
Could someone help me with the SAS code to
a.find the %ge ids exposed to idchem=990005,etc for cla and others for bio?
b. also find the %ge those exposed to other idchems among those exposed to idchem=990005. ie. cross association between idchem=990005 and each of the other idchems.
Thanks in advance.
ak.
data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;
/* Cla-Exposure*/
data cla;
set idnew1;
if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
id_job=catx('_', id, job);
put _all_;
run;
proc print data=cla;
run;
/* Bio-Exposure*/
data bio;
set idnew1;
if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
id_job=catx('_', id, job);
put _all_;
run;
proc print data=bio;
run;
/* Merging cla & bio files*/
data m1; merge cla bio;
/*id_job=catx('_', id, job);*/
put _all_;
run;
data try;
set m1;
id_job=catx('_', id, job);
put _all_;
run;
proc print data=m1;
Title"Merged Cla & Bio exposures";
run;
proc sql;
select a.idchem, a.frequency / b.total as percentage format=Percent8.2
from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
(select count(distinct(id_job)) as total from m1) as b;
quit;
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data idnew1;
74 input id$ job idchem;
75 datalines;
NOTE: The data set WORK.IDNEW1 has 17 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
93 ;
94 run;
95
96 /* Cla-Exposure*/
97 data cla;
98 set idnew1;
99 if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
100 id_job=catx('_', id, job);
101 put _all_;
102 run;
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.CLA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
103
104 proc print data=cla;
105 run;
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.34 seconds
cpu time 0.34 seconds
106
107 /* Bio-Exposure*/
108 data bio;
109 set idnew1;
110 if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
111 id_job=catx('_', id, job);
112 put _all_;
113 run;
id=os1 job=1 idchem=990005 Bio_Exp=0 id_job=os1_1 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Bio_Exp=1 id_job=os1_1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Bio_Exp=0 id_job=os1_1 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Bio_Exp=1 id_job=os1_2 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Bio_Exp=0 id_job=os2_1 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Bio_Exp=0 id_job=os2_1 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Bio_Exp=1 id_job=os2_2 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Bio_Exp=0 id_job=os2_3 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Bio_Exp=0 id_job=os2_3 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Bio_Exp=0 id_job=os3_3 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Bio_Exp=0 id_job=os3_1 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.BIO has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
114
115
116 proc print data=bio;
117 run;
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.25 seconds
cpu time 0.25 seconds
118
119 /* Merging cla & bio files*/
120 data m1; merge cla bio;
121 /*id_job=catx('_', id, job);*/
122 put _all_;
123 run;
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 Bio_Exp=1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 Bio_Exp=1 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 Bio_Exp=1 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 Bio_Exp=0 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 Bio_Exp=0 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: The data set WORK.M1 has 17 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
124
125 data try;
126 set m1;
127 id_job=catx('_', id, job);
128 put _all_;
129 run;
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 Bio_Exp=1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 Bio_Exp=1 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 Bio_Exp=1 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 Bio_Exp=0 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 Bio_Exp=0 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: The data set WORK.TRY has 17 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
130
131 proc print data=m1;
132 Title"Merged Cla & Bio exposures";
133 run;
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.26 seconds
cpu time 0.26 seconds