BookmarkSubscribeRSS Feed
ak2011
Fluorite | Level 6
 
I would appreciate if someone could provide me the SAS code to count % subjects(ids) exposed to chemicals:idchem variable:
The SAS code to merge two datasets (17obs each )  extracted from over 100,000 obs (the larger dataset) is shown below: Output is attached. 
The idea is if the SAS code works for this 17 obs, then it can be applied to the larger dataset:
I used proc freq id*idchem and other methods(not shown) but I did not arrive at the right answer shown in the manual calculations below:
 
I am counting single ids exposed to cla and bio separately with respect to idchem's. From the tables under cla_exp and bio_exp columns, 0=non-exposed and 1 is exposed. 
For cla: idchem=990005, For the os1's :os1-1 and os1-2, cla_exp for each = 1(but same id(os1), so 1 subject(id) exposed):
For the os2;s: os2-1(1 exposure) and os2-2(1 exposure) but same id so 1 subject exposed:
For For the os3's: No exposure ie zero (0)
For the os4;s: os4-1(1 exposure)  1 subject exposed:
So %ge id exposure for idchem=99005 is 3/4=75%
 
Same principle will be applied for bio.
Could someone help me with the SAS code to
a.find the %ge ids exposed to idchem=990005,etc for cla and others for bio? 
b. also find the %ge those exposed to other idchems among those exposed to idchem=990005. ie. cross association between idchem=990005 and each of the other idchems.
 
Thanks in advance.
ak.
 
 

data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;

/* Cla-Exposure*/
data cla;
set idnew1;
if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
id_job=catx('_', id, job);
put _all_;
run;

proc print data=cla;
run;

/* Bio-Exposure*/
data bio;
set idnew1;
if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
id_job=catx('_', id, job);
put _all_;
run;


proc print data=bio;
run;

/* Merging cla & bio files*/
data m1; merge cla bio;
/*id_job=catx('_', id, job);*/
put _all_;
run;

data try;
set m1;
id_job=catx('_', id, job);
put _all_;
run;

proc print data=m1;
Title"Merged Cla & Bio exposures";
run;

proc sql;
select a.idchem, a.frequency / b.total as percentage format=Percent8.2
from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
(select count(distinct(id_job)) as total from m1) as b;
quit;
 
 
 
 
 
 
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data idnew1;
74 input id$ job idchem;
75 datalines;
 
NOTE: The data set WORK.IDNEW1 has 17 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
93 ;
94 run;
95
96 /* Cla-Exposure*/
97 data cla;
98 set idnew1;
99 if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
100 id_job=catx('_', id, job);
101 put _all_;
102 run;
 
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.CLA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
 
 
103
104 proc print data=cla;
105 run;
 
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.34 seconds
cpu time 0.34 seconds
 
 
106
107 /* Bio-Exposure*/
108 data bio;
109 set idnew1;
110 if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
111 id_job=catx('_', id, job);
112 put _all_;
113 run;
 
id=os1 job=1 idchem=990005 Bio_Exp=0 id_job=os1_1 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Bio_Exp=1 id_job=os1_1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Bio_Exp=0 id_job=os1_1 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Bio_Exp=1 id_job=os1_2 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Bio_Exp=0 id_job=os1_2 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Bio_Exp=0 id_job=os2_1 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Bio_Exp=0 id_job=os2_1 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Bio_Exp=1 id_job=os2_2 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Bio_Exp=0 id_job=os2_3 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Bio_Exp=0 id_job=os2_3 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Bio_Exp=0 id_job=os3_3 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Bio_Exp=0 id_job=os3_1 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Bio_Exp=0 id_job=os4_1 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.BIO has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
 
 
114
115
116 proc print data=bio;
117 run;
 
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.25 seconds
cpu time 0.25 seconds
 
 
118
119 /* Merging cla & bio files*/
120 data m1; merge cla bio;
121 /*id_job=catx('_', id, job);*/
122 put _all_;
123 run;
 
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 Bio_Exp=1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 Bio_Exp=1 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 Bio_Exp=1 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 Bio_Exp=0 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 Bio_Exp=0 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: The data set WORK.M1 has 17 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
 
 
124
125 data try;
126 set m1;
127 id_job=catx('_', id, job);
128 put _all_;
129 run;
 
id=os1 job=1 idchem=990005 Cla_Exp=1 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=1
id=os1 job=1 idchem=9900021 Cla_Exp=0 id_job=os1_1 Bio_Exp=1 _ERROR_=0 _N_=2
id=os1 job=1 idchem=211700 Cla_Exp=0 id_job=os1_1 Bio_Exp=0 _ERROR_=0 _N_=3
id=os1 job=2 idchem=211700 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=4
id=os1 job=2 idchem=9900021 Cla_Exp=0 id_job=os1_2 Bio_Exp=1 _ERROR_=0 _N_=5
id=os1 job=2 idchem=210701 Cla_Exp=0 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=6
id=os1 job=2 idchem=990005 Cla_Exp=1 id_job=os1_2 Bio_Exp=0 _ERROR_=0 _N_=7
id=os2 job=1 idchem=210701 Cla_Exp=0 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=8
id=os2 job=1 idchem=990005 Cla_Exp=1 id_job=os2_1 Bio_Exp=0 _ERROR_=0 _N_=9
id=os2 job=2 idchem=9900021 Cla_Exp=0 id_job=os2_2 Bio_Exp=1 _ERROR_=0 _N_=10
id=os2 job=3 idchem=210701 Cla_Exp=0 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=11
id=os2 job=3 idchem=990005 Cla_Exp=1 id_job=os2_3 Bio_Exp=0 _ERROR_=0 _N_=12
id=os3 job=3 idchem=210701 Cla_Exp=0 id_job=os3_3 Bio_Exp=0 _ERROR_=0 _N_=13
id=os3 job=1 idchem=211700 Cla_Exp=0 id_job=os3_1 Bio_Exp=0 _ERROR_=0 _N_=14
id=os4 job=1 idchem=210701 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=15
id=os4 job=1 idchem=990005 Cla_Exp=1 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=16
id=os4 job=1 idchem=211700 Cla_Exp=0 id_job=os4_1 Bio_Exp=0 _ERROR_=0 _N_=17
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: The data set WORK.TRY has 17 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
 
 
130
131 proc print data=m1;
132 Title"Merged Cla & Bio exposures";
133 run;
 
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.26 seconds
cpu time 0.26 seconds
 
 
134
135 proc sql;
136 select a.idchem, a.frequency / b.total as percentage format=Percent8.2
137 from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
138 (select count(distinct(id_job)) as total from m1) as b;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
139 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.12 seconds
cpu time 0.11 seconds
 
 
140
141
142 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
154
 

 

 

 

 

 

2 REPLIES 2
ballardw
Super User

First thing you don't need 3 data steps :

data work.idnew1;
   input id$ job idchem;
   /*if idchem=990005 then Cla_Exp=1; else Cla_Exp=0; */
   Cla_Exp = (idchem=990005);
   /*if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;*/
   Bio_Exp = (idchem=9900021);
   id_job=catx('_', id, job);
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;

The code I use

Cla_Exp = (idchem=990005);

uses the logic comparison result in SAS will have 1 for true and 0 for false.

Minor detail but executes quicker than If/then/else.

 

Second:

This bit is not very clear.

For cla: idchem=990005, For the os1's :os1-1 and os1-2, cla_exp for each = 1(but same id(os1), so 1 subject(id) exposed):

For the os2;s: os2-1(1 exposure) and os2-2(1 exposure) but same id so 1 subject exposed:

For For the os3's: No exposure ie zero (0)

For the os4;s: os4-1(1 exposure)  1 subject exposed:

So %ge id exposure for idchem=99005 is 3/4=75%

I think what you are asking is first find the maximum value of the exposure for each id. Then the percentage of 1's from that.

By picking a 1/0 coding scheme you actual make this quite easy. The MEAN of a 1/0 coded variable is the percentage of 1's as a decimal.

This uses my data set above:

Proc summary data= work.idnew1 nway;
class id;
var Cla_Exp Bio_Exp;
output out=work.summary max=;
run;

proc summary data=work.summary ;
var Cla_Exp Bio_Exp ;
output out=work.want(drop=_type_ _freq_) mean=;
run;

proc print data=work.want;
format  Cla_Exp Bio_Exp percent8.2;
run;

As a learning bit for future code. You may find it very advantageous when having a group of variables with a common theme, such as Exposure, to use that theme as the stem of variable names instead of a suffix.

Use

Exp_Cla Exp_Bio

instead of

Cla_Exp Bio_Exp

Why? SAS will allow you in data steps and many procedures (NOT Proc SQL) to reference a list of variables with a common stem with a short cut such as EXP_:  <= note the colon after the stem. You can use this list in Drop or Keep statements, Array definitions, and functions that take a list of variables such as MAX, MIN, MEAN, STD and such like: x = sum(of Exp_:); Which may be of interest as this sum would the number of different exposures some Id had(if the data is structured correctly).

 

 

Your request for the b) elements is even more vague. There are many ways to do association descriptions and I'm not at all sure what you may want with that one. I think that you need to explicitly show the series or calculations you want.

ak2011
Fluorite | Level 6
Thanks very much.
ak

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 977 views
  • 2 likes
  • 2 in conversation