BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ak2011
Fluorite | Level 6
I would greatly appreciate if someone helps me with the code to count the number of id_job 
combinations for the merged dataset below: I have a total of 17 obs with 8 id_jobs: os1-1,os1-1 and os1-1
is 1 id-job combination;os1-2,os1-2,os1-2,os1-2 and os1-2 is another id-job combination,etc
for a total of 8 id-job combinations. For cla_expo & bio-exp, 0=unexposed and 1=exposed.
My task is to count the number of id-job exposed for cla and bio: For cla(idchem=99005),bio(idchem=990021):
For cla: number of id_job exposed is 5/8=62.5%, bio:3/8=37.5%

Please find attached the output for cla, bio and merged cla and bio.

I tried proc freq idchem(not shown) but it didn't give me the right percentages.
Could someone help me with the SAS code to compute the percent id_job exposed for cla and bio,please

Also for the id exposed : For cla(idchem=990005), % id exposed is 3/4 = 75%

Thanks very much.








data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;

 

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data idnew1;
74 input id$ job idchem;
75 datalines;
 
NOTE: The data set WORK.IDNEW1 has 17 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.02 seconds
 
 
93 ;
94 run;
95
96 /* Cla-Exposure*/
97 data cla;
98 set idnew1;
99 if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
100 run;
 
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.CLA has 17 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
101
102 proc print data=cla;
103 run;
 
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.17 seconds
cpu time 0.18 seconds
 
 
104
105 /* Bio-Exposure*/
106 data bio;
107 set idnew1;
108 if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
109 run;
 
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.BIO has 17 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
110
111 proc print data=bio;
112 run;
 
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.17 seconds
cpu time 0.17 seconds
 
 
113
114 /* Merging cla & bio files*/
115 data m1; merge cla bio;
116 run;
 
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: The data set WORK.M1 has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
 
117
118 proc print data=m1;
119 Title"Merged Cla & Bio exposures";
120 run;
 
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.18 seconds
cpu time 0.18 seconds
 
 
121
122
123
124 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
136
1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

 

Best,

 

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

View solution in original post

11 REPLIES 11
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

 

Best,

 

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

ak2011
Fluorite | Level 6
Thanks very much. It works perfect!
ak2011
Fluorite | Level 6
Hi,
Thanks once again for the SAS code. Is there a way to show the numerators and denominators of the calculations: eg. 5/8 = 62.5%,etc. Thanks.
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

I have added the numerator (variable "frequency") and the denominator (variable "total") in the select clause:


data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

 The result is the following :Capture d’écran 2019-11-19 à 17.59.50.png

If you want to have a table rather than a report, you can add the clause "create table <dataset name> as" before the select clause in the same statement.

ak2011
Fluorite | Level 6
Another issue, please: How can I arrange the frequencies and percentages in descending count:
I tried several options all to no avail.
Please help. Thanks.
ak.
ed_sas_member
Meteorite | Level 14

Hi @ak2011 

 

You can add the "order by" clause just before the "quit" statement as follows:

data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b
	order by a.frequency desc;
quit;

Best,

 

ak2011
Fluorite | Level 6
Hi,
Your proc sql approach is very helpful but is there a way to use the same proc sql for cross tabulations for a larger dataset with 30 idchems(not 4 like the smaller dataset).
eg. cla_exp * bio_exp; cla_exp* amo_exp etc.
I would really appreciate your help.
Thanks in advance.
ak.

Catch up on SAS Innovate 2026

Nearly 200 sessions are now available on demand in the Innovate Hub.

Watch Now →
Develop Code with SAS Studio

Get started using SAS Studio to write, run and debug your SAS programs.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 5028 views
  • 0 likes
  • 2 in conversation