BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ak2011
Fluorite | Level 6
I would greatly appreciate if someone helps me with the code to count the number of id_job 
combinations for the merged dataset below: I have a total of 17 obs with 8 id_jobs: os1-1,os1-1 and os1-1
is 1 id-job combination;os1-2,os1-2,os1-2,os1-2 and os1-2 is another id-job combination,etc
for a total of 8 id-job combinations. For cla_expo & bio-exp, 0=unexposed and 1=exposed.
My task is to count the number of id-job exposed for cla and bio: For cla(idchem=99005),bio(idchem=990021):
For cla: number of id_job exposed is 5/8=62.5%, bio:3/8=37.5%

Please find attached the output for cla, bio and merged cla and bio.

I tried proc freq idchem(not shown) but it didn't give me the right percentages.
Could someone help me with the SAS code to compute the percent id_job exposed for cla and bio,please

Also for the id exposed : For cla(idchem=990005), % id exposed is 3/4 = 75%

Thanks very much.








data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;

 

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data idnew1;
74 input id$ job idchem;
75 datalines;
 
NOTE: The data set WORK.IDNEW1 has 17 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.02 seconds
 
 
93 ;
94 run;
95
96 /* Cla-Exposure*/
97 data cla;
98 set idnew1;
99 if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;
100 run;
 
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.CLA has 17 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
101
102 proc print data=cla;
103 run;
 
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.17 seconds
cpu time 0.18 seconds
 
 
104
105 /* Bio-Exposure*/
106 data bio;
107 set idnew1;
108 if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;
109 run;
 
NOTE: There were 17 observations read from the data set WORK.IDNEW1.
NOTE: The data set WORK.BIO has 17 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
110
111 proc print data=bio;
112 run;
 
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.17 seconds
cpu time 0.17 seconds
 
 
113
114 /* Merging cla & bio files*/
115 data m1; merge cla bio;
116 run;
 
NOTE: There were 17 observations read from the data set WORK.CLA.
NOTE: There were 17 observations read from the data set WORK.BIO.
NOTE: The data set WORK.M1 has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
 
 
117
118 proc print data=m1;
119 Title"Merged Cla & Bio exposures";
120 run;
 
NOTE: There were 17 observations read from the data set WORK.M1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.18 seconds
cpu time 0.18 seconds
 
 
121
122
123
124 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
136
1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

 

Best,

 

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

View solution in original post

11 REPLIES 11
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

 

Best,

 

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

ak2011
Fluorite | Level 6
Thanks very much. It works perfect!
ak2011
Fluorite | Level 6
Hi,
Thanks once again for the SAS code. Is there a way to show the numerators and denominators of the calculations: eg. 5/8 = 62.5%,etc. Thanks.
ed_sas_member
Meteorite | Level 14

Hi @ak2011 ,

 

I have added the numerator (variable "frequency") and the denominator (variable "total") in the select clause:


data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

 

 The result is the following :Capture d’écran 2019-11-19 à 17.59.50.png

If you want to have a table rather than a report, you can add the clause "create table <dataset name> as" before the select clause in the same statement.

ak2011
Fluorite | Level 6
Another issue, please: How can I arrange the frequencies and percentages in descending count:
I tried several options all to no avail.
Please help. Thanks.
ak.
ed_sas_member
Meteorite | Level 14

Hi @ak2011 

 

You can add the "order by" clause just before the "quit" statement as follows:

data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b
	order by a.frequency desc;
quit;

Best,

 

ak2011
Fluorite | Level 6
Hi,
Your proc sql approach is very helpful but is there a way to use the same proc sql for cross tabulations for a larger dataset with 30 idchems(not 4 like the smaller dataset).
eg. cla_exp * bio_exp; cla_exp* amo_exp etc.
I would really appreciate your help.
Thanks in advance.
ak.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1290 views
  • 0 likes
  • 2 in conversation