Solved: Re: Counting id-job combinations and ids exposed-2nd post: Please igno...

ak2011 · Posted 11-19-2019 06:02 AM

I would greatly appreciate if someone helps me with the code to count the number of id_job 
combinations for the merged dataset below: I have a total of 17 obs with 8 id_jobs: os1-1,os1-1 and os1-1
is 1 id-job combination;os1-2,os1-2,os1-2,os1-2 and os1-2 is another id-job combination,etc 
for a total of 8 id-job combinations. For cla_expo & bio-exp, 0=unexposed and 1=exposed. 
My task is to count the number of id-job exposed for cla and bio: For cla(idchem=99005),bio(idchem=990021):
For cla: number of id_job exposed is 5/8=62.5%, bio:3/8=37.5%

Please find attached the output for cla, bio and merged cla and bio.

I tried proc freq idchem(not shown) but it didn't give me the right percentages. 
Could someone help me with the SAS code to compute the percent id_job exposed for cla and bio,please

Also for the id exposed : For cla(idchem=990005), % id exposed is 3/4 = 75%

Thanks very much.

 






data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 9900021
os1 1 211700
os1 2 211700
os1 2 9900021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 9900021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

72

73 data idnew1;

74 input id$ job idchem;

75 datalines;

NOTE: The data set WORK.IDNEW1 has 17 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.02 seconds

93 ;

94 run;

95

96 /* Cla-Exposure*/

97 data cla;

98 set idnew1;

99 if idchem=990005 then Cla_Exp=1; else Cla_Exp=0;

100 run;

NOTE: There were 17 observations read from the data set WORK.IDNEW1.

NOTE: The data set WORK.CLA has 17 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

101

102 proc print data=cla;

103 run;

NOTE: There were 17 observations read from the data set WORK.CLA.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.17 seconds

cpu time 0.18 seconds

104

105 /* Bio-Exposure*/

106 data bio;

107 set idnew1;

108 if idchem=9900021 then Bio_Exp=1; else Bio_Exp=0;

109 run;

NOTE: There were 17 observations read from the data set WORK.IDNEW1.

NOTE: The data set WORK.BIO has 17 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

110

111 proc print data=bio;

112 run;

NOTE: There were 17 observations read from the data set WORK.BIO.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.17 seconds

cpu time 0.17 seconds

113

114 /* Merging cla & bio files*/

115 data m1; merge cla bio;

116 run;

NOTE: There were 17 observations read from the data set WORK.CLA.

NOTE: There were 17 observations read from the data set WORK.BIO.

NOTE: The data set WORK.M1 has 17 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.02 seconds

117

118 proc print data=m1;

119 Title"Merged Cla & Bio exposures";

120 run;

NOTE: There were 17 observations read from the data set WORK.M1.

NOTE: PROCEDURE PRINT used (Total process time):

real time 0.18 seconds

cpu time 0.18 seconds

121

122

123

124 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

136

ed_sas_member · Posted 11-19-2019 07:00 AM

Hi @ak2011 ,

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

Best,

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

View solution in original post

ed_sas_member · Posted 11-19-2019 07:00 AM

Hi @ak2011 ,

You can try this. The idea is to retrieve separately frequencies and totals, and then to calculate percentages.

In my opinion, it is not necessary to create additional variables (CLA_EXP, ... )

Hope this help!

Best,

proc sql;
	select a.idchem, a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

ak2011 · Posted 11-19-2019 10:11 AM

Thanks very much. It works perfect!

ed_sas_member · Posted 11-19-2019 11:03 AM

Hi @ak2011 ,

you're welcome !

ak2011 · Posted 11-19-2019 11:19 AM

Hi,
Thanks once again for the SAS code. Is there a way to show the numerators and denominators of the calculations: eg. 5/8 = 62.5%,etc. Thanks.

ed_sas_member · Posted 11-19-2019 11:59 AM

Hi @ak2011 ,

I have added the numerator (variable "frequency") and the denominator (variable "total") in the select clause:


data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b;
quit;

The result is the following :

If you want to have a table rather than a report, you can add the clause "create table <dataset name> as" before the select clause in the same statement.

ak2011 · Posted 11-20-2019 04:07 AM

Perfect! Much thanks again!

ak2011 · Posted 11-20-2019 06:29 AM

Another issue, please: How can I arrange the frequencies and percentages in descending count:
I tried several options all to no avail.
Please help. Thanks.
ak.

ed_sas_member · Posted 11-20-2019 06:52 AM

Hi @ak2011

You can add the "order by" clause just before the "quit" statement as follows:

data m1;
	set idnew1;
	id_job = catx("_",id,job);
run;
proc sql;
	select a.idchem,
		   a.frequency,
		   b.total,
		   a.frequency / b.total as percentage format=Percent8.2
	from (select idchem, count(id_job) as frequency from m1 group by idchem) as a,
		 (select count(distinct(id_job)) as total from m1) as b
	order by a.frequency desc;
quit;

Best,

ak2011 · Posted 11-20-2019 10:39 AM

Thanks again! It works.

ed_sas_member · Posted 11-20-2019 10:44 AM

You're welcome!

ak2011 · Posted 11-25-2019 05:57 AM

Hi,
Your proc sql approach is very helpful but is there a way to use the same proc sql for cross tabulations for a larger dataset with 30 idchems(not 4 like the smaller dataset).
eg. cla_exp * bio_exp; cla_exp* amo_exp etc.
I would really appreciate your help.
Thanks in advance.
ak.

Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Re: Counting id-job combinations and ids exposed-2nd post: Please ignore first

Classroom Training Available!