Dear experts, I have a dataset containing columns for ID, species, product and problem. In the end I want to calculate the Odds Ratio for each pair of product and problem per species, meaning that I have to determine the values for a 2x2 crosstable for each row. Each ID can have multiple values for product and problem (but only one species). It is important, that each ID is counted only once in the crosstable. I'm working with SAS 9.4. Here is an example dataset: data raw;
input id species $ product $ problem $15-25;
datalines;
123 dog prod_a deafness
123 dog prod_a headshake
123 dog prod_b deafness
123 dog prod_b headshake
345 dog prod_a itching
345 dog prod_c itching
234 cat prod_c hair_loss
567 cat prod_d hair_loss
678 cat prod_e deafness
321 dog prod_a deafness
321 dog prod_a headshake
321 dog prod_c deafness
321 dog prod_c headshake
543 dog prod_a itching
543 dog prod_c itching
432 cat prod_c hair_loss
765 cat prod_d hair_loss
876 cat prod_e deafness
;
run; After sorting by species to enable species specific analyses, I want to count the IDs for my crosstable under the following conditions: A = [IDs with the product and with the problem] B = [IDs with the product but without the problem] C = [IDs without the product but with the problem] D = [IDs without the product and without the problem] The ROR is then calculated with (A/B)/(C/D). This is my code: proc sql;
select distinct raw.species, raw.product, raw.problem,
(select count(id) from raw sub
where sub.species = raw.species
and sub.id ^= raw.id
and sub.product = raw.product
and sub.problem = raw.problem) as a,
(select count(id) from ror sub
where sub.species = raw.species
and sub.id ^= raw.id
and sub.product = raw.product
and sub.problem ^= raw.problem) as b,
(select count(id) from raw sub
where sub.species = raw.species
and sub.id ^= raw.id
and sub.product ^= raw.product
and sub.problem = raw.problem) as c,
(select count(id) from raw sub
where sub.species = raw.species
and sub.id ^= raw.id
and sub.product ^= raw.product
and sub.problem ^= raw.problem) as d,
(calculated a/ calculated b)/(calculated c/calculated d) as ror
from raw;
quit; This is the result: failed_case_counts Now I have two problems: „a“ is expected to be at least 1, but the code does not count the ID itself (that's logic, as it counts where sub.id ^= raw.id) From the preliminary results, I can see, that with this code each ID can be counted more than once (thus contributing to more than one field of the 2x2 table). This must not be the case. Is there anyone out there, who can help with the solution? I hope I made myself clear enough... Looking forward to your hints and ideas! Kind regards, SabineT
... View more