About SabineT

SabineT · ‎03-27-2023

Hello Tom, you made my day! Without your explanations it would be pure magic... With your explanations (especially on N0-N3) I feel that I will fully understand your code as soon as I am familiar with the SQL join function. Many thanks!!! Sabine

SabineT · ‎03-24-2023

Did I scare you off!? I tried to abstract my requirement as follows:

for each row do
  A = 1, B = 0, C = 0, D = 0
  check all rows for each id together
  different id?
    yes: does id contain the same species?
      yes: does id contain the same product?
        yes: does id contain the same problem?
          yes: A + 1
          no: B + 1
        no: does id contain the same problem?
          yes: C + 1
          no: D + 1
      no: continue with the next id
    no: continue with the next row

Who is confident that there is a solution for this in SAS? I am lost in reading threads about arrays and loops 😓
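One way to express this per-ID walk without arrays or loops is a join in PROC SQL. The following is only a sketch, not the solution given in the thread: it assumes the dataset raw from the posts on this page, and the table names pairs, flags and counts are made up for illustration. It also assumes that an ID "contains" a product or problem if any of its rows does.

```sas
/* Sketch only: pairs, flags and counts are illustrative table names. */
proc sql;
  /* Every distinct drug-event pair per species */
  create table pairs as
  select distinct species, product, problem
  from raw;

  /* For each pair and each ID of the same species, collapse all rows */
  /* of that ID into two 0/1 flags (a comparison yields 1 or 0).      */
  create table flags as
  select p.species, p.product, p.problem, r.id,
         max(r.product = p.product) as has_prod,
         max(r.problem = p.problem) as has_prob
  from pairs as p inner join raw as r
    on p.species = r.species
  group by p.species, p.product, p.problem, r.id;

  /* Each ID now contributes to exactly one of the four cells */
  create table counts as
  select species, product, problem,
         sum(has_prod and has_prob)         as a,
         sum(has_prod and not has_prob)     as b,
         sum(not has_prod and has_prob)     as c,
         sum(not has_prod and not has_prob) as d
  from flags
  group by species, product, problem;

  /* ROR as defined in the question; missing when a denominator is 0 */
  select *,
         case when b > 0 and c > 0 and d > 0
              then (a / b) / (c / d)
              else .
         end as ror
  from counts;
quit;
```

Because the flags are built per ID before anything is counted, a + b + c + d equals the number of IDs of the species for every pair. With the 30-row dataset from the 03-22-2023 post, the row for cat, prod_e, deafness should come out as a = 2, b = 1, c = 1, d = 6.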

SabineT · ‎03-22-2023

Hello everyone, I hope you stay tuned 😅 Here is the optimised dataset:

data raw;
input id species $ product $ problem $15-25;
datalines;
123 dog prod_a deafness
123 dog prod_a headshake
123 dog prod_b deafness
123 dog prod_b headshake
345 dog prod_a itching
345 dog prod_c itching
234 cat prod_c hair_loss
567 cat prod_d hair_loss
678 cat prod_e deafness
321 dog prod_a deafness
321 dog prod_a headshake
321 dog prod_c deafness
321 dog prod_c headshake
543 dog prod_a itching
543 dog prod_c itching
432 cat prod_c hair_loss
765 cat prod_d hair_loss
876 cat prod_e deafness
111 dog prod_f hair_loss
222 dog prod_g hair_loss
333 dog prod_a vomit
444 dog prod_c vomit
555 dog prod_g hair_loss
555 dog prod_h hair_loss
666 cat prod_a vomit
666 cat prod_c vomit
777 cat prod_c deafness
888 cat prod_g hair_loss
999 cat prod_e vomit
999 cat prod_e hair_loss
;
run;

The expected output: [image: expected output 2] (that is telling me, among other things, that the odds for deafness with prod_e are 12-fold higher than for other products).

I skipped the list of contributing IDs, as it would have flooded the post...
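The 12-fold figure can be checked by hand from the dataset, assuming each cat ID is counted once. Of the ten cat IDs, A = 2 (678, 876: prod_e with deafness), B = 1 (999: prod_e without deafness), C = 1 (777: deafness without prod_e) and D = 6 (234, 432, 567, 765, 666, 888: neither). Using the ROR formula from the original question:

```latex
\mathrm{ROR} = \frac{A/B}{C/D} = \frac{2/1}{1/6} = 12
```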

SabineT · ‎03-22-2023

1. You're absolutely right - I "lost" IDs 345 and 543 in my data for counting!
2. I will add some data contributing to B.
3. I stated this because SAS currently counts each ID more than once. But if I'm not on the wrong track, each ID falls into only one of the categories A, B, C or D.

>> So I will rework my sample data and the expected output.

SabineT · ‎03-22-2023

Unfortunately it is not the output I expect. I explained it in my feedback to PaigeMiller (above). Don't want to copy it. Please see here: https://communities.sas.com/t5/SAS-Programming/Need-help-with-correct-counting-with-Proc-SQL/m-p/865715/highlight/true#M341869

SabineT · ‎03-22-2023

Sorry, I tried to keep it short - but obviously not simple... With the sample data I would expect the following output: [image: expected output]

I should have included some reports with "new" symptoms to have some counts for B... 🤔

For cat I counted the IDs that contain prod_c and hair_loss as 'a' for that drug-event pair. These are 234 and 432, so 2 IDs. Then I look for any other ID containing prod_c but not hair_loss, but there is none (b = 0). For 'c' I count those IDs that do not have prod_c but do have hair_loss. 567 and 765 satisfy this condition, so c = 2. 'd' is the count of IDs that contain neither prod_c nor hair_loss. Here we have 678 and 876 (d = 2). The ROR can't be calculated since a denominator is 0.

I listed the expected counts in the following table:

species  product  problem    a  IDs       b  IDs  c  IDs       d  IDs
dog      prod_a   deafness   2  123, 321  0  -    0  -         0  -
dog      prod_a   headshake  2  123, 321  0  -    0  -         0  -
dog      prod_b   deafness   1  123       0  -    1  321       0  -
dog      prod_b   headshake  1  123       0  -    1  321       0  -
dog      prod_a   itching    1  321       0  -    0  -         0  -
dog      prod_c   deafness   1  321       0  -    1  123       0  -
dog      prod_c   headshake  1  321       0  -    1  123       0  -
dog      prod_c   itching    1  321       0  -    0  -         1  123
cat      prod_c   hair_loss  2  234, 432  0  -    2  567, 765  2  678, 876
cat      prod_d   hair_loss  2  567, 765  0  -    2  234, 432  2  678, 876
cat      prod_e   deafness   2  678, 876  0  -    0  -         4  234, 432, 567, 765

You see, the sum of A to D should always equal the number of IDs per species. And as I describe the steps to walk through the dataset, I understand that the conditions should not simply be connected with AND in the code...
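The walk-through for cat, prod_c and hair_loss can be sketched in PROC SQL by first reducing every ID to one row and only then counting. This is a minimal illustration, assuming the 18-row dataset raw from the original question; the table name id_flags and the hard-coded pair are placeholders chosen for this example.

```sas
/* Sketch only: id_flags is an illustrative table name. */
proc sql;
  /* Step 1: one row per cat ID, with 0/1 flags telling whether */
  /* any row of that ID has the product or the problem.         */
  create table id_flags as
  select id,
         max(product = 'prod_c')    as has_prod,
         max(problem = 'hair_loss') as has_prob
  from raw
  where species = 'cat'
  group by id;

  /* Step 2: every ID lands in exactly one cell of the 2x2 table */
  select sum(has_prod = 1 and has_prob = 1) as a,
         sum(has_prod = 1 and has_prob = 0) as b,
         sum(has_prod = 0 and has_prob = 1) as c,
         sum(has_prod = 0 and has_prob = 0) as d
  from id_flags;
quit;
```

With that sample data this should reproduce the counts described above (a = 2, b = 0, c = 2, d = 2), and the four cells add up to the six cat IDs.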

SabineT · ‎03-22-2023

Hi Sajid01, thanks a lot for the helpful step! Now the code is counting. But the result remains weird. How can D be 68 if there are only 18 observations? I'm posting the output again, as it appears that the format crashed in your output: [image: results2]

As posted to PaigeMiller, the problem is that I don't want to count observations but IDs (per species), depending on the conditions for product and problem. It appears that SAS now checks the rows and counts each row once for each satisfied condition. So I am afraid there is a very relevant step missing... Do you have an idea? Regards, Sabine

SabineT · ‎03-22-2023

Hi PaigeMiller, many thanks for taking care! I had thought that my task was quite simple, but I tried many approaches, and this time the output appears close to the target. What I want to calculate is the frequency of each pair of product and problem (drug-event pair, DEP) per species. The ratio (ROR) should tell me whether the frequency is higher than expected (>1). This means a potential signal in pharmacovigilance (= monitoring the benefit-risk profile of drugs). In the end I am not interested in the table but only in the ROR. A is the number of DEPs of interest. B is the number of events with the same product but without the problem. C is the number of events with the same problem but without the drug of interest. And under D all events should be counted that contain neither the event nor the drug of interest. I think the problem is that SAS checks the rows, and I have to find a way to make it check all rows for one ID, so that the ID is assigned to the right field of the crosstable. I hope that helps? 🙄 Regards, Sabine

SabineT · ‎03-21-2023

Dear experts, I have a dataset containing columns for ID, species, product and problem. In the end I want to calculate the odds ratio for each pair of product and problem per species, meaning that I have to determine the values of a 2x2 crosstable for each row. Each ID can have multiple values for product and problem (but only one species). It is important that each ID is counted only once in the crosstable. I'm working with SAS 9.4. Here is an example dataset:

data raw;
input id species $ product $ problem $15-25;
datalines;
123 dog prod_a deafness
123 dog prod_a headshake
123 dog prod_b deafness
123 dog prod_b headshake
345 dog prod_a itching
345 dog prod_c itching
234 cat prod_c hair_loss
567 cat prod_d hair_loss
678 cat prod_e deafness
321 dog prod_a deafness
321 dog prod_a headshake
321 dog prod_c deafness
321 dog prod_c headshake
543 dog prod_a itching
543 dog prod_c itching
432 cat prod_c hair_loss
765 cat prod_d hair_loss
876 cat prod_e deafness
;
run;

After sorting by species to enable species-specific analyses, I want to count the IDs for my crosstable under the following conditions:

A = [IDs with the product and with the problem]
B = [IDs with the product but without the problem]
C = [IDs without the product but with the problem]
D = [IDs without the product and without the problem]

The ROR is then calculated as (A/B)/(C/D).

This is my code:

proc sql;
select distinct raw.species, raw.product, raw.problem,
  (select count(id) from raw sub
    where sub.species = raw.species and sub.id ^= raw.id
      and sub.product = raw.product and sub.problem = raw.problem) as a,
  (select count(id) from raw sub
    where sub.species = raw.species and sub.id ^= raw.id
      and sub.product = raw.product and sub.problem ^= raw.problem) as b,
  (select count(id) from raw sub
    where sub.species = raw.species and sub.id ^= raw.id
      and sub.product ^= raw.product and sub.problem = raw.problem) as c,
  (select count(id) from raw sub
    where sub.species = raw.species and sub.id ^= raw.id
      and sub.product ^= raw.product and sub.problem ^= raw.problem) as d,
  (calculated a / calculated b) / (calculated c / calculated d) as ror
from raw;
quit;

This is the result: [image: failed_case_counts]

Now I have two problems:

1. "a" is expected to be at least 1, but the code does not count the ID itself (that's logical, as it counts where sub.id ^= raw.id).
2. From the preliminary results I can see that with this code each ID can be counted more than once (thus contributing to more than one field of the 2x2 table). This must not be the case.

Is there anyone out there who can help with the solution? I hope I made myself clear enough... Looking forward to your hints and ideas! Kind regards, SabineT

Online Status	Offline
Date Last Visited	‎04-28-2023 10:32 AM

Re: Need help with correct counting with Proc SQL

Need help with correct counting with Proc SQL
