Re: Using sql to find at least one record

CathyVI · Posted 07-31-2019 09:34 AM

I am trying to find the “number and percentage of distribution of encounter with at least one medication record by data source”. This is my sample data and my sql code but I am not convinced, any help/suggestion please, Thanks.

Source	Hosp_id	Encounter_id	Medication_code	Medication_name
ccd	1123	3014759650x	1234	aspirin
ccd	1123	3014759650x	5678	ibuprofen
ehr	1123	3014751234x	9876	Tylenol
ehr	1222	3014751234x	1234	aspirin
ehr	1222	3014751234x	9876	Tylenol

For my code, I said medi_name =1, when medication_name is present, else medi_name=0

medi_code =1, when medication_code is present, else medi_code=0.

proc sql;

create table tm.testrun as

select source, count(source)as s_sum, encounter_id, count(encounter_id)as e_sum

from Tm.med

where medi_name = 1 and medi_code =1;

quit;

proc freq data= Tm.testrun;

tables source;

run;

Kurt_Bremser · Posted 07-31-2019 09:47 AM

Please supply example data in usable way (as @PeterClemmensen showed you in https://communities.sas.com/t5/SAS-Programming/Help-with-Summary-Analysis/m-p/564236#M158244). Then also supply the code that created medi_name and medi_code, just to be sure.

And finally, show what you expect as output from the example data as posted.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

CathyVI · Posted 07-31-2019 10:53 AM

Please see data in useable way;

data test;

input source $ 1-3 Hosp_id 5-8 encounter_id $ 10-20 medication_code $ 22-25 medication_name $ 26-35;

datalines;

ccd 1123 3014759650x 1234 aspirin

ehr 1123 3014759650x 5678 ibuprofen

ehr 1123 3014751234x 9876 tylenol

ehr 1222 3014751234x 1234 aspirin

ccd 1222 3014751234x 9876 tylenol

;

run;

this is the code I used to create medi_name and medi_code;

data Tmp.drug(compress=Yes);

set t.test keep=source hosp_id encounter_id medication_code medication_name);

if medication_code = "" then medi_code = 0;

else medi_code = 1;

if medication_name = "" then medi_name = 0;

else medi_name= 1;

run;

Kurt_Bremser · Posted 08-01-2019 02:31 AM

I have combined all the code you provided into one piece of working code:

data test;
input source $ 1-3 Hosp_id 5-8 encounter_id $ 10-20 medication_code $ 22-25 medication_name $ 26-35;
datalines;
ccd 1123 3014759650x 1234 aspirin
ehr 1123 3014759650x 5678 ibuprofen
ehr 1123 3014751234x 9876 tylenol
ehr 1222 3014751234x 1234 aspirin
ccd 1222 3014751234x 9876 tylenol
;

data drug;
set test (keep=source hosp_id encounter_id  medication_code medication_name);
medi_code = (medication_code > "");
medi_name = (medication_name > "");
run;

proc sql;
create table testrun as
select source, count(source)as s_sum, encounter_id, count(encounter_id)as e_sum
from drug
where medi_name = 1 and medi_code =1;
quit;

Note that

I simplified the creation of medi_code and medi_name, but the result is the same
all observations have a medi_name and medi_code of 1. You may need to overthink the logic for building those.
the SQL does a remerge, because there is no group by clause, so the sums are all the same

You did not tell us what you expect to get out of your example dataset. Given its small size, it should not be difficult to create this "by hand".

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

gamotte · Posted 07-31-2019 09:53 AM

Hello,

Not tested as you did not provide data in a usable form :

proc sql;
    create table tm.testrun as
    select distinct source, count(source)as s_sum, encounter_id, count(encounter_id)as e_sum
    from Tm.med
    group by source
    having sum(Medication_code ne " ")>0 and sum(Medication_name ne " ")>0;
quit;

CathyVI · Posted 08-01-2019 08:46 AM

I tested your code and I got the expected result, Thank you!

So I have two questions.

1. Can I add another variable to the initial code you answered. I mean if I want to "Number and percent encounters with atleast One Med record by setting_id and source"

2. Can I used the same code to find another variable in the same dataset. See code below.

Am looking for "Number and percentage of encounters by "medication_code" and "setting_id".

I added medication_code to the code you created but I removed the "distinct" since am not looking for atleast one record.

proc sql;

create table tm.testrun as

select source, setting_id, count(medication_code)as med_sumv, medication_code, encounter_id, count(encounter_id)as e_sum

from Tm.med

group by medication_code, setting_id

having sum(Medication_code ne " ")>0 and sum(Medication_name ne " ")>0;

quit;

proc freq data= Tm.testrun;

tables medication_code*setting_id/crosslist;

run;

Using sql to find at least one record