Hello,
I am trying to create a new dataset (let's call it new_table) from an old dataset (called old_table) where the diagnosis variable (dx1) is equal to one of 150 diagnosis codes I have. Problem is that many of these diagnosis codes are broad with any value accepted beyond the first 3 specified letters. for example, for epilepsy, G40.xx where xx could represent any set of letters. I tried 2 ways of achieving this and both have failed. Any tweaks to either method or a totally new method would be appreciated!
Approach 1: Use wildcards within macro lists.
I have all these diagnosis values readily available where I can easily code them all by condition and combine them into a general macro list.
E.g. %let epilepsy = 'G40%', 'R56%'; %let nausea = 'R11%';
let% allicd10s = &epilepsy &nausea;
Then, I thought of using proc sql to get my new table
proc sql;
create table new_table as
select*
from old_table
where dx1 in (&allicd10s.);
quit;
This approach did not work because proc sql cannot read wild card function within macro lists.
Approach 2: use contains
I created a new table with all of the available diagnosis values. I named the table pci and named the variable icd10 and then used proc sql again
proc sql;
create table new_table as
select*
from old_table
where dx1 contains (select icd10 from pci);
quit;
Problem is that contains can only be used for a single string, which sounds quite a pointless feature. Are there any non-macro statement ways for me to use contains with an or function?
I am looking for a simpler solution than a macro statement only because I intend to use this for multiple variables in many ways (creating tables, creating other variables, summary counts, etc.)
Any help is appreciated!
Does below solve your problem?
data want;
set sashelp.class;
if name in: ('Ja','Ro');
run;
Alternatively a RegEx would work as well.
data want;
set sashelp.class;
if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;
Does below solve your problem?
data want;
set sashelp.class;
if name in: ('Ja','Ro');
run;
Alternatively a RegEx would work as well.
data want;
set sashelp.class;
if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;
Thanks for your feedback! I tried this approach with a macro, and I am surprised that not only did it work, but I think it works more accurately than the weird solution I came up with. I tried the following approach and it ended up giving me a smaller resulting sample size. I cannot figure out how the following code and your code give a different result because my code is just getting at if there is at least 1 ICD10 value where the category
your suggested code restated in this context:
%let icd10 = 'F10', 'Z7141', 'K70', 'D50', 'D51', 'D52', 'D53', 'D55', 'D56', 'D57', .....;
data op_fac_icd10_a1;
set op_fac_a1;
if dx1 in: (&icd10.) or dx2 in: (&icd10.) or dx3 in: (&icd10.) or dx4 in: (&icd10.) or dx5 in: (&icd10.);
run;
the code I ended up using is listed below where I ended up referencing ICD10 values in a variable in a table called pedcomindex:
proc sql;
create table op_fac_icd10_a1 as
select* from op_fac_a1 where (
exists (select 1 from pedcomindex where dx1 like '%'||icd10||'%')
or exists (select 1 from pedcomindex where dx2 like '%'||icd10||'%')
or exists (select 1 from pedcomindex where dx3 like '%'||icd10||'%')
or exists (select 1 from pedcomindex where dx4 like '%'||icd10||'%')
or exists (select 1 from pedcomindex where dx5 like '%'||icd10||'%')
);
quit;
Any thoughts? Thanks so much again for your response!
Approach 3) test if the value is IN a list of desired values.
In this case extract the first 3 characters (the vast majority of ICD10 codes before the decimal when present). This where would work in SQL as well
data example; input code $; datalines; G40.23 A40.bc C18.1 R56.99 A56.9 R56 B11 Q11.89 R11 ; data want; set example; where substr(code,1,3) in ('G40' 'R56' 'R11'); run;
Another approach if your shop has a complete enough custom format (or possibly formats) for ICD-10 codes is to use the Formatted value in a comparison.
Dummy to show what that might look like:
data dummy; set have; where put(dx1,$mycustomformat.) in ('Nausea' 'Epilepsy'); run;
But not many shops have complete formats and you would need to use your format name(s) and the formatted values created by them.
Thanks! The problem with this approach is that the ICD10 list I have to match on is not always consistent in the characters that are present within it. E.g., There is a code Z87891, but then there are many codes like F23, F0633, E40. So it is all over the place, and I need to make sure that the existing table is filtered with the exact NDC value within this reference sheet I have.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.