BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
shivenb
Calcite | Level 5

Hello,

I am trying to create a new dataset (let's call it new_table) from an old dataset (called old_table) where the diagnosis variable (dx1) is equal to one of 150 diagnosis codes I have. Problem is that many of these diagnosis codes are broad with any value accepted beyond the first 3 specified letters. for example, for epilepsy, G40.xx where xx could represent any set of letters. I tried 2 ways of achieving this and both have failed. Any tweaks to either method or a totally new method would be appreciated!

Approach 1: Use wildcards within macro lists.

I have all these diagnosis values readily available where I can easily code them all by condition and combine them into a general macro list.

E.g. %let epilepsy = 'G40%', 'R56%'; %let nausea = 'R11%';

let% allicd10s = &epilepsy &nausea;

Then, I thought of using proc sql to get my new table

proc sql;

create table new_table as

select*

from old_table

where dx1 in (&allicd10s.);

quit;

This approach did not work because proc sql cannot read wild card function within macro lists.

 

Approach 2: use contains

I created a new table with all of the available diagnosis values. I named the table pci and named the variable icd10 and then used proc sql again

proc sql;

create table new_table as

select*

from old_table

where dx1 contains (select icd10 from pci);

quit;

Problem is that contains can only be used for a single string, which sounds quite a pointless feature. Are there any non-macro statement ways for me to use contains with an or function? 

I am looking for a simpler solution than a macro statement only because I intend to use this for multiple variables in many ways (creating tables, creating other variables, summary counts, etc.)

 

Any help is appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Does below solve your problem?

data want;
  set sashelp.class;
  if name in: ('Ja','Ro');
run;

Patrick_0-1691545315120.png

 

Alternatively a RegEx would work as well.

data want;
  set sashelp.class;
  if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Does below solve your problem?

data want;
  set sashelp.class;
  if name in: ('Ja','Ro');
run;

Patrick_0-1691545315120.png

 

Alternatively a RegEx would work as well.

data want;
  set sashelp.class;
  if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;
shivenb
Calcite | Level 5

Thanks for your feedback! I tried this approach with a macro, and I am surprised that not only did it work, but I think it works more accurately than the weird solution I came up with. I tried the following approach and it ended up giving me a smaller resulting sample size. I cannot figure out how the following code and your code give a different result because my code is just getting at if there is at least 1 ICD10 value where the category

 

your suggested code restated in this context:

%let icd10 = 'F10', 'Z7141', 'K70', 'D50', 'D51', 'D52', 'D53', 'D55', 'D56', 'D57', .....;

data op_fac_icd10_a1;

set op_fac_a1;

if dx1 in: (&icd10.) or dx2 in: (&icd10.) or dx3 in: (&icd10.) or dx4 in: (&icd10.) or dx5 in: (&icd10.);

run;

 

the code I ended up using is listed below where I ended up referencing ICD10 values in a variable in a table called pedcomindex:

proc sql;

create table op_fac_icd10_a1 as

select* from op_fac_a1 where (

exists (select 1 from pedcomindex where dx1 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx2 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx3 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx4 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx5 like '%'||icd10||'%')

);

quit;

 

Any thoughts? Thanks so much again for your response!

ballardw
Super User

Approach 3) test if the value is IN a list of desired values.

In this case extract the first 3 characters (the vast majority of ICD10 codes before the decimal when present). This where would work in SQL as well

data example;
   input code $;
datalines;
G40.23
A40.bc
C18.1
R56.99
A56.9
R56
B11
Q11.89
R11
;

data want;
  set example;
  where substr(code,1,3) in ('G40' 'R56' 'R11');
run;

Another approach if your shop has a complete enough custom format (or possibly formats) for ICD-10 codes is to use the Formatted value in a comparison.

Dummy to show what that might look like:

data dummy;
   set have;
   where put(dx1,$mycustomformat.) in ('Nausea' 'Epilepsy');
run;

But not many shops have complete formats and you would need to use your format name(s) and the formatted values created by them.

 

 

shivenb
Calcite | Level 5

Thanks! The problem with this approach is that the ICD10 list I have to match on is not always consistent in the characters that are present within it. E.g., There is a code Z87891, but then there are many codes like F23, F0633, E40. So it is all over the place, and I need to make sure that the existing table is filtered with the exact NDC value within this reference sheet I have.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3169 views
  • 1 like
  • 3 in conversation