BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
shivenb
Calcite | Level 5

Hello,

I am trying to create a new dataset (let's call it new_table) from an old dataset (called old_table) where the diagnosis variable (dx1) is equal to one of 150 diagnosis codes I have. Problem is that many of these diagnosis codes are broad with any value accepted beyond the first 3 specified letters. for example, for epilepsy, G40.xx where xx could represent any set of letters. I tried 2 ways of achieving this and both have failed. Any tweaks to either method or a totally new method would be appreciated!

Approach 1: Use wildcards within macro lists.

I have all these diagnosis values readily available where I can easily code them all by condition and combine them into a general macro list.

E.g. %let epilepsy = 'G40%', 'R56%'; %let nausea = 'R11%';

let% allicd10s = &epilepsy &nausea;

Then, I thought of using proc sql to get my new table

proc sql;

create table new_table as

select*

from old_table

where dx1 in (&allicd10s.);

quit;

This approach did not work because proc sql cannot read wild card function within macro lists.

 

Approach 2: use contains

I created a new table with all of the available diagnosis values. I named the table pci and named the variable icd10 and then used proc sql again

proc sql;

create table new_table as

select*

from old_table

where dx1 contains (select icd10 from pci);

quit;

Problem is that contains can only be used for a single string, which sounds quite a pointless feature. Are there any non-macro statement ways for me to use contains with an or function? 

I am looking for a simpler solution than a macro statement only because I intend to use this for multiple variables in many ways (creating tables, creating other variables, summary counts, etc.)

 

Any help is appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Does below solve your problem?

data want;
  set sashelp.class;
  if name in: ('Ja','Ro');
run;

Patrick_0-1691545315120.png

 

Alternatively a RegEx would work as well.

data want;
  set sashelp.class;
  if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

Does below solve your problem?

data want;
  set sashelp.class;
  if name in: ('Ja','Ro');
run;

Patrick_0-1691545315120.png

 

Alternatively a RegEx would work as well.

data want;
  set sashelp.class;
  if prxmatch('/^(ja|ro)/oi',strip(name))>0;
run;
shivenb
Calcite | Level 5

Thanks for your feedback! I tried this approach with a macro, and I am surprised that not only did it work, but I think it works more accurately than the weird solution I came up with. I tried the following approach and it ended up giving me a smaller resulting sample size. I cannot figure out how the following code and your code give a different result because my code is just getting at if there is at least 1 ICD10 value where the category

 

your suggested code restated in this context:

%let icd10 = 'F10', 'Z7141', 'K70', 'D50', 'D51', 'D52', 'D53', 'D55', 'D56', 'D57', .....;

data op_fac_icd10_a1;

set op_fac_a1;

if dx1 in: (&icd10.) or dx2 in: (&icd10.) or dx3 in: (&icd10.) or dx4 in: (&icd10.) or dx5 in: (&icd10.);

run;

 

the code I ended up using is listed below where I ended up referencing ICD10 values in a variable in a table called pedcomindex:

proc sql;

create table op_fac_icd10_a1 as

select* from op_fac_a1 where (

exists (select 1 from pedcomindex where dx1 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx2 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx3 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx4 like '%'||icd10||'%')

or exists (select 1 from pedcomindex where dx5 like '%'||icd10||'%')

);

quit;

 

Any thoughts? Thanks so much again for your response!

ballardw
Super User

Approach 3) test if the value is IN a list of desired values.

In this case extract the first 3 characters (the vast majority of ICD10 codes before the decimal when present). This where would work in SQL as well

data example;
   input code $;
datalines;
G40.23
A40.bc
C18.1
R56.99
A56.9
R56
B11
Q11.89
R11
;

data want;
  set example;
  where substr(code,1,3) in ('G40' 'R56' 'R11');
run;

Another approach if your shop has a complete enough custom format (or possibly formats) for ICD-10 codes is to use the Formatted value in a comparison.

Dummy to show what that might look like:

data dummy;
   set have;
   where put(dx1,$mycustomformat.) in ('Nausea' 'Epilepsy');
run;

But not many shops have complete formats and you would need to use your format name(s) and the formatted values created by them.

 

 

shivenb
Calcite | Level 5

Thanks! The problem with this approach is that the ICD10 list I have to match on is not always consistent in the characters that are present within it. E.g., There is a code Z87891, but then there are many codes like F23, F0633, E40. So it is all over the place, and I need to make sure that the existing table is filtered with the exact NDC value within this reference sheet I have.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1011 views
  • 1 like
  • 3 in conversation