BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

A quick question

I am tying to join 2 tables using:

The first table is ICD10s:

data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .

run;

The second table has patient names and their ICD10 code

What I am trying to achieve is to get the group number from the table above based on the matched ICD codes

 

Say that a patient has an ICD code of E17, then I need the combined table to contain the patient's name and group "2" extracted from the table above

If the patient has ICD of G12 then the column should have the group number 1

 

How could this be achieved please? The problem here is that I dont know beforehand how many rows and columns the ICD table above will contain, so it needs to be a "generic" code...

Apologies if I am not making sense here!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

We really have to back up here.  How did you know that your INPUT statement should input L3 but shouldn't try to input L4?

 

If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.

 

First, the ARRAY statement becomes:

 

array icds {*} L:;

 

Secondly, the DO loop becomes:

 

do k=1 to dim(icds);

View solution in original post

6 REPLIES 6
PeterClemmensen
Tourmaline | Level 20

It almost makes sense to me 😉

 

Please post an example og´f the other table and what you want the result to look like.

Kurt_Bremser
Super User

Transpose your icd10 dataset, so you can easily join:

data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .
;
run;

proc transpose
  data=icd10
  out=icd_trans (
    drop=_name_
    rename=(col1=icd10)
    where=(icd10 ne ' ')
  )
;
by group;
var l:;
run;

proc sort data=icd_trans;
by icd10;
run;

data want;
merge
  icd_trans (in=a)
  patients (in=b)
;
by icd10;
run;

(assuming that patients has the required variables and is sorted accordingly)

Astounding
PROC Star

I would recommend creating a format from your ICD10 data set.  That way, you won't need to sort your patient data.  In fact, patient data often contains multiple ICD10 codes per observation and wouldn't be suitable for sorting without jumping through major hoops. At any rate:

 

data reshape;

set icd10 end=done;

retain fmtname '$group';

array icds {3} L1-L3;

label = group;

do k=1 to 3;

   if icds{k} > ' ' then do;

      start = icds{k};

      output;

   end;

end;

if done;

label = ' ';

hlo='O';

output;

run;

 

You do need to know the maximum number of codes that can appear in one group, since that is hard-coded here at 3.  That piece can be automated but it takes a little work to do that.  This gives you a data set that PROC FORMAT can consume, creating a format named $group:

 

proc format cntlin=reshape;

run;

 

All that is left to do is apply that format:

 

data want;

set have;

group = put(ICD10, $group.);

run;

 

ammarhm
Lapis Lazuli | Level 10

Thank you @Astounding

I have to say I am very impressed by the solution you suggested using proc format. I like the simplicity yet lateral thinking approach here.

Could you please elaborate on how to automate the maximun number of ICD columns (in the example i gave there were 3 columns and you created and array: array icds {3} L1-L3;) is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?

Kurt_Bremser
Super User

@ammarhm wrote:

Thank you @Astounding


 is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?

See my proc transpose example. It takes care of an arbitrary number of columns, as long as they start with an L. It's always better to work with a long vs. a wide dataset layout.

Astounding
PROC Star

We really have to back up here.  How did you know that your INPUT statement should input L3 but shouldn't try to input L4?

 

If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.

 

First, the ARRAY statement becomes:

 

array icds {*} L:;

 

Secondly, the DO loop becomes:

 

do k=1 to dim(icds);

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2805 views
  • 4 likes
  • 4 in conversation