DATA Step, Macro, Functions and more

Extract value from a column based on other columns

Accepted Solution Solved
Reply
Regular Contributor
Posts: 182
Accepted Solution

Extract value from a column based on other columns

A quick question

I am tying to join 2 tables using:

The first table is ICD10s:

data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .

run;

The second table has patient names and their ICD10 code

What I am trying to achieve is to get the group number from the table above based on the matched ICD codes

 

Say that a patient has an ICD code of E17, then I need the combined table to contain the patient's name and group "2" extracted from the table above

If the patient has ICD of G12 then the column should have the group number 1

 

How could this be achieved please? The problem here is that I dont know beforehand how many rows and columns the ICD table above will contain, so it needs to be a "generic" code...

Apologies if I am not making sense here!

 


Accepted Solutions
Solution
‎04-18-2018 06:39 PM
Super User
Posts: 6,939

Re: Extract value from a column based on other columns

We really have to back up here.  How did you know that your INPUT statement should input L3 but shouldn't try to input L4?

 

If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.

 

First, the ARRAY statement becomes:

 

array icds {*} L:;

 

Secondly, the DO loop becomes:

 

do k=1 to dim(icds);

View solution in original post


All Replies
PROC Star
Posts: 1,410

Re: Extract value from a column based on other columns

It almost makes sense to me Smiley Wink

 

Please post an example og´f the other table and what you want the result to look like.

Super User
Posts: 10,623

Re: Extract value from a column based on other columns

Transpose your icd10 dataset, so you can easily join:

data ICD10;
input Group:$ L1:$ L2:$ L3:$ ;
datalines;
1 F70 E16 G12
2 F71 E17 .
3 F72 E18 H4
4 F73 E19 .
5 F74 E76 .
;
run;

proc transpose
  data=icd10
  out=icd_trans (
    drop=_name_
    rename=(col1=icd10)
    where=(icd10 ne ' ')
  )
;
by group;
var l:;
run;

proc sort data=icd_trans;
by icd10;
run;

data want;
merge
  icd_trans (in=a)
  patients (in=b)
;
by icd10;
run;

(assuming that patients has the required variables and is sorted accordingly)

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,939

Re: Extract value from a column based on other columns

I would recommend creating a format from your ICD10 data set.  That way, you won't need to sort your patient data.  In fact, patient data often contains multiple ICD10 codes per observation and wouldn't be suitable for sorting without jumping through major hoops. At any rate:

 

data reshape;

set icd10 end=done;

retain fmtname '$group';

array icds {3} L1-L3;

label = group;

do k=1 to 3;

   if icds{k} > ' ' then do;

      start = icds{k};

      output;

   end;

end;

if done;

label = ' ';

hlo='O';

output;

run;

 

You do need to know the maximum number of codes that can appear in one group, since that is hard-coded here at 3.  That piece can be automated but it takes a little work to do that.  This gives you a data set that PROC FORMAT can consume, creating a format named $group:

 

proc format cntlin=reshape;

run;

 

All that is left to do is apply that format:

 

data want;

set have;

group = put(ICD10, $group.);

run;

 

Regular Contributor
Posts: 182

Re: Extract value from a column based on other columns

Posted in reply to Astounding

Thank you @Astounding

I have to say I am very impressed by the solution you suggested using proc format. I like the simplicity yet lateral thinking approach here.

Could you please elaborate on how to automate the maximun number of ICD columns (in the example i gave there were 3 columns and you created and array: array icds {3} L1-L3Smiley Wink is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?

Super User
Posts: 10,623

Re: Extract value from a column based on other columns


@ammarhm wrote:

Thank you @Astounding


 is there a way to expand this code so that it could account for any number of columns in the ICD10 table I had at the beginning of the example?

See my proc transpose example. It takes care of an arbitrary number of columns, as long as they start with an L. It's always better to work with a long vs. a wide dataset layout.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Solution
‎04-18-2018 06:39 PM
Super User
Posts: 6,939

Re: Extract value from a column based on other columns

We really have to back up here.  How did you know that your INPUT statement should input L3 but shouldn't try to input L4?

 

If you are certain the ICD10 data set will contain only variables named GROUP, plus L1, L2, L3, ..., Ln, it's an easy change to the existing code.

 

First, the ARRAY statement becomes:

 

array icds {*} L:;

 

Secondly, the DO loop becomes:

 

do k=1 to dim(icds);

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 140 views
  • 4 likes
  • 4 in conversation