Solved: Re: Re-coding/ matching multiple columns from an external table

ammarhm · Posted 12-11-2022 12:16 AM

Hi everyone,

I am trying to recode columns in a table using references from another table.

The first table, which basically contains coded diagnoses, looks as follows:

data Dx;
input Person_ID Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref AdmissionSource DischargeDestination;
datalines;
1 5 1 6 4 10 10
2 2 9 2 8 10 20
3 8 2 4 5 20 30
4 3 6 6 2 30 10
5 8 8 9 7 10 30
6 9 2 2 9 30 20
7 2 6 4 3 10 20
8 4 4 5 6 20 10
;
run;

The second table, which holds the meaning of each code in the columns above, is this:

data Code;
length Column $20 Text $15;
input Column $ Ref Text $;
datalines;
Dx1_Ref 1 HT
Dx1_Ref 2 HC
Dx1_Ref 3 DM
Dx1_Ref 4 IHD
Dx1_Ref 5 IB
Dx1_Ref 6 CL
Dx1_Ref 7 HF
Dx1_Ref 8 DI
Dx1_Ref 9 HG
Dx2_Ref 1 HT
Dx2_Ref 2 HC
Dx2_Ref 3 DM
Dx2_Ref 4 IHD
Dx2_Ref 5 IB
Dx2_Ref 6 CL
Dx2_Ref 7 HF
Dx2_Ref 8 DI
Dx2_Ref 9 HG
AdmissionSource 10 Home
AdmissionSource 20 OtherHospital
AdmissionSource 30 NursingHome
DischargeDestination 10 Home
DischargeDestination 20 OtherHospital
DischargeDestination 30 NursingHome
;
run;

So I am generating a 'de-coded' table by extracting the text from the second table and matching it to the first one by column name and the value in the column:

proc sql;
create table Dx1 as
select
a.*,
b.Text as Dx1_txt
from Dx as a
left join Code as b
on 
a.Dx1_Ref=b.Ref
and b.Column='Dx1_Ref' /*this is not very essential because 'Ref' in table Code is unique to the columns but would be good to be able to add*/
;
quit;

And then I am repeating this process to get the text values for the remaining columns one by one:

proc sql;
create table Dx2 as
select
a.*,
b.Text as Dx2_txt
from Dx1 as a
left join Code as b
on 
a.Dx2_Ref=b.Ref
and b.Column='Dx2_Ref' /*this is not very essential because 'Ref' in table Code is unique to the columns but would be good to be able to add*/
;
quit;

and so on for Dx3_Ref, Dx4_Ref, AdmissionSource and DischargeDestination.

This works just fine, but the actual tables are much bigger, with many more columns, and this does not feel like an efficient way to solve the problem.

I was just wondering if anyone has a suggestion for re-coding all columns at once, matching on Ref and (ideally) on column name?

Thank you

AM

Tom · Posted 12-11-2022 12:09 PM

Just create formats. Your CODE dataset is already set up perfectly for creating them.

data Code;
  length Column $20 Ref 8 Text $15;
  input Column Ref Text ;
datalines;
Dx1_Ref 1 HT
Dx1_Ref 2 HC
Dx1_Ref 3 DM
Dx1_Ref 4 IHD
Dx1_Ref 5 IB
Dx1_Ref 6 CL
Dx1_Ref 7 HF
Dx1_Ref 8 DI
Dx1_Ref 9 HG
Dx2_Ref 1 HT
Dx2_Ref 2 HC
Dx2_Ref 3 DM
Dx2_Ref 4 IHD
Dx2_Ref 5 IB
Dx2_Ref 6 CL
Dx2_Ref 7 HF
Dx2_Ref 8 DI
Dx2_Ref 9 HG
AdmissionSource 10 Home
AdmissionSource 20 OtherHospital
AdmissionSource 30 NursingHome
DischargeDestination 10 Home
DischargeDestination 20 OtherHospital
DischargeDestination 30 NursingHome
;

proc format cntlin=code(rename=(column=fmtname ref=start text=label));
run;

Then you can just attach the formats to the variables and SAS will use them to display the values.

proc print data=Dx;
  format
  Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref dx1_ref.
  AdmissionSource AdmissionSource. 
  DischargeDestination DischargeDestination.
  ;
run;
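
If the decoded text is needed as real character variables (like Dx1_txt in the question) rather than only for display, PUT() with the same formats is one way to get there. The following is only a sketch: Dx_demo, Dx_text and the *_txt names are made up for illustration, and the small setup at the top (two rows of data plus VALUE statements that mirror what the CNTLIN= step builds) is repeated just so the example runs on its own.

```sas
/* Setup, repeated only so the sketch is self-contained:
   two rows of the Dx table and the same three formats. */
data Dx_demo;
  input Person_ID Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref AdmissionSource DischargeDestination;
datalines;
1 5 1 6 4 10 10
2 2 9 2 8 10 20
;

proc format;
  value dx1_ref 1='HT' 2='HC' 3='DM' 4='IHD' 5='IB' 6='CL' 7='HF' 8='DI' 9='HG';
  value AdmissionSource      10='Home' 20='OtherHospital' 30='NursingHome';
  value DischargeDestination 10='Home' 20='OtherHospital' 30='NursingHome';
run;

data Dx_text;
  set Dx_demo;
  /* coded diagnosis columns and their (hypothetical) text counterparts */
  array ref {4} Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref;
  array txt {4} $15 Dx1_txt Dx2_txt Dx3_txt Dx4_txt;
  length AdmissionSource_txt DischargeDestination_txt $15;

  /* the four diagnosis columns share one code list, so one format is reused */
  do i = 1 to dim(ref);
    txt{i} = put(ref{i}, dx1_ref.);
  end;

  AdmissionSource_txt      = put(AdmissionSource, AdmissionSource.);
  DischargeDestination_txt = put(DischargeDestination, DischargeDestination.);
  drop i;
run;
```

With the real data you would read Dx instead of Dx_demo and skip the setup, since the formats already exist after the PROC FORMAT step.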


Kurt_Bremser · Posted 12-11-2022 01:35 AM

Create formats from your second table, and assign them to the coded variables. Since it seems that the diag codes repeat, a single format for them should suffice. This will not increase the dataset size in any way, as format references are stored in an already reserved place in the dataset header page.
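
As a rough sketch of the "single format" idea: take one copy of the diagnosis code list from the lookup table and give it one format name. Code_demo is a cut-down stand-in for the Code table, and DxText and dx_cntlin are made-up names; the only hard rule assumed here is that a format name must not end in a digit.

```sas
/* Cut-down copy of the Code table so the sketch runs on its own */
data Code_demo;
  length Column $20 Ref 8 Text $15;
  input Column Ref Text;
datalines;
Dx1_Ref 1 HT
Dx1_Ref 2 HC
Dx2_Ref 1 HT
Dx2_Ref 2 HC
AdmissionSource 10 Home
;

/* Keep only one copy of the diagnosis list and name the format once.
   KEEP uses the old variable names; RENAME is applied on output. */
data dx_cntlin;
  set Code_demo (where=(Column = 'Dx1_Ref'));
  retain fmtname 'DxText' type 'N';   /* 'N' = numeric format */
  rename Ref = start Text = label;
  keep fmtname type Ref Text;
run;

proc format cntlin=dx_cntlin;
run;

/* The one format can then be attached to every diagnosis column, e.g.
   format Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref DxText.;                    */
```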

Do you always have 4 diagnosis codes (never more, never less)? I have a feeling that separating the diags into a vertical dataset with a single diagnosis column may be more efficient with regard to future analysis (Maxim 19).
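
One possible shape for such a vertical dataset, as a sketch only: Dx_demo, Dx_long, Dx_Pos and Dx_Ref are hypothetical names, and the two data rows and the VALUE statement are repeated here just so the example runs on its own.

```sas
/* Setup so the sketch is self-contained: two rows and one diagnosis format */
data Dx_demo;
  input Person_ID Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref;
datalines;
1 5 1 6 4
2 2 9 2 8
;

proc format;
  value dx1_ref 1='HT' 2='HC' 3='DM' 4='IHD' 5='IB' 6='CL' 7='HF' 8='DI' 9='HG';
run;

/* One output row per person and diagnosis slot */
data Dx_long;
  set Dx_demo;
  array ref {4} Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref;
  do Dx_Pos = 1 to dim(ref);
    Dx_Ref = ref{Dx_Pos};
    /* skipping empty slots lets a person have fewer than four diagnoses */
    if not missing(Dx_Ref) then output;
  end;
  format Dx_Ref dx1_ref.;
  keep Person_ID Dx_Pos Dx_Ref;
run;

/* A single column with a single format makes counts straightforward */
proc freq data=Dx_long;
  tables Dx_Ref;
run;
```

In this layout a fifth or sixth diagnosis is just another row, so nothing downstream has to change when the number of diagnosis columns does.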

Maxims of Maximally Efficient SAS Programmers

ger15xxhcker · Posted 12-11-2022 02:24 AM

Another way to solve this problem is PROC TRANSPOSE. It turns the wide table into a long one, with one row per person and coded column: the original column name ends up in one variable and the code in another. This can be done with the following code:

PROC TRANSPOSE DATA=Dx OUT=Dx_long (RENAME=(_NAME_=Column COL1=Ref));
BY Person_ID;
VAR Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref AdmissionSource DischargeDestination;
RUN;

The BY statement expects Dx to be sorted by Person_ID, which it is here. Dx_long then contains Person_ID, Column (the source column name) and Ref (the coded value).
You can then join Dx_long to the Code dataset on both Column and Ref to get the text descriptions. A join is used here rather than a DATA step MERGE, because a MERGE would need both datasets sorted first and Ref alone is not unique in Code. The UPCASE calls guard against differences in how the column names are capitalized.

PROC SQL;
CREATE TABLE Dx_text AS
SELECT a.Person_ID, a.Column, a.Ref, b.Text
FROM Dx_long AS a
LEFT JOIN Code AS b
ON a.Ref = b.Ref
AND UPCASE(a.Column) = UPCASE(b.Column)
ORDER BY a.Person_ID, a.Column;
QUIT;

This creates a long dataset with one Text value per person and coded column, so all columns are handled in a single pass.
Note that the sample Code table has no rows for Dx3_Ref and Dx4_Ref, so Text stays missing for those columns unless the corresponding rows are added.

