Hi everyone,
I am trying to recode columns in a table using references from another table.
The first table, which basically contains coded diagnoses, looks as follows:
data Dx;
input Person_ID Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref AdmissionSource DischargeDestination;
datalines;
1 5 1 6 4 10 10
2 2 9 2 8 10 20
3 8 2 4 5 20 30
4 3 6 6 2 30 10
5 8 8 9 7 10 30
6 9 2 2 9 30 20
7 2 6 4 3 10 20
8 4 4 5 6 20 10
;
run;
The second table, which says what the codes in each of the columns above mean, is this:
data Code;
length Column $20 Text $15;
input Column $ Ref Text $;
datalines;
Dx1_Ref 1 HT
Dx1_Ref 2 HC
Dx1_Ref 3 DM
Dx1_Ref 4 IHD
Dx1_Ref 5 IB
Dx1_Ref 6 CL
Dx1_Ref 7 HF
Dx1_Ref 8 DI
Dx1_Ref 9 HG
Dx2_Ref 1 HT
Dx2_Ref 2 HC
Dx2_Ref 3 DM
Dx2_Ref 4 IHD
Dx2_Ref 5 IB
Dx2_Ref 6 CL
Dx2_Ref 7 HF
Dx2_Ref 8 DI
Dx2_Ref 9 HG
AdmissionSource 10 Home
AdmissionSource 20 OtherHospital
AdmissionSource 30 NursingHome
DischargeDestination 10 Home
DischargeDestination 20 OtherHospital
DischargeDestination 30 NursingHome
;
run;
So I am generating a 'decoded' table by extracting the text from the second table and matching it to the first one on column name and column value:
proc sql;
create table Dx1 as
select
a.*,
b.Text as Dx1_txt
from Dx as a
left join Code as b
on
a.Dx1_Ref=b.Ref
and b.Column='Dx1_Ref' /*needed: Ref is not unique in Code (e.g. Ref 1 appears for both Dx1_Ref and Dx2_Ref), so joining on Ref alone would duplicate rows*/
;
quit;
And then I am repeating this process to get the text values for the remaining columns one by one:
proc sql;
create table Dx2 as
select
a.*,
b.Text as Dx2_txt
from Dx1 as a
left join Code as b
on
a.Dx2_Ref=b.Ref
and b.Column='Dx2_Ref' /*needed: Ref is not unique in Code (e.g. Ref 1 appears for both Dx1_Ref and Dx2_Ref), so joining on Ref alone would duplicate rows*/
;
quit;
And so on for Dx3_Ref, Dx4_Ref, AdmissionSource, DischargeDestination and the rest.
This works fine, but the actual tables are much bigger, with many more columns, and it does not feel like an efficient way to solve the problem.
Does anyone have a suggestion for recoding all columns at once, matching on Ref and (ideally) the column name?
Thank you
AM
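The repeated PROC SQL steps above can be collapsed into a single query that joins Code once per coded column, each under its own alias and restricted to that column's rows. A sketch, assuming the Dx and Code datasets from the question; only the columns that have rows in the sample Code table are decoded, and the output name Dx_all and the *_txt names are made up:

```sas
/* One query instead of a chain of intermediate tables: each coded column   */
/* gets its own LEFT JOIN to Code, filtered on Column so that shared Ref    */
/* values (1-9 for the Dx columns, 10-30 for admission/discharge) match once */
proc sql;
  create table Dx_all as
  select a.*,
         c1.Text as Dx1_txt,
         c2.Text as Dx2_txt,
         cs.Text as AdmissionSource_txt,
         cd.Text as DischargeDestination_txt
  from Dx as a
       left join Code as c1
         on a.Dx1_Ref = c1.Ref and c1.Column = 'Dx1_Ref'
       left join Code as c2
         on a.Dx2_Ref = c2.Ref and c2.Column = 'Dx2_Ref'
       left join Code as cs
         on a.AdmissionSource = cs.Ref and cs.Column = 'AdmissionSource'
       left join Code as cd
         on a.DischargeDestination = cd.Ref and cd.Column = 'DischargeDestination'
  order by a.Person_ID;
quit;
```

This removes the intermediate tables, but every additional column still needs its own join clause, so it does not scale as well as the format-based answers.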
Just create formats. Your CODE dataset is already set up perfectly for creating them.
data Code;
length Column $20 Ref 8 Text $15;
input Column Ref Text ;
datalines;
Dx1_Ref 1 HT
Dx1_Ref 2 HC
Dx1_Ref 3 DM
Dx1_Ref 4 IHD
Dx1_Ref 5 IB
Dx1_Ref 6 CL
Dx1_Ref 7 HF
Dx1_Ref 8 DI
Dx1_Ref 9 HG
Dx2_Ref 1 HT
Dx2_Ref 2 HC
Dx2_Ref 3 DM
Dx2_Ref 4 IHD
Dx2_Ref 5 IB
Dx2_Ref 6 CL
Dx2_Ref 7 HF
Dx2_Ref 8 DI
Dx2_Ref 9 HG
AdmissionSource 10 Home
AdmissionSource 20 OtherHospital
AdmissionSource 30 NursingHome
DischargeDestination 10 Home
DischargeDestination 20 OtherHospital
DischargeDestination 30 NursingHome
;
proc format cntlin=code(rename=(column=fmtname ref=start text=label));
run;
Then you can just attach the formats to the variables and SAS will use them to display the values.
proc print data=Dx; /* without DATA=, PROC PRINT would print the last dataset created, i.e. Code */
format
Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref dx1_ref.
AdmissionSource AdmissionSource.
DischargeDestination DischargeDestination.
;
run;
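If you need the text stored as real character columns rather than only displayed, the same formats can be applied with the PUT function. A minimal sketch, assuming the Dx dataset from the question and the formats created by the CNTLIN step above; the output name Dx_decoded and the *_txt variable names are made up:

```sas
/* Decode into new character variables using the CNTLIN-built formats */
data Dx_decoded;
  set Dx;
  length Dx1_txt Dx2_txt Dx3_txt Dx4_txt
         AdmissionSource_txt DischargeDestination_txt $15;
  /* the four diagnosis columns share one code list, so DX1_REF. serves all */
  array ref{4} Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref;
  array txt{4} Dx1_txt Dx2_txt Dx3_txt Dx4_txt;
  do i = 1 to 4;
    /* PUT returns the formatted label; STRIP removes padding to the format width */
    txt{i} = strip(put(ref{i}, dx1_ref.));
  end;
  AdmissionSource_txt      = strip(put(AdmissionSource, AdmissionSource.));
  DischargeDestination_txt = strip(put(DischargeDestination, DischargeDestination.));
  drop i;
run;
```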
Create formats from your second table, and assign them to the coded variables. Since it seems that the diag codes repeat, a single format for them should suffice. This will not increase the dataset size in any way, as format references are stored in an already reserved place in the dataset header page.
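A sketch of the single shared format, assuming the Code dataset from the question: the diagnosis format is built from the Dx1_Ref rows only, and PROC DATASETS attaches the formats to Dx in place, so only the dataset header changes. The names Code_fmt and Diag are hypothetical:

```sas
/* Build the CNTLIN control data: one shared diagnosis format (hypothetical  */
/* name DIAG) plus one format each for admission source and discharge        */
data Code_fmt;
  set Code(rename=(Ref=start Text=label));
  length fmtname $32;
  if Column = 'Dx1_Ref' then fmtname = 'Diag';
  else if Column in ('AdmissionSource' 'DischargeDestination') then fmtname = Column;
  else delete;   /* Dx2_Ref rows repeat the Dx1_Ref code list */
  drop Column;
run;

/* CNTLIN expects the rows of each format to be grouped together */
proc sort data=Code_fmt;
  by fmtname start;
run;

proc format cntlin=Code_fmt;
run;

/* Attach the formats permanently; the stored values are not changed */
proc datasets lib=work nolist;
  modify Dx;
  format Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref diag.
         AdmissionSource AdmissionSource.
         DischargeDestination DischargeDestination.;
quit;
```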
Do you always have 4 diagnosis codes (never more, never less)? I have a feeling that separating the diags into a vertical dataset with a single diagnosis column may be more efficient with regard to future analysis (Maxim 19).
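A minimal sketch of the vertical layout, assuming the Dx dataset from the question; Dx_long, Dx_No and Diag are made-up names. A DATA step array writes one row per person and diagnosis position:

```sas
/* One row per person and diagnosis; Dx_No records the source column (1-4) */
data Dx_long;
  set Dx(keep=Person_ID Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref);
  array dx{4} Dx1_Ref Dx2_ref Dx3_Ref Dx4_Ref;
  do Dx_No = 1 to 4;
    Diag = dx{Dx_No};
    /* skip empty slots, so persons with fewer diagnoses simply get fewer rows */
    if not missing(Diag) then output;
  end;
  keep Person_ID Dx_No Diag;
run;
```

With this layout, a question such as "which persons have diagnosis code 3" becomes a single WHERE Diag = 3, no matter in which position the code was recorded, and only one format is needed for Diag.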
A good way to solve this problem is to use PROC TRANSPOSE, which can reshape the wide Dx dataset into a long one with one row per person and original column. This can be done with the following code:
PROC TRANSPOSE DATA=Dx OUT=Dx1;
BY Person_ID;
VAR Dx1_Ref Dx2_Ref Dx3_Ref Dx4_Ref AdmissionSource DischargeDestination;
RUN;
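This produces a long dataset Dx1 with one row per Person_ID and original column: _NAME_ holds the column name (e.g. Dx1_Ref) and COL1 holds the coded value.
You can then use a data step merge between Dx1 and the Code dataset to get the text descriptions. Both datasets need matching key variables (Column and Ref) and must be sorted by them. MERGE BY is also case-sensitive, and the Dx step in the question stores the name as Dx2_ref while Code has Dx2_Ref, so the names are upcased first:
/* Rename the TRANSPOSE output to match Code, and upcase the column names */
DATA Dx1; SET Dx1(RENAME=(_NAME_=Column COL1=Ref)); Column = UPCASE(Column); RUN;
DATA Code_u; SET Code; Column = UPCASE(Column); RUN;
PROC SORT DATA=Dx1; BY Column Ref; RUN;
PROC SORT DATA=Code_u; BY Column Ref; RUN;
/* Keep every transposed row; Text stays blank where Code has no matching entry */
DATA Dx_txt;
  MERGE Dx1 (IN = A) Code_u;
  BY Column Ref;
  IF A;
RUN;
This creates a dataset Dx_txt with one row per person and original column, where Text holds the description. With the sample Code table, the Dx3_Ref and Dx4_Ref rows get a blank Text because Code has no entries for those columns.
Because all six columns were transposed in one step, this single merge decodes all of them at once; there is no need to repeat it column by column.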
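If the original wide layout is wanted back, a second PROC TRANSPOSE with an ID statement turns the long, decoded data into one row per person. A sketch, assuming a long dataset called Dx_txt here with the variables Person_ID, Column and Text; Dx_wide is a made-up name, and the new variables are named from the Column values plus a _txt suffix:

```sas
/* BY processing needs the data sorted by Person_ID */
proc sort data=Dx_txt;
  by Person_ID Column;
run;

/* ID Column turns each column name into a new variable (e.g. DX1_REF_txt); */
/* _NAME_ would only contain 'Text', so it is dropped                       */
proc transpose data=Dx_txt out=Dx_wide(drop=_name_) suffix=_txt;
  by Person_ID;
  id Column;
  var Text;
run;
```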