Using SAS 9.4
I am trying to get all codes unique by ID and Date into the same row. I have data below to show but basically an ID can have multiple ICD9, ICD10 and/or CPT codes per date. I would like an efficient way to transform the data from long to wide but I need it only to be wide where the ID and Date are the same. As I show below, if there are 2 rows for an ID with the same Date I would like to take the different ICD9/ICD10/CPT code from the row below and move it into a new column so that each unique ID and Date has its own row with all ICD9, ICD10 and CPT listed in wide format.
I have the following data:
data have;
input ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code;
datalines;
1 09222020 1111 2222 3333
1 09222020 1111 2222 4444
2 09222020 1111 2222 3333
2 08222020 1111 2222 4444
;
run;
I want my data to look like this (As you can see if the ID is the same but the Date is different I do not want the data to transform to wide):
data want;
input ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code Chg_Procedure_Code2;
datalines;
1 09222020 1111 2222 3333 4444
2 09222020 1111 2222 3333 .
2 08012000 1111 2222 4444 .
;
run;
Any thoughts of place to start with this would be helpful. Thank you
If it helps
data have;
input ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code;
datalines;
1 09222020 1111 2222 3333
1 09222020 1111 2222 4444
2 09222020 1111 2222 3333
2 08222020 1111 2222 4444
;
run;
Proc sort data=have out=have;
By ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code;
run;
proc transpose data=have out=want (drop=_NAME_) prefix=Chg_Procedure_Code;
By ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code;
var Chg_Procedure_Code;
run;
The output
ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code1 Chg_Procedure_Code2
1 9222020 1111 2222 3333 4444
2 8222020 1111 2222 4444 .
2 9222020 1111 2222 3333 .
My thoughts have been to use the following code but I am not capturing what I would like but hopefully the code will give an idea of what I am thinking will work. Thank want
data have;
set all;
array nine icd9_1-icd9_3;
array ten icd10_1-icd10_4;
array cpt cpt1-cpt7;
do i = 1 to 3;*3 is the max number of different codes;
nine{i} = lag(Chg_Pri_Diagnosis_Code);
end;
do i = 1 to 4;*4 is the max number of different codes;
ten{i} = lag(Chg_Pri_ICD10_Diagnosis_Code);
end;
do i = 1 to 7;*7 is the max number of different codes;
cpt{i} = lag(Chg_Procedure_Code);
end;
b
Are there any other variables in your have data set other than the ones shown? If so, do they appear in the result?
@GS2,
Here's one approach; see code below, followed by the results.
This code allows one to have any number of Chg_Procedure_Code columns in the wide version of the dataset. One simply sets the Nbr_of_Codes macro variable to whatever the maximum number of Chg_Procedure_Code columns you desire. In this example, I've modified the data to fit three Chg_Procedure_Code columns.
Notice also that I've introduced a duplicate for ID 2 on Sept. 22, 6666. In our results we can see that there is no duplicate 6666. The Multidata:'N' in conjunction with how the keys are defined removes duplicates. One could also remove the same duplicates in the Sort step.
This wasn't a specified requirement, but the Chg_Procedure_Code columns will have the values in sort order. I.e. for values 1111, 2222, and 3333, the first Chg_Procedure_Code column will have 1111, the second 2222, and the third 3333. This is accomplished by using Ordered:'A' (ascending). If you don't want this behavior, you can just remove this parameter.
%LET Nbr_of_Codes 3;
DATA Have;
FORMAT ID;
FORMAT Date YYMMDDD10.;
INFILE Datalines;
input
ID $
Date : ANYDTDTE8.
Chg_Pri_Diagnosis_Code $
Chg_Pri_ICD10_Diagnosis_Code $
Chg_Procedure_Code $
;
Datalines;
1 09222020 1111 2222 3333
1 09222020 1111 2222 4444
2 09222020 1111 2222 1111
2 09222020 1111 2222 6666
2 09222020 1111 2222 6666
2 08222020 1111 2222 7777
7 07222020 1111 2222 6666
7 07222020 1111 2222 7777
7 07222020 1111 2222 8888
;
RUN;
PROC SORT DATA=Have;
BY ID Date;
RUN;
DATA Want;
DROP _:;
DROP Chg_Procedure_Code;
SET Have;
BY ID Date;
ARRAY Chg_Procedure_Codes [*] $8 Chg_Procedure_Code1 - Chg_Procedure_Code&Nbr_of_Codes;
IF _N_ = 1 THEN
DO;
DECLARE HASH H_Pat_Data(ORDERED: 'A', MULTIDATA:'N');
H_Pat_Data.DEFINEKEY ('ID', 'Date', 'Chg_Procedure_Code');
H_Pat_Data.DEFINEDONE();
DECLARE HITER HI_Pat_Data('H_Pat_Data');
END;
_RC = H_Pat_Data.ADD();
IF LAST.Date THEN
DO;
_RC = HI_Pat_Data.FIRST();
DO _i = 1 TO &Nbr_of_Codes;
Chg_Procedure_Codes[_i] = Chg_Procedure_Code;
_RC = HI_Pat_Data.NEXT();
IF _RC > 0 THEN
DO;
_i = &Nbr_of_Codes;
_RC = H_Pat_Data.CLEAR();
END;
END;
END;
ELSE
DO;
DELETE;
END;
RUN;
Results:
Jim
There are other variables but this is just building a data set for review so they are brought over but not part of what is wanted. Thank you
Having data in wide format can cause difficulties when processing the data. The first difficulty is already shown in the solution provided by @jimbarbour: you need to know how many variables for the Chg_Procedure_Codes are required. Later on you will have to write checks talking into account that some of the Chg_Procedure_Code-variable are missing. So: are you sure, that the wide-format serves the whatever you are doing with the data afterwards better than the long-format?
Something else: What do you expect from the following data (don't know if the combination of values is possible, at all):
ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code 1 09222020 1111 2222 3333 1 09222020 4242 2222 4444
From your description those obs should be combined, forming something like???
Yes, wide is how the data needs to be structured.
There may be combinations but this is more gathering how many potential codes are available.
If I understand you correctly, you want to do this with all the 3 variables.
The first requirement is to get the number of elements for each variable:
proc sql noprint;
select
max(n1),
max(n2),
max(n3)
into :n1 trimmed,:n2 trimmed,:n3 trimmed
from(
select
count(distinct Chg_Pri_Diagnosis_Code) as n1,
count(distinct Chg_Pri_ICD10_Diagnosis_Code) as n2,
count(distinct Chg_Procedure_Code) as n3
from have
group by ID,Date
)
;
quit;
Then, define a macro to get the data into an array, and use that in a DoW loop:
%macro make_array(var,n);
%if &n>1 %then %do;
array _&var(*) 8 &var.1-&var.&n;
if not whichn(&var,of _&var(*)) then
_&var(n(of _&var(*))+1)=&var;
drop &var;
%end;
%mend;
options mprint;
data want;
do until(last.Date);
set have;
by ID Date notsorted;
%make_array(Chg_Pri_Diagnosis_Code,&n1);
%make_array(Chg_Pri_ICD10_Diagnosis_Code,&n2);
%make_array(Chg_Procedure_Code,&n3);
end;
run;
You data was not sorted correctly by DATE, so I used the NOTSORTED option. You may want to replace that with a PROC SORT instead.
So I believe this method will work based on what I am seeing; However, I did make a mistake with the data. Chg_Pri_ICD10_Diagnosis_Code is actually a character variable, there is a letter in a couple observations. Sorry about that. Is it possible to modify the code to incorporate if 1 variable is a character? Thank you
@GS2,
The code I have written will accommodate character. I usually just code everything character unless I intend to do calculations or it's a date.
@andreas_lds made a good point that it might be a bit of a pain to have to manually count the number of unique procedure codes. That is easily solved by adding a NODUPKEY parameter to the sort and then inserting a little Data step like the below. The results are I believe what you want and there's no manual step of counting the number of unique procedure codes.
DATA Have;
FORMAT ID;
FORMAT Date YYMMDDD10.;
INFILE Datalines;
input
ID $
Date : ANYDTDTE8.
Chg_Pri_Diagnosis_Code $
Chg_Pri_ICD10_Diagnosis_Code $
Chg_Procedure_Code $
;
Datalines;
1 09222020 1111 2222 3333
1 09222020 1111 2222 4444
2 09222020 1111 2222 1111
2 09222020 1111 2222 6666
2 09222020 1111 2222 6666
2 08222020 1111 2222 7777
7 07222020 1111 2222 6666
7 07222020 1111 2222 7777
7 07222020 1111 2222 8888
;
RUN;
PROC SORT DATA=Have NODUPKEY;
BY ID Date Chg_Procedure_Code;
RUN;
DATA _NULL_;
RETAIN Max_Procedure_Code_Cnt 0;
IF _End_of_Data THEN
CALL SYMPUTX('Nbr_of_Codes', Max_Procedure_Code_Cnt, 'G');
SET Have END = _End_of_Data;
BY ID Date Chg_Procedure_Code;
Procedure_Code_Cnt + 1;
IF LAST.Chg_Procedure_Code THEN
IF Procedure_Code_Cnt > Max_Procedure_Code_Cnt THEN
DO;
Max_Procedure_Code_Cnt = Procedure_Code_Cnt;
Procedure_Code_Cnt = 0;
END;
RUN;
%PUT &Nte1 &=Nbr_of_Codes;
DATA Want;
DROP _:;
DROP Chg_Procedure_Code;
SET Have;
BY ID Date;
ARRAY Chg_Procedure_Codes [*] $8 Chg_Procedure_Code1 - Chg_Procedure_Code&Nbr_of_Codes;
IF _N_ = 1 THEN
DO;
DECLARE HASH H_Pat_Data(ORDERED: 'A', MULTIDATA:'N');
H_Pat_Data.DEFINEKEY ('ID', 'Date', 'Chg_Procedure_Code');
H_Pat_Data.DEFINEDONE();
DECLARE HITER HI_Pat_Data('H_Pat_Data');
END;
_RC = H_Pat_Data.ADD();
IF LAST.Date THEN
DO;
_RC = HI_Pat_Data.FIRST();
DO _i = 1 TO &Nbr_of_Codes;
Chg_Procedure_Codes[_i] = Chg_Procedure_Code;
_RC = HI_Pat_Data.NEXT();
IF _RC > 0 THEN
DO;
_i = &Nbr_of_Codes;
_RC = H_Pat_Data.CLEAR();
END;
END;
END;
ELSE
DO;
DELETE;
END;
RUN;
Jim
@GS2 wrote:
So I believe this method will work based on what I am seeing; However, I did make a mistake with the data. Chg_Pri_ICD10_Diagnosis_Code is actually a character variable, there is a letter in a couple observations. Sorry about that. Is it possible to modify the code to incorporate if 1 variable is a character? Thank you
That makes things a bit more complicated. I changed the macro to find the variable type and length of the variable, and define the array (and the WHICH function to call) using that:
%macro make_array(indata,var,n,onerror=abort cancel);
%if &n>1 %then %do;
%local dsid varnum vartype vardef whichfunc;
%let dsid=%sysfunc(open(&indata));
%if &dsid=0 %then %do;
%put %qsysfunc(sysmsg());
&onerror;
%return;
%end;
%let varnum=%sysfunc(varnum(&dsid,&var));
%if &varnum=0 %then %do;
%put Variable &var not found on &indata;
&onerror;
%return;
%end;
%let vartype=%sysfunc(vartype(&dsid,&varnum));
%if &vartype=N %then %do;
%let vardef=8;
%let whichfunc=whichn;
%end;
%else %do;
%let vardef=$%sysfunc(varlen(&dsid,&varnum));
%let whichfunc=whichc;
%end;
%let dsid=%sysfunc(close(&dsid));
array _&var(*) &vardef &var.1-&var.&n;
if not &whichfunc(&var,of _&var(*)) then
_&var(&n-cmiss(of _&var(*))+1)=&var;
drop &var;
%end;
%mend;
The macro now has two more parameters: First parameter is the input data set, the last parameter is what to do if there is an error (e.g. the input data does not exist, or the variable is not on the input data set). I set the default to ABORT CANCEL, which will exit the submit block, but not close SAS.
You will now have to call the macro with one more parameter, namely the name of the input:
options mprint;
data want;
do until(last.Date);
set have;
by ID Date notsorted;
%make_array(have,Chg_Pri_Diagnosis_Code,&n1);
%make_array(have,Chg_Pri_ICD10_Diagnosis_Code,&n2);
%make_array(have,Chg_Procedure_Code,&n3);
end;
run;
The SQL part is the same as my first answer.
If it helps
data have;
input ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code;
datalines;
1 09222020 1111 2222 3333
1 09222020 1111 2222 4444
2 09222020 1111 2222 3333
2 08222020 1111 2222 4444
;
run;
Proc sort data=have out=have;
By ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code;
run;
proc transpose data=have out=want (drop=_NAME_) prefix=Chg_Procedure_Code;
By ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code;
var Chg_Procedure_Code;
run;
The output
ID Date Chg_Pri_Diagnosis_Code Chg_Pri_ICD10_Diagnosis_Code Chg_Procedure_Code1 Chg_Procedure_Code2
1 9222020 1111 2222 3333 4444
2 8222020 1111 2222 4444 .
2 9222020 1111 2222 3333 .
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.