I have a dataset like below:
ID | PRIM | DX_LINE | ICD10_TDL1 | icd10_TDL2 | icd10_TDL3 | icd10_TDL4 | icd10_TDL5 | icd10_TDL6 |
11111148 | 2 | 1 | Z23 | |||||
11111148 | 2 | 1 | Z23 | G20 | M54.16 | |||
11111130 | 2 | 1 | K22.70 | |||||
11111130 | 2 | 1 | K22.70 | Z86.010 | ||||
11111130 | 2 | 1 | K63.5 | K57.30 | ||||
11111175 | 1 | 1 | K21.9 | |||||
11111175 | 1 | 1 | R13.10 | R94.5 | K21.9 | Z12.11 | M85.80 | R03.0 |
11111175 | 1 | 1 | Z12.11 |
And my desired output should look like:
ID | PRIM | DX_LINE | ICD10_TDL1 | icd10_TDL2 | icd10_TDL3 | icd10_TDL4 | icd10_TDL5 | icd10_TDL6 |
11111148 | 2 | 1 | Z23 | G20 | M54.16 | |||
11111130 | K22.70 | Z86.010 | K63.5 | K57.30 | ||||
11111175 | R13.10 | R94.5 | K21.9 | Z12.11 | M85.80 | R03.0 |
I am new to SAS, please help.
The method to solve this efficiently is to transpose it to a long format, remove duplicates and then transpose back to a wide format.
Does the order of the ICD10 code matter? This solution assumes it doesn't, and given your data it shouldn't otherwise your second example would have issues.
Here's a worked example. Ignore the first step-it just generates sample data to test on, you can replace the dataset name with yours as well as the appropriate variable names.
*Generate sample data;
data have;
array dx(6) $ dx1-dx6;
array diag(6) $ _temporary_ ("A", "B", "C", "D", "E", "F");
do id=1 to 6;
id2=id*2;
do j=1 to 6;
if rand('bernoulli', 0.2)=1 then do;
dx(j)=diag(floor(rand('uniform')*6+1));
end;
end;
output;
end;
drop j;
run;
*Flip to long form;
proc transpose data=have out=flipped prefix=diag;
by id id2;
var dx1-dx6;
run;
*remove duplicates;
proc sort data=flipped nodupkey out=flipped_unique;
where not missing(diag1);
by id id2 diag1;
run;
*Flip to wide form;
proc transpose data=flipped_unique out=want prefix=dx;
by id id2;
var diag1;
run;
Why are PRIM and DX_Line kept for the first line but not for any of the others?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.