I have a dataset like below:
ID | PRIM | DX_LINE | ICD10_TDL1 | icd10_TDL2 | icd10_TDL3 | icd10_TDL4 | icd10_TDL5 | icd10_TDL6 |
11111148 | 2 | 1 | Z23 | |||||
11111148 | 2 | 1 | Z23 | G20 | M54.16 | |||
11111130 | 2 | 1 | K22.70 | |||||
11111130 | 2 | 1 | K22.70 | Z86.010 | ||||
11111130 | 2 | 1 | K63.5 | K57.30 | ||||
11111175 | 1 | 1 | K21.9 | |||||
11111175 | 1 | 1 | R13.10 | R94.5 | K21.9 | Z12.11 | M85.80 | R03.0 |
11111175 | 1 | 1 | Z12.11 |
And my desired output should look like:
ID | PRIM | DX_LINE | ICD10_TDL1 | icd10_TDL2 | icd10_TDL3 | icd10_TDL4 | icd10_TDL5 | icd10_TDL6 |
11111148 | 2 | 1 | Z23 | G20 | M54.16 | |||
11111130 | K22.70 | Z86.010 | K63.5 | K57.30 | ||||
11111175 | R13.10 | R94.5 | K21.9 | Z12.11 | M85.80 | R03.0 |
I am new to SAS, please help.
The method to solve this efficiently is to transpose it to a long format, remove duplicates and then transpose back to a wide format.
Does the order of the ICD10 code matter? This solution assumes it doesn't, and given your data it shouldn't otherwise your second example would have issues.
Here's a worked example. Ignore the first step-it just generates sample data to test on, you can replace the dataset name with yours as well as the appropriate variable names.
*Generate sample data;
data have;
array dx(6) $ dx1-dx6;
array diag(6) $ _temporary_ ("A", "B", "C", "D", "E", "F");
do id=1 to 6;
id2=id*2;
do j=1 to 6;
if rand('bernoulli', 0.2)=1 then do;
dx(j)=diag(floor(rand('uniform')*6+1));
end;
end;
output;
end;
drop j;
run;
*Flip to long form;
proc transpose data=have out=flipped prefix=diag;
by id id2;
var dx1-dx6;
run;
*remove duplicates;
proc sort data=flipped nodupkey out=flipped_unique;
where not missing(diag1);
by id id2 diag1;
run;
*Flip to wide form;
proc transpose data=flipped_unique out=want prefix=dx;
by id id2;
var diag1;
run;
Why are PRIM and DX_Line kept for the first line but not for any of the others?
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.