I have a large claims databases with four diagnosis codes per claim. The data looks like this:
ID CLAIM_ID DATE DX1 DX2 DX3 DX4
1 100 1/1/2015 7804
1 101 1/1/2015 30921 39021
1 102 2/1/2015 30943 902 01920
1 103 3/1/2015 011
2 104 4/1/2015 4530
2 105 5/1/2015 V9090
3 106 6/1/2015 7039
3 107 6/1/2015 7039
3 108 6/1/2015 E884 0930 1092 0930
3 109 7/1/2015 3094
data have;
infile datalines dlm="," missover;
input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;
datalines;
1,100,1/1/2015,7804,,
1,101,1/1/2015,30921,39021,,
1,102,2/1/2015,30943,902,01920,
1,103,3/1/2015,011,,,
2,104,4/1/2015,4530,,,
2,105,5/1/2015,V9090,,,
3,106,6/1/2015,7039,,,
3,107,6/1/2015,7039,,,
3,108,6/1/2015,E884,0930,1092,0930
3,109,7/1/2015,3094,,,
;
run;
I want to create a dataset that has all of the *unique* diagnosis codes that occurred on claims during the same day for a given individual, as below (I don't care about whether it was DX1, DX2, DX3, or DX4). This would involve using proc transpose somehow to create a series of variables that I might call "DXS1-DXS5" (in this fake example, five variables would be created but it would in reality be way more). I've been playing with the syntax of proc transpose and can't get this work, though.
ID CLAIM_ID DATE DX1 DX2 DX3 DX4 DXS1 DXS2 DXS3 DXS4 DXS5
1 100 1/1/2015 7804 7804 30921 39021
1 101 1/1/2015 30921 39021 7804 30921 39021
1 102 2/1/2015 30943 902 01920 30943 902 01920
1 103 3/1/2015 011 011
2 104 4/1/2015 4530 4530
2 105 5/1/2015 V9090 V9090
3 106 6/1/2015 7039 7039 E884 0930 1092 0930
3 107 6/1/2015 7039 7039 E884 0930 1092 0930
3 108 6/1/2015 E884 0930 1092 0930 7039 E884 0930 1092 0930
3 109 7/1/2015 3094 3094
I'd appreciate suggestions on how to proceed. Thanks.
... View more