Solved: Re: Proc transpose

chuakp · Posted 12-23-2015 01:21 PM

I have a large claims databases with four diagnosis codes per claim. The data looks like this:

ID CLAIM_ID DATE DX1 DX2 DX3 DX4

1 100 1/1/2015 7804

1 101 1/1/2015 30921 39021

1 102 2/1/2015 30943 902 01920

1 103 3/1/2015 011

2 104 4/1/2015 4530

2 105 5/1/2015 V9090

3 106 6/1/2015 7039

3 107 6/1/2015 7039

3 108 6/1/2015 E884 0930 1092 0930

3 109 7/1/2015 3094

data have;

infile datalines dlm="," missover;

input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;

datalines;

1,100,1/1/2015,7804,,

1,101,1/1/2015,30921,39021,,

1,102,2/1/2015,30943,902,01920,

1,103,3/1/2015,011,,,

2,104,4/1/2015,4530,,,

2,105,5/1/2015,V9090,,,

3,106,6/1/2015,7039,,,

3,107,6/1/2015,7039,,,

3,108,6/1/2015,E884,0930,1092,0930

3,109,7/1/2015,3094,,,

;

run;

I want to create a dataset that has all of the *unique* diagnosis codes that occurred on claims during the same day for a given individual, as below (I don't care about whether it was DX1, DX2, DX3, or DX4). This would involve using proc transpose somehow to create a series of variables that I might call "DXS1-DXS5" (in this fake example, five variables would be created but it would in reality be way more). I've been playing with the syntax of proc transpose and can't get this work, though.

ID CLAIM_ID DATE DX1 DX2 DX3 DX4 DXS1 DXS2 DXS3 DXS4 DXS5

1 100 1/1/2015 7804 7804 30921 39021

1 101 1/1/2015 30921 39021 7804 30921 39021

1 102 2/1/2015 30943 902 01920 30943 902 01920

1 103 3/1/2015 011 011

2 104 4/1/2015 4530 4530

2 105 5/1/2015 V9090 V9090

3 106 6/1/2015 7039 7039 E884 0930 1092 0930

3 107 6/1/2015 7039 7039 E884 0930 1092 0930

3 108 6/1/2015 E884 0930 1092 0930 7039 E884 0930 1092 0930

3 109 7/1/2015 3094 3094

I'd appreciate suggestions on how to proceed. Thanks.

Astounding · Posted 12-23-2015 01:38 PM

I would recommend that you first create a data set of all the DXS values, and then merge it back in. Here is one way (assuming DX1-DX4 are character):

data dxs;

set have;

array dx {4};

do i=1 to 4;

if dx{i} > ' ' then do;

diagnosis = dx{i};

output;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

by id date diagnosis;

run;

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

proc transpose data=dxs prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

Then all that is left to do is to merge it back in. Assuming your original data set is sorted:

data want;

merge have dxs_per_id;

by id date;

run;

The code is untested, but should be fine. Write back if it causes any problems.

Good luck.

View solution in original post

Reeza · Posted 12-23-2015 01:32 PM

Your moving diagnosis from one record to another because its the same day? What's the point of keeping multiple records per day?

Reeza · Posted 12-23-2015 01:38 PM

You need to calculate the daily separately and then merge it back in.

data have;
  infile datalines dlm="," truncover;
  input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;
datalines;
1,100,1/1/2015,7804,,  
1,101,1/1/2015,30921,39021,,
1,102,2/1/2015,30943,902,01920,
1,103,3/1/2015,011,,,
2,104,4/1/2015,4530,,,
2,105,5/1/2015,V9090,,,
3,106,6/1/2015,7039,,,
3,107,6/1/2015,7039,,,
3,108,6/1/2015,E884,0930,1092,0930
3,109,7/1/2015,3094,,,
;
run;

proc transpose data=have out=flipped(rename=col1=dx);
by id claim_id date;
var dx1-dx4;
run;

proc sort data=flipped nodupkey out=flipped_unique (where=(not missing(dx)) drop=claim_id);
by id date dx;
run;

data id;
set flipped_unique;
by id date;
if first.date then count=1; else count+1;
name=cats("dxs", count);
run;

proc transpose data=id out=wide_daily(drop=_name_);
by id date;
id name;
var dx;
run;

proc sql;
create table want as
select a.*, b.*
from have as a
left join wide_daily as b
on a.id=b.id and a.date=b.date;
quit;

Astounding · Posted 12-23-2015 01:38 PM

I would recommend that you first create a data set of all the DXS values, and then merge it back in. Here is one way (assuming DX1-DX4 are character):

data dxs;

set have;

array dx {4};

do i=1 to 4;

if dx{i} > ' ' then do;

diagnosis = dx{i};

output;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

by id date diagnosis;

run;

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

proc transpose data=dxs prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

Then all that is left to do is to merge it back in. Assuming your original data set is sorted:

data want;

merge have dxs_per_id;

by id date;

run;

The code is untested, but should be fine. Write back if it causes any problems.

Good luck.

chuakp · Posted 12-23-2015 04:16 PM

Thank you. This works.

One last question - how would I create a global macro variable that has the number of DXS diagnoses in the file?

Astounding · Posted 12-23-2015 04:23 PM

There are a few ways ... here's one that meshes well with your program:

data _null_;

set dxs_per_id;

array dxs {*} dxs:;

call symputx('n_dxs_vars', dim(dxs));

stop;

run;

I think the best place to put this would be right after PROC TRANSPOSE creates DXS_PER_ID. It would be important to call the variable that you are transposing DIAGNOSIS (or at least any name that does not begin with DXS). The list DXS: is actually all variable names that begin with "DXS".

chuakp · Posted 12-26-2015 10:23 AM

Thanks so much.

Registration is open

SAS Training: Just a Click Away