DATA Step, Macro, Functions and more

Proc transpose

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Proc transpose

I have a large claims databases with four diagnosis codes per claim.  The data looks like this:  

ID     CLAIM_ID   DATE           DX1        DX2         DX3         DX4

1       100            1/1/2015       7804         

1       101            1/1/2015       30921     39021

1       102            2/1/2015       30943     902          01920

1       103            3/1/2015       011

2       104            4/1/2015       4530

2       105            5/1/2015       V9090

3       106            6/1/2015       7039            

3       107            6/1/2015       7039

3       108            6/1/2015       E884       0930        1092      0930 

3       109            7/1/2015       3094 

 

data have;

  infile datalines dlm="," missover;

  input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;

datalines;

1,100,1/1/2015,7804,,  

1,101,1/1/2015,30921,39021,,

1,102,2/1/2015,30943,902,01920,

1,103,3/1/2015,011,,,

2,104,4/1/2015,4530,,,

2,105,5/1/2015,V9090,,,

3,106,6/1/2015,7039,,,

3,107,6/1/2015,7039,,,

3,108,6/1/2015,E884,0930,1092,0930

3,109,7/1/2015,3094,,,

;

run;

 

I want to create a dataset that has all of the *unique* diagnosis codes that occurred on claims during the same day for a given individual, as below (I don't care about whether it was DX1, DX2, DX3, or DX4).  This would involve using proc transpose somehow to create a series of variables that I might call "DXS1-DXS5" (in this fake example, five variables would be created but it would in reality be way more).  I've been playing with the syntax of proc transpose and can't get this work, though.  

 

 

ID     CLAIM_ID   DATE          DX1     DX2    DX3    DX4   DXS1   DXS2   DXS3   DXS4   DXS5

1       100            1/1/2015       7804                                     7804    30921   39021

1       101            1/1/2015       30921  39021                       7804    30921   39021

1       102            2/1/2015       30943  902     01920            30943  902       01920             

1       103            3/1/2015       011                                       011

2       104            4/1/2015       4530                                     4530

2       105            5/1/2015       V9090                                   V9090

3       106            6/1/2015       7039                                     7039   E884     0930    1092    0930

3       107            6/1/2015       7039                                     7039   E884     0930    1092    0930

3       108            6/1/2015       E884    0930  1092   0930   7039   E884     0930    1092    0930 

3       109            7/1/2015       3094                                     3094

 

 

I'd appreciate suggestions on how to proceed.  Thanks.

 


Accepted Solutions
Solution
‎12-23-2015 04:15 PM
Super User
Posts: 5,085

Re: Proc transpose

[ Edited ]

I would recommend that you first create a data set of all the DXS values, and then merge it back in.  Here is one way (assuming DX1-DX4 are character):

 

data dxs;

set have;

array dx {4};

do i=1 to 4;

   if dx{i} > ' ' then do;

      diagnosis = dx{i};

      output;

   end;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

   by id date diagnosis;

run;

 

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

 

proc transpose data=dxs  prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

 

Then all that is left to do is to merge it back in.  Assuming your original data set is sorted:

 

data want;

merge have dxs_per_id;

by id date;

run;

 

The code is untested, but should be fine.  Write back if it causes any problems.

 

Good luck.

View solution in original post


All Replies
Super User
Posts: 17,868

Re: Proc transpose

Your moving diagnosis from one record to another because its the same day? What's the point of keeping multiple records per day?
Super User
Posts: 17,868

Re: Proc transpose

You need to calculate the daily separately and then merge it back in. 

 

data have;
  infile datalines dlm="," truncover;
  input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;
datalines;
1,100,1/1/2015,7804,,  
1,101,1/1/2015,30921,39021,,
1,102,2/1/2015,30943,902,01920,
1,103,3/1/2015,011,,,
2,104,4/1/2015,4530,,,
2,105,5/1/2015,V9090,,,
3,106,6/1/2015,7039,,,
3,107,6/1/2015,7039,,,
3,108,6/1/2015,E884,0930,1092,0930
3,109,7/1/2015,3094,,,
;
run;

proc transpose data=have out=flipped(rename=col1=dx);
by id claim_id date;
var dx1-dx4;
run;

proc sort data=flipped nodupkey out=flipped_unique (where=(not missing(dx)) drop=claim_id);
by id date dx;
run;

data id;
set flipped_unique;
by id date;
if first.date then count=1; else count+1;
name=cats("dxs", count);
run;

proc transpose data=id out=wide_daily(drop=_name_);
by id date;
id name;
var dx;
run;

proc sql;
create table want as
select a.*, b.*
from have as a
left join wide_daily as b
on a.id=b.id and a.date=b.date;
quit;
Solution
‎12-23-2015 04:15 PM
Super User
Posts: 5,085

Re: Proc transpose

[ Edited ]

I would recommend that you first create a data set of all the DXS values, and then merge it back in.  Here is one way (assuming DX1-DX4 are character):

 

data dxs;

set have;

array dx {4};

do i=1 to 4;

   if dx{i} > ' ' then do;

      diagnosis = dx{i};

      output;

   end;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

   by id date diagnosis;

run;

 

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

 

proc transpose data=dxs  prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

 

Then all that is left to do is to merge it back in.  Assuming your original data set is sorted:

 

data want;

merge have dxs_per_id;

by id date;

run;

 

The code is untested, but should be fine.  Write back if it causes any problems.

 

Good luck.

Frequent Contributor
Posts: 82

Re: Proc transpose

Thank you.  This works.  

 

One last question - how would I create a global macro variable that has the number of DXS diagnoses in the file?  

Super User
Posts: 5,085

Re: Proc transpose

There are a few ways ... here's one that meshes well with your program:

 

data _null_;

set dxs_per_id;

array dxs {*} dxs:;

call symputx('n_dxs_vars', dim(dxs));

stop;

run;

 

I think the best place to put this would be right after PROC TRANSPOSE creates DXS_PER_ID.  It would be important to call the variable that you are transposing DIAGNOSIS (or at least any name that does not begin with DXS).  The list DXS: is actually all variable names that begin with "DXS".

Frequent Contributor
Posts: 82

Re: Proc transpose

Thanks so much.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 456 views
  • 0 likes
  • 3 in conversation