BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chuakp
Obsidian | Level 7

I have a large claims databases with four diagnosis codes per claim.  The data looks like this:  

ID     CLAIM_ID   DATE           DX1        DX2         DX3         DX4

1       100            1/1/2015       7804         

1       101            1/1/2015       30921     39021

1       102            2/1/2015       30943     902          01920

1       103            3/1/2015       011

2       104            4/1/2015       4530

2       105            5/1/2015       V9090

3       106            6/1/2015       7039            

3       107            6/1/2015       7039

3       108            6/1/2015       E884       0930        1092      0930 

3       109            7/1/2015       3094 

 

data have;

  infile datalines dlm="," missover;

  input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;

datalines;

1,100,1/1/2015,7804,,  

1,101,1/1/2015,30921,39021,,

1,102,2/1/2015,30943,902,01920,

1,103,3/1/2015,011,,,

2,104,4/1/2015,4530,,,

2,105,5/1/2015,V9090,,,

3,106,6/1/2015,7039,,,

3,107,6/1/2015,7039,,,

3,108,6/1/2015,E884,0930,1092,0930

3,109,7/1/2015,3094,,,

;

run;

 

I want to create a dataset that has all of the *unique* diagnosis codes that occurred on claims during the same day for a given individual, as below (I don't care about whether it was DX1, DX2, DX3, or DX4).  This would involve using proc transpose somehow to create a series of variables that I might call "DXS1-DXS5" (in this fake example, five variables would be created but it would in reality be way more).  I've been playing with the syntax of proc transpose and can't get this work, though.  

 

 

ID     CLAIM_ID   DATE          DX1     DX2    DX3    DX4   DXS1   DXS2   DXS3   DXS4   DXS5

1       100            1/1/2015       7804                                     7804    30921   39021

1       101            1/1/2015       30921  39021                       7804    30921   39021

1       102            2/1/2015       30943  902     01920            30943  902       01920             

1       103            3/1/2015       011                                       011

2       104            4/1/2015       4530                                     4530

2       105            5/1/2015       V9090                                   V9090

3       106            6/1/2015       7039                                     7039   E884     0930    1092    0930

3       107            6/1/2015       7039                                     7039   E884     0930    1092    0930

3       108            6/1/2015       E884    0930  1092   0930   7039   E884     0930    1092    0930 

3       109            7/1/2015       3094                                     3094

 

 

I'd appreciate suggestions on how to proceed.  Thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I would recommend that you first create a data set of all the DXS values, and then merge it back in.  Here is one way (assuming DX1-DX4 are character):

 

data dxs;

set have;

array dx {4};

do i=1 to 4;

   if dx{i} > ' ' then do;

      diagnosis = dx{i};

      output;

   end;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

   by id date diagnosis;

run;

 

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

 

proc transpose data=dxs  prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

 

Then all that is left to do is to merge it back in.  Assuming your original data set is sorted:

 

data want;

merge have dxs_per_id;

by id date;

run;

 

The code is untested, but should be fine.  Write back if it causes any problems.

 

Good luck.

View solution in original post

6 REPLIES 6
Reeza
Super User
Your moving diagnosis from one record to another because its the same day? What's the point of keeping multiple records per day?
Reeza
Super User

You need to calculate the daily separately and then merge it back in. 

 

data have;
  infile datalines dlm="," truncover;
  input id claim_id date $ dx1 $ dx2 $ dx3 $ dx4 $;
datalines;
1,100,1/1/2015,7804,,  
1,101,1/1/2015,30921,39021,,
1,102,2/1/2015,30943,902,01920,
1,103,3/1/2015,011,,,
2,104,4/1/2015,4530,,,
2,105,5/1/2015,V9090,,,
3,106,6/1/2015,7039,,,
3,107,6/1/2015,7039,,,
3,108,6/1/2015,E884,0930,1092,0930
3,109,7/1/2015,3094,,,
;
run;

proc transpose data=have out=flipped(rename=col1=dx);
by id claim_id date;
var dx1-dx4;
run;

proc sort data=flipped nodupkey out=flipped_unique (where=(not missing(dx)) drop=claim_id);
by id date dx;
run;

data id;
set flipped_unique;
by id date;
if first.date then count=1; else count+1;
name=cats("dxs", count);
run;

proc transpose data=id out=wide_daily(drop=_name_);
by id date;
id name;
var dx;
run;

proc sql;
create table want as
select a.*, b.*
from have as a
left join wide_daily as b
on a.id=b.id and a.date=b.date;
quit;
Astounding
PROC Star

I would recommend that you first create a data set of all the DXS values, and then merge it back in.  Here is one way (assuming DX1-DX4 are character):

 

data dxs;

set have;

array dx {4};

do i=1 to 4;

   if dx{i} > ' ' then do;

      diagnosis = dx{i};

      output;

   end;

end;

keep ID diagnosis date;

run;

proc sort data=dxs nodupkey;

   by id date diagnosis;

run;

 

At this point, you can transpose the unique diagnosis values back into a single observation per ID, with as many DXS variables as needed:

 

proc transpose data=dxs  prefix='dxs' out=dxs_per_id (drop=_name_);

var diagnosis;

by id date;

run;

 

Then all that is left to do is to merge it back in.  Assuming your original data set is sorted:

 

data want;

merge have dxs_per_id;

by id date;

run;

 

The code is untested, but should be fine.  Write back if it causes any problems.

 

Good luck.

chuakp
Obsidian | Level 7

Thank you.  This works.  

 

One last question - how would I create a global macro variable that has the number of DXS diagnoses in the file?  

Astounding
PROC Star

There are a few ways ... here's one that meshes well with your program:

 

data _null_;

set dxs_per_id;

array dxs {*} dxs:;

call symputx('n_dxs_vars', dim(dxs));

stop;

run;

 

I think the best place to put this would be right after PROC TRANSPOSE creates DXS_PER_ID.  It would be important to call the variable that you are transposing DIAGNOSIS (or at least any name that does not begin with DXS).  The list DXS: is actually all variable names that begin with "DXS".

chuakp
Obsidian | Level 7

Thanks so much.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1718 views
  • 0 likes
  • 3 in conversation