BookmarkSubscribeRSS Feed
0123
Calcite | Level 5

I'm using SAS University Edition and I need to create dummy variables for my data, but the data I'm using is spread across multiple columns.  Each row is a patient's hospital visit.  I would like to create a dummy variable for each individual diagnosis code.  

 

Here's an example of what my data looks like:

have.PNG

 

And here's what I want it to look like in the end:

want.PNG

 

Another complicating matter is that there are hundreds of Diagnosis columns so I would prefer a way that doesn't require typing them all out.  Can anyone offer advice?

5 REPLIES 5
Astounding
PROC Star

Hundreds of names isn't a problem.  But some other issues are.  For example, what do these diagnosis codes look like?  Are they valid names for variables in SAS?  Perhaps they begin with a number, or perhaps they contain illegal characters such as a decimal point.  Or perhaps they contain so many characters that they would exceed SAS's limit of 32 characters for a name.  Will the spelling be consistent from one instance to the next?

 

If they start with a number, how would you like to handle that?  Could the word "Diag_" be added as the first part of the name?

0123
Calcite | Level 5

The diagnosis codes are valid names for variables-- they look like K8001, S722XA, etc. and they would not exceed the limit of 32 characters.  They all start with a letter but are a combinations of letters and numbers after that.  The spelling will be consistent as well.  If it helps, they are ICD-10-CM diagnosis codes.

Astounding
PROC Star

That's a good start. 

 

To process the data, then, begin by getting lists of all possible diagnosis codes:

 

proc sql;

   create table diag1 as select distinct diagnosis1 as diagnosis from have order by diagnosis;

   create table diag2 as select distinct diagnosis2 as diagnosis from have order by diagnosis;

   create table diag3 as select distinct diagnosis3 as diagnosis from have order by diagnosis;

quit;

 

data all_diags;

merge diag1 diag2 diag3;

by diagnosis;

if diagnosis > ' ';

run;

 

This gives you a unique list of all codes found in the data.

 

Next, create a macro variable holding those codes:

 

proc sql;

select diagnosis into : all_codes separated by ' ' from all_diags;

quit;

 

%put &all_codes;

 

You don't need the %PUT statement, but it might help illustrate what is happening at this point.

 

Finally, use that list in a DATA step and search for matches.  Assuming 6 characters are needed for the longest code:

 

data want;

set have;

array diags {*} $ 6 &all_codes;

do _n_ = 1 to dim(diags);

   diags{_n_} = 0;

   if diagnosis1 = vname(diags{_n_}) then diags{_n_} = 1;

   if diagnosis2 = vname(diags{_n_}) then diags{_n_} = 1;

   if diagnosis3 = vname(diags{_n_}) then diags{_n_} = 1;

end;

run;

 

I'm assuming here that you really have just 3 diagnoses per hospital stay.  You could always add a few more to the program if need be.

 

The code is untested.  It looks right, so see if it works for what you need.

 

0123
Calcite | Level 5

All of this seems to work up until the last section of code.  Then I get the following error:

 

Screen Shot 2018-07-02 at 9.03.34 AM.png

 

Do you know what is wrong or how to get around this error?

Ksharp
Super User
data have;
input id (d1-d3) ($);
cards;
1 A B C
2 Y B D
3 Z B .
;
run;

data temp;
 set have;
 d=d1;v=1;output;
 d=d2;v=1;output;
 d=d3;v=1;output;
run;

proc transpose data=temp(where=(d is not missing)) out=temp1;
by id;
id d;
var v;
run;
proc stdize data=temp1 out=want reponly missing=0;
var _numeric_;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1512 views
  • 0 likes
  • 3 in conversation