- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm using SAS University Edition and I need to create dummy variables for my data, but the data I'm using is spread across multiple columns. Each row is a patient's hospital visit. I would like to create a dummy variable for each individual diagnosis code.
Here's an example of what my data looks like:
And here's what I want it to look like in the end:
Another complicating matter is that there are hundreds of Diagnosis columns so I would prefer a way that doesn't require typing them all out. Can anyone offer advice?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hundreds of names isn't a problem. But some other issues are. For example, what do these diagnosis codes look like? Are they valid names for variables in SAS? Perhaps they begin with a number, or perhaps they contain illegal characters such as a decimal point. Or perhaps they contain so many characters that they would exceed SAS's limit of 32 characters for a name. Will the spelling be consistent from one instance to the next?
If they start with a number, how would you like to handle that? Could the word "Diag_" be added as the first part of the name?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The diagnosis codes are valid names for variables-- they look like K8001, S722XA, etc. and they would not exceed the limit of 32 characters. They all start with a letter but are a combinations of letters and numbers after that. The spelling will be consistent as well. If it helps, they are ICD-10-CM diagnosis codes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's a good start.
To process the data, then, begin by getting lists of all possible diagnosis codes:
proc sql;
create table diag1 as select distinct diagnosis1 as diagnosis from have order by diagnosis;
create table diag2 as select distinct diagnosis2 as diagnosis from have order by diagnosis;
create table diag3 as select distinct diagnosis3 as diagnosis from have order by diagnosis;
quit;
data all_diags;
merge diag1 diag2 diag3;
by diagnosis;
if diagnosis > ' ';
run;
This gives you a unique list of all codes found in the data.
Next, create a macro variable holding those codes:
proc sql;
select diagnosis into : all_codes separated by ' ' from all_diags;
quit;
%put &all_codes;
You don't need the %PUT statement, but it might help illustrate what is happening at this point.
Finally, use that list in a DATA step and search for matches. Assuming 6 characters are needed for the longest code:
data want;
set have;
array diags {*} $ 6 &all_codes;
do _n_ = 1 to dim(diags);
diags{_n_} = 0;
if diagnosis1 = vname(diags{_n_}) then diags{_n_} = 1;
if diagnosis2 = vname(diags{_n_}) then diags{_n_} = 1;
if diagnosis3 = vname(diags{_n_}) then diags{_n_} = 1;
end;
run;
I'm assuming here that you really have just 3 diagnoses per hospital stay. You could always add a few more to the program if need be.
The code is untested. It looks right, so see if it works for what you need.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
All of this seems to work up until the last section of code. Then I get the following error:
Do you know what is wrong or how to get around this error?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input id (d1-d3) ($);
cards;
1 A B C
2 Y B D
3 Z B .
;
run;
data temp;
set have;
d=d1;v=1;output;
d=d2;v=1;output;
d=d3;v=1;output;
run;
proc transpose data=temp(where=(d is not missing)) out=temp1;
by id;
id d;
var v;
run;
proc stdize data=temp1 out=want reponly missing=0;
var _numeric_;
run;