- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
Would appreciate any help you can provide with troubleshooting -- I am trying to collapse provider specialties (specialty_cat) into smaller/broader categories (spec categorical variable with three letters). Specialty_cat is currently a character variable (see the ' ' below for character names). I had try to do this using index function, but does not seem to be working. Any sense of what I am doing wrong?
data WORK.all;
set WORK.all;
If index (specialty_cat, 'Derm')>0 then spec=3;
else if index (specialty_cat, 'Other MD specialty')>0 then spec=3;
else if index (specialty_cat, 'Emergency Medicine')>0 then spec=1;
else if index (specialty_cat, 'Family Medicine')>0 then spec=1;
else if index (specialty_cat, 'Internal Medicine')>0 then spec=1;
else if index (specialty_cat, 'NP or PA')>0 then spec=1;
else if index (specialty_cat, 'Pediatrics')>0 then spec=1;
else if index (specialty_cat, 'Psychology/therapy')>0 then spec=2;
else if index (specialty_cat, 'Psychiatrist')>0 then spec=3;
else if index (specialty_cat, 'Nutritionist')>0 then spec=.;
else if index (specialty_cat, 'Other')>0 then spec=.;
end;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Delete the END statement. I assume you meant it to be a RUN statement?
Also, a general tip: You might want to avoid using the same data set name in the DATA step because a mistake could destroy your data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When a single variable is being used then a custom format or informat is often a likely solution.
I am confused by this statement:
I am trying to collapse provider specialties (specialty_cat) into smaller/broader categories (spec categorical variable with three letters).
as the code you show attempts to assign values to a numeric variable. If your data set already has a variable named Spec that is character you might have odd behaviors resulting from conversion from numeric to character values.
If you want to display a different 3 letter value for a variable then a format is likely a better bet. A possible example:
proc format library=work; value $Medspecialty 'Other MD specialty'='ABC' 'Emergency Medicine'='CDF' 'Family Medicine'='CDF' 'Internal Medicine'='CDF' 'NP or PA'='CDF' 'Pediatrics'='CDF' 'Psychology/therapy'='GHI' 'Psychiatrist'='ABC' 'Nutritionist'=' ' 'Other'=' ' ; run; proc freq data=work.all; tables specialty_cat; format specialty_cat $medspecialty.; run;
You can have multiple formats with different groupings as use the desired one as needed by applying the format in the current procedure.
Also if you have change in your data adding the new value into the format means that the remainder of code likely doesn't need additional changes.
There is also a special type of format called MULTILABEL that while it only works with a few procedures such as Means, Tabulate and Report will allow hierarchies.
Without seeing data or what you expect for output is hard to determine why your code wouldn't work. Possibly you are using different spellings, perhaps capitol letters or embedded spaces are actually in the values of specialty_cat than you think. Or perhaps a format has already been applied to the variable in which case the index function won't see the formatted value as your code is written.