BookmarkSubscribeRSS Feed
jessho
Calcite | Level 5

Hi all, 

Would appreciate any help you can provide with troubleshooting -- I am trying to collapse provider specialties (specialty_cat) into smaller/broader categories (spec categorical variable with three letters).  Specialty_cat is currently a character variable (see the ' ' below for character names).  I had try to do this using index function, but does not seem to be working.  Any sense of what I am doing wrong?

 

data WORK.all;
set WORK.all;

 

If index (specialty_cat, 'Derm')>0 then spec=3;
else if index (specialty_cat, 'Other MD specialty')>0 then spec=3;
else if index (specialty_cat, 'Emergency Medicine')>0 then spec=1;
else if index (specialty_cat, 'Family Medicine')>0 then spec=1;
else if index (specialty_cat, 'Internal Medicine')>0 then spec=1;
else if index (specialty_cat, 'NP or PA')>0 then spec=1;
else if index (specialty_cat, 'Pediatrics')>0 then spec=1;
else if index (specialty_cat, 'Psychology/therapy')>0 then spec=2;
else if index (specialty_cat, 'Psychiatrist')>0 then spec=3;
else if index (specialty_cat, 'Nutritionist')>0 then spec=.;
else if index (specialty_cat, 'Other')>0 then spec=.;
end;

2 REPLIES 2
Rick_SAS
SAS Super FREQ

Delete the END statement. I assume you meant it to be a RUN statement?

 

Also, a general tip: You might want to avoid using the same data set name in the DATA step because a mistake could destroy your data. 

 

 

 

ballardw
Super User

When a single variable is being used then a custom format or informat is often a likely solution.

 

I am confused by this statement:

 I am trying to collapse provider specialties (specialty_cat) into smaller/broader categories (spec categorical variable with three letters). 

as the code you show attempts to assign values to a numeric variable. If your data set already has a variable named Spec that is character you might have odd behaviors resulting from conversion from numeric to character values.

 

If you want to display a different 3 letter value for a variable then a format is likely a better bet. A possible example:

proc format library=work;
value $Medspecialty
'Other MD specialty'='ABC'
'Emergency Medicine'='CDF'
'Family Medicine'='CDF'
'Internal Medicine'='CDF'
'NP or PA'='CDF'
'Pediatrics'='CDF'
'Psychology/therapy'='GHI'
'Psychiatrist'='ABC'
'Nutritionist'=' '
'Other'=' '
;
run;


proc freq data=work.all;
   tables specialty_cat;
   format specialty_cat $medspecialty.;
run;

You can have multiple formats with different groupings as use the desired one as needed by applying the format in the current procedure.

 

Also if you have change in your data adding the new value into the format means that the remainder of code likely doesn't need additional changes.

There is also a special type of format called MULTILABEL that while it only works with a few procedures such as Means, Tabulate and Report will allow hierarchies.

 

Without seeing data or what you expect for output is hard to determine why your code wouldn't work. Possibly you are using different spellings, perhaps capitol letters or embedded spaces are actually in the values of specialty_cat than you think. Or perhaps a format has already been applied to the variable in which case the index function won't see the formatted value as your code is written.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1685 views
  • 0 likes
  • 3 in conversation