BookmarkSubscribeRSS Feed
K_S
Obsidian | Level 7 K_S
Obsidian | Level 7

Hi all, 

I have a database that contains the names of antibiotics. Multiple people were entering data and for each antibiotic I have at least 5 different spellings. 

How can I make them consistent? I tried the following in a data step, but sadly SAS did not like it

 

if abx_name contains("31", "IPROFLOXACIN","CIPROFLOX","J01MA02") then Ciprofloxacin="Yes";

 

I hope you can follow the logic of this and can figure out what I was trying to do. 

Thanks, 

K_S

7 REPLIES 7
ballardw
Super User

While CONTAINS is in the syntax for SAS it is very limited as to where it may be used.

 

INDEX is likely what you want but you can only search for on substring at a time

 

if index(abx_name,"31") > 0 or index(abx_name,"IPROFLOXACIN") > 0 
or index(abx_name,"CIPROFLOX") > 0 or index(abx_name,"J01MA02")>0 
then Ciprofloxacin="Yes";

Contains may only be used in Proc SQL.

K_S
Obsidian | Level 7 K_S
Obsidian | Level 7

thanks! 

Reeza
Super User
You're likely better off creating a master list and comparing against that to clean up the data first. How many antibiotics and variations are you talking about and how accurate do you need the data to be?
K_S
Obsidian | Level 7 K_S
Obsidian | Level 7

Hi Reeza, 

There are 60 antibiotic names and they all have at least 5 spellings/formats.

I need the spellings to be consistent so that I can get accurate frequencies. There are about 5000 entries in my database.

Patrick
Opal | Level 21

@K_S 

If there is no clear pattern to the variations in spelling then there will likely be some "manual" labor for you.

What you could do is create a list of distinct spellings (i.e. via a SQL / select distinct abx_name). You then need to map these spellings to a standardized name so that you can create a SAS Format for recoding.

Many SAS Proc's allow you to use formatted values for analysis meaning you don't first have to create actual variables with new standardized values but you just instruct the Proc to apply the format to the original value (as done in below code in the Proc Freq statement).

data have;
  length abx_name $20;
  input abx_name :$20.;
  datalines;
31 
IPROFLOXACIN
CIPROFLOX
J01MA02
;
run;

proc format;
  value $antibiotics(default=20)
  '31'            ='Ciprofloxacin' 
  'IPROFLOXACIN'  ='Ciprofloxacin'
  'CIPROFLOX'     ='Ciprofloxacin'
  'J01MA02'       ='Ciprofloxacin'
  ;
run;

data want;
  set have;
  if put(abx_name,$antibiotics.)='Ciprofloxacin' then Ciprofloxacin="Yes";
run;

proc freq data=have;
  format abx_name $antibiotics.;
  table abx_name;
run;

NB: You can also create a format from data (proc format / cntlin) so you could simply copy/paste the list of distinct spellings into an Excel and then in a second column list your standardized spellings and then use this Excel to create a format.

Ksharp
Super User
if prxmatch( "/31|IPROFLOXACIN|CIPROFLOX|J01MA02/i",abx_name) then Ciprofloxacin="Yes";

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 993 views
  • 0 likes
  • 6 in conversation