Hi all,
I have a database that contains the names of antibiotics. Multiple people were entering data and for each antibiotic I have at least 5 different spellings.
How can I make them consistent? I tried the following in a data step, but sadly SAS did not like it
if abx_name contains("31", "IPROFLOXACIN","CIPROFLOX","J01MA02") then Ciprofloxacin="Yes";
I hope you can follow the logic of this and can figure out what I was trying to do.
Thanks,
K_S
While CONTAINS is in the syntax for SAS it is very limited as to where it may be used.
INDEX is likely what you want but you can only search for on substring at a time
if index(abx_name,"31") > 0 or index(abx_name,"IPROFLOXACIN") > 0 or index(abx_name,"CIPROFLOX") > 0 or index(abx_name,"J01MA02")>0 then Ciprofloxacin="Yes";
Contains may only be used in Proc SQL.
thanks!
Hi Reeza,
There are 60 antibiotic names and they all have at least 5 spellings/formats.
I need the spellings to be consistent so that I can get accurate frequencies. There are about 5000 entries in my database.
If there is no clear pattern to the variations in spelling then there will likely be some "manual" labor for you.
What you could do is create a list of distinct spellings (i.e. via a SQL / select distinct abx_name). You then need to map these spellings to a standardized name so that you can create a SAS Format for recoding.
Many SAS Proc's allow you to use formatted values for analysis meaning you don't first have to create actual variables with new standardized values but you just instruct the Proc to apply the format to the original value (as done in below code in the Proc Freq statement).
data have;
length abx_name $20;
input abx_name :$20.;
datalines;
31
IPROFLOXACIN
CIPROFLOX
J01MA02
;
run;
proc format;
value $antibiotics(default=20)
'31' ='Ciprofloxacin'
'IPROFLOXACIN' ='Ciprofloxacin'
'CIPROFLOX' ='Ciprofloxacin'
'J01MA02' ='Ciprofloxacin'
;
run;
data want;
set have;
if put(abx_name,$antibiotics.)='Ciprofloxacin' then Ciprofloxacin="Yes";
run;
proc freq data=have;
format abx_name $antibiotics.;
table abx_name;
run;
NB: You can also create a format from data (proc format / cntlin) so you could simply copy/paste the list of distinct spellings into an Excel and then in a second column list your standardized spellings and then use this Excel to create a format.
if prxmatch( "/31|IPROFLOXACIN|CIPROFLOX|J01MA02/i",abx_name) then Ciprofloxacin="Yes";
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.