renaming a variable that has different spelling errors but is actually the same thing - HELP!

Occasional Contributor
Posts: 19


Hi All

If you have a variable that is spelt in different ways but is the same thing, how can I rename it?

e.g. all of these should read june, which is the correct way:

         joon
         june
         juun
         jun3
         jun

Thanks J

Super Contributor
Posts: 349


Hi,

Try this...

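data one;
input name $;
x=soundex(name);          /* phonetic code of the name, e.g. 'J5' for june */
if x='J5' then x='June';  /* replace every J5 code with the label June */
cards;
joon
june
juun
jun3
jun
;
run;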

Thanks,

Shiva

Occasional Contributor
Posts: 19


Tx Shiva

I tried this, but this is the error message I receive:
ERROR: No DATALINES or INFILE statement

Can you think of anything else?

I will be using the same table to insert the new name... Let me know if I am explaining this clearly; I am fairly new at this.

Tx J
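For anyone hitting the same message: SAS raises "ERROR: No DATALINES or INFILE statement" when a DATA step executes an INPUT statement but has no in-stream data (CARDS/DATALINES) and no INFILE to read from, which is what happens if the CARDS section of the reply above is left out. When the values already sit in a SAS table, a minimal sketch reads that table with SET and overwrites the column in place. WORK.MYTABLE and MONTH are hypothetical names; substitute your own, and skip the first step, which only builds stand-in data.

```sas
/* Hypothetical stand-in for your existing table WORK.MYTABLE,
   with a character column MONTH holding the misspelt values */
data work.mytable;
  input month $;
  datalines;
joon
june
juun
jun3
jun
;
run;

/* Read the existing rows with SET (no INPUT, so no DATALINES or
   INFILE is needed) and overwrite MONTH in the same table */
data work.mytable;
  set work.mytable;
  if soundex(month) = 'J5' then month = 'june';
run;

proc print data=work.mytable;
run;
```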

Super Contributor
Posts: 349


Hi,

Can you paste the code that you submitted in the SAS editor?

Thanks,

Shiva

Respected Advisor
Posts: 3,896


I'm assuming that you're talking about values in a variable and not variable names.

What you need is some kind of standardization. If you need to get serious about it, then DataFlux might be the answer.

I don't think soundex() would do the job the way you need it to.
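To see one reason why, here is a small sketch that prints SOUNDEX codes for a made-up word list. SAS's SOUNDEX keeps the first letter and codes only the remaining consonants, so june and jan both come out as J5; a rule like if soundex(name)='J5' would therefore turn January into June as well.

```sas
/* Hypothetical word list: June variants plus other month-like words */
data codes;
  length code $ 8;
  input word $;
  code = soundex(word);   /* phonetic code used by the earlier replies */
  datalines;
joon
june
jun
jan
jane
march
;
run;

proc print data=codes;
run;
```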

Using Base SAS, I would probably first run a PROC FREQ over my data. Assuming that typos are the exception, the values with low frequencies are the candidates for misspellings.
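A minimal sketch of that first pass, assuming a hypothetical table WORK.MYTABLE with a character column MONTH (the sample rows only stand in for real data). ORDER=FREQ lists the most common values first, so the rare candidates end up at the bottom; the cut-off of 1 in the WHERE statement is an arbitrary choice for illustration.

```sas
/* Hypothetical sample data */
data work.mytable;
  input month $;
  datalines;
june
june
june
june
joon
juun
jun
july
july
;
run;

/* Count each distinct value, most frequent first */
proc freq data=work.mytable order=freq;
  tables month / nocum nopercent out=month_counts;
run;

/* Rare values are the candidates for misspellings */
proc print data=month_counts;
  where count <= 1;
run;
```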

I believe the next step would be to build a list of misspelled values and what the correct spelling should be (a manual process).

Once you have such a list, it should be easy to create a format from it (PROC FORMAT with CNTLIN=..).

And then just apply this format to standardize your data.

var=put(var,$standardize.);
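Putting those steps together, a sketch under stated assumptions: CORRECTIONS is a hypothetical hand-built list (BAD = misspelt value, GOOD = correct value), and MYTABLE/MONTH are hypothetical names for your data. CNTLIN= reads a data set with FMTNAME, START and LABEL, plus TYPE='C' for a character format. Values that are not in the list pass through the format unchanged. The explicit width in $standardize8. is a precaution: as far as I know, a user-defined format defaults to the width of its longest label, which could otherwise truncate longer unmatched values.

```sas
/* Hypothetical hand-built list of corrections */
data corrections;
  input bad $ good $;
  datalines;
joon june
juun june
jun3 june
jun june
;
run;

/* Reshape into the CNTLIN layout: FMTNAME, START, LABEL, TYPE */
data fmt_in;
  set corrections(rename=(bad=start good=label));
  retain fmtname '$standardize' type 'C';
run;

proc format cntlin=fmt_in;
run;

/* Hypothetical data to standardize */
data mytable;
  input month $;
  datalines;
joon
june
jun3
july
;
run;

/* Apply the format; unmatched values such as july stay as they are */
data mytable;
  set mytable;
  month = put(month, $standardize8.);
run;
```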

Respected Advisor
Posts: 3,124


If you are using SQL, you could also try the sounds-like operator =*:

data one;
input name $;
x=soundex(name);
if x='J5' then x='June';
cards;
joon
june
juun
jun3
jun
;
run;

proc sql;
create table two as
select *,
       case when name=*'june'   /* =* matches values that sound like 'june' (SOUNDEX) */
            then 'june'
            else 'others'
       end as y
from one;
quit;

proc print;run;

Haikuo

Respected Advisor
Posts: 4,654


If the errors are due to mistyping, then the function spedis gives you a distance based on the number and severity of mistakes in the misspelled word. You only need to choose a tolerance level. Try this:

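/* possibly misspelt words */
data doubtful;
input word$;
datalines;
joon
june
juun
jun3
jun
mar
;
/* correctly spelt keywords */
data good;
input keyWord$;
datalines;
march
june
july
;

proc sql;
/* every word against every keyword: 0 = exact match, larger = further apart */
select keyWord, word, spedis(word, keyWord) as distance
from good, doubtful;
quit;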

PG
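To act on the "choose a tolerance" step, one possible sketch: score every word against every keyword, keep the closest keyword per word, and replace the word only when that distance is within the tolerance. The value 30 is an arbitrary assumption to be tuned by inspecting the distances on your own data; words further than that from every keyword are left as they are. The sketch keeps one row per distinct word, so on real data you would join FIXED back to the original table by WORD.

```sas
/* Same sample data as in the post above */
data doubtful;
  input word $;
  datalines;
joon
june
juun
jun3
jun
mar
;
run;

data good;
  input keyWord $;
  datalines;
march
june
july
;
run;

/* Hypothetical tolerance: tune it on your own data */
%let tolerance = 30;

/* Distance from every doubtful word to every keyword */
proc sql;
  create table scored as
  select d.word, g.keyWord, spedis(d.word, g.keyWord) as distance
  from doubtful as d, good as g;
quit;

/* Closest keyword first within each word */
proc sort data=scored;
  by word distance;
run;

data fixed;
  set scored;
  by word;
  length fixedWord $ 8;
  if first.word;                          /* keep only the closest keyword */
  if distance <= &tolerance then fixedWord = keyWord;
  else fixedWord = word;                  /* too far from every keyword: leave as is */
  keep word fixedWord distance;
run;

proc print data=fixed;
run;
```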
