BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ryanb2
Quartz | Level 8

I'm trying to replace state abbreviations with state names in a free-text variable that often lists multiple states.  TRANWRD works pretty well using arrays (see below) but in the example shown Alabama becomes AlabaMassachusetts.  I was trying to use a regular expression to isolate abbreviations but they don't seem to take SAS variable values in the replacements.  I also tried several ways to assign each abbreviation and state name to macro variables to use in the regular expression without success.  I'd like to know:

1.  What alternative method can I use to accomplish this task?

2.  Even if another method is easier, I'd like to know how to update macro variables in real-time or some other method that can be used to enter text real-time into a regular expression.

I tried communities and google but couldn't find what I was looking for.  Any input is appreciated.

Thanks.

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

run;

data StateNames2;

  set StateNames;

  keep States_original States;

  States_original=States;

  array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

  'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

  'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

  array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

  'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

  'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

  'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

  'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

  'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

  do i=1 to 59;

  /* Method 1 - Alabama becomes AlabaMassachusetts */

/* States=TRANWRD(States,stateabr{i},strip(statenms{i}));*/

  /* Method 2 - No change to SAS variable values dont work in regular expressions? */

  abrev_re=prxparse("s/[^a-z]stateabr{i}[^a-z]/statenms{i}/i");

  call prxchange(abrev_re,-1,States);

  /* Method 3 - Use a real-time updating macro variable (How?) */

/* CALL SYMPUT('MACRO_stateabr', stateabr{i}) ;*/

/* CALL SYMPUT('MACRO_statenms', statenms{i}) ;*/

/* abrev_re=prxparse("s/(?<=[^a-z])&MACRO_stateabr.(?=[^a-z])/&MACRO_statenms./i");*/

/* call prxchange(abrev_re,-1,States);*/

  end;

run;

proc print data=StateNames2 noobs; run;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

PRX does the job. Note the use of precompiled patterns and the word boundary metacharacter (\b).

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

data StateNames2;

set StateNames;

keep States_original States;

States_original=States;

array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

array prxId{59} _temporary_;

if not prxId{1} then

    do i=1 to dim(prxId);

        prxId{i} = prxparse(cats("s/\b",stateAbr{i},"\b/",stateNms{i},"/i"));

        end;

do i = 1 to dim(prxId);

    call prxchange(prxId{i},-1,States);

    end;

run;

 

proc print data=StateNames2 noobs; run;

PG

PG

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

PRX does the job. Note the use of precompiled patterns and the word boundary metacharacter (\b).

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

data StateNames2;

set StateNames;

keep States_original States;

States_original=States;

array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

array prxId{59} _temporary_;

if not prxId{1} then

    do i=1 to dim(prxId);

        prxId{i} = prxparse(cats("s/\b",stateAbr{i},"\b/",stateNms{i},"/i"));

        end;

do i = 1 to dim(prxId);

    call prxchange(prxId{i},-1,States);

    end;

run;

 

proc print data=StateNames2 noobs; run;

PG

PG
Ryanb2
Quartz | Level 8

Very clever!  Thanks for your help PGStats.

ballardw
Super User

And an alternate approach that will only work for US.

data _null_;
   length searchstr $ 300;
   searchstr = 'The string has AL and MA state abbreviations in it MP';
   array stateabr[59] $2 _temporary_ ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'
  'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'
  'GU' 'MH' 'MP' 'PW' 'PR' 'VI');
  do i = 1 to dim(stateabr);
   pos = findw(searchstr,stateabr);
   if pos>0 then do;
      searchstr=catx(' ',substr(searchstr,1,(pos-1)),strip(stnamel(stateabr)),substr(searchstr,(pos+3)));
      put searchstr; /* this is to see each change as it happens for example*/
   end;
  end;
 

run;

PGStats
Opal | Level 21

One note. You should consider carefully your choice of replacing state codes written in lowercase. Many state codes are also two-letter English words. It might be better to restrict the replacement of common words (IN, ME, OK, OR, AS, etc) to uppercase.

PG

PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1145 views
  • 6 likes
  • 3 in conversation