DATA Step, Macro, Functions and more

Real-time updating macro variable inside data step (regular expression)

Accepted Solution Solved
Reply
Contributor
Posts: 26
Accepted Solution

Real-time updating macro variable inside data step (regular expression)

I'm trying to replace state abbreviations with state names in a free-text variable that often lists multiple states.  TRANWRD works pretty well using arrays (see below) but in the example shown Alabama becomes AlabaMassachusetts.  I was trying to use a regular expression to isolate abbreviations but they don't seem to take SAS variable values in the replacements.  I also tried several ways to assign each abbreviation and state name to macro variables to use in the regular expression without success.  I'd like to know:

1.  What alternative method can I use to accomplish this task?

2.  Even if another method is easier, I'd like to know how to update macro variables in real-time or some other method that can be used to enter text real-time into a regular expression.

I tried communities and google but couldn't find what I was looking for.  Any input is appreciated.

Thanks.

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

run;

data StateNames2;

  set StateNames;

  keep States_original States;

  States_original=States;

  array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

  'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

  'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

  array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

  'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

  'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

  'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

  'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

  'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

  do i=1 to 59;

  /* Method 1 - Alabama becomes AlabaMassachusetts */

/* States=TRANWRD(States,stateabr{i},strip(statenms{i}));*/

  /* Method 2 - No change to SAS variable values dont work in regular expressions? */

  abrev_re=prxparse("s/[^a-z]stateabr{i}[^a-z]/statenms{i}/i");

  call prxchange(abrev_re,-1,States);

  /* Method 3 - Use a real-time updating macro variable (How?) */

/* CALL SYMPUT('MACRO_stateabr', stateabr{i}) ;*/

/* CALL SYMPUT('MACRO_statenms', statenms{i}) ;*/

/* abrev_re=prxparse("s/(?<=[^a-z])&MACRO_stateabr.(?=[^a-z])/&MACRO_statenms./i");*/

/* call prxchange(abrev_re,-1,States);*/

  end;

run;

proc print data=StateNames2 noobs; run;


Accepted Solutions
Solution
‎05-15-2015 06:13 PM
Respected Advisor
Posts: 4,655

Re: Real-time updating macro variable inside data step (regular expression)

PRX does the job. Note the use of precompiled patterns and the word boundary metacharacter (\b).

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

data StateNames2;

set StateNames;

keep States_original States;

States_original=States;

array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

array prxId{59} _temporary_;

if not prxId{1} then

    do i=1 to dim(prxId);

        prxId{i} = prxparse(cats("s/\b",stateAbr{i},"\b/",stateNms{i},"/i"));

        end;

do i = 1 to dim(prxId);

    call prxchange(prxId{i},-1,States);

    end;

run;

 

proc print data=StateNames2 noobs; run;

PG

PG

View solution in original post


All Replies
Solution
‎05-15-2015 06:13 PM
Respected Advisor
Posts: 4,655

Re: Real-time updating macro variable inside data step (regular expression)

PRX does the job. Note the use of precompiled patterns and the word boundary metacharacter (\b).

data StateNames;

length states $300;

input states & $;

datalines;

CO(1), CT(1), FL(2), Alabama(1), MA(5), NJ(4), NY(2), PA(2), TX(1)

;

data StateNames2;

set StateNames;

keep States_original States;

States_original=States;

array stateabr[59] $2 abr1-abr59 ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'

'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'

'GU' 'MH' 'MP' 'PW' 'PR' 'VI');

array statenms[59] $50 nms1-nms59 ('Alabama' 'Alaska' 'Arizona' 'Arkansas' 'California' 'Colorado' 'Connecticut' 'Delaware' 'Florida' 'Georgia' 'Hawaii'

'Idaho' 'Illinois' 'Indiana' 'Iowa' 'Kansas' 'Kentucky' 'Louisiana' 'Maine' 'Maryland' 'Massachusetts' 'Michigan' 'Minnesota' 'Mississippi'

'Missouri' 'Montana' 'Nebraska' 'Nevada' 'New Hampshire' 'New Jersey' 'New Mexico' 'New York' 'North Carolina' 'North Dakota' 'Ohio' 'Oklahoma'

'Oregon' 'Pennsylvania' 'Rhode Island' 'South Carolina' 'South Dakota' 'Tennessee' 'Texas' 'Utah' 'Vermont' 'Virginia' 'Washington' 'West Virginia'

'Wisconsin' 'Wyoming' 'American Samoa' 'Washington DC' 'Federated States of Micronesia' 'Guam' 'Republic of the Marshall Islands'

'Commonwealth of the Northern Mariana Islands' 'Republic of Palau' 'Puerto Rico' 'US Virgin Islands');

array prxId{59} _temporary_;

if not prxId{1} then

    do i=1 to dim(prxId);

        prxId{i} = prxparse(cats("s/\b",stateAbr{i},"\b/",stateNms{i},"/i"));

        end;

do i = 1 to dim(prxId);

    call prxchange(prxId{i},-1,States);

    end;

run;

 

proc print data=StateNames2 noobs; run;

PG

PG
Contributor
Posts: 26

Re: Real-time updating macro variable inside data step (regular expression)

Very clever!  Thanks for your help PGStats.

Super User
Posts: 10,537

Re: Real-time updating macro variable inside data step (regular expression)

And an alternate approach that will only work for US.

data _null_;
   length searchstr $ 300;
   searchstr = 'The string has AL and MA state abbreviations in it MP';
   array stateabr[59] $2 _temporary_ ('AL' 'AK' 'AZ' 'AR' 'CA' 'CO' 'CT' 'DE' 'FL' 'GA' 'HI' 'ID' 'IL' 'IN' 'IA' 'KS' 'KY' 'LA' 'ME' 'MD' 'MA' 'MI' 'MN' 'MS'
  'MO' 'MT' 'NE' 'NV' 'NH' 'NJ' 'NM' 'NY' 'NC' 'ND' 'OH' 'OK' 'OR' 'PA' 'RI' 'SC' 'SD' 'TN' 'TX' 'UT' 'VT' 'VA' 'WA' 'WV' 'WI' 'WY' 'AS' 'DC' 'FM'
  'GU' 'MH' 'MP' 'PW' 'PR' 'VI');
  do i = 1 to dim(stateabr);
   pos = findw(searchstr,stateabr);
   if pos>0 then do;
      searchstr=catx(' ',substr(searchstr,1,(pos-1)),strip(stnamel(stateabr)),substr(searchstr,(pos+3)));
      put searchstr; /* this is to see each change as it happens for example*/
   end;
  end;
 

run;

Respected Advisor
Posts: 4,655

Re: Real-time updating macro variable inside data step (regular expression)

One note. You should consider carefully your choice of replacing state codes written in lowercase. Many state codes are also two-letter English words. It might be better to restrict the replacement of common words (IN, ME, OK, OR, AS, etc) to uppercase.

PG

PG
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 325 views
  • 6 likes
  • 3 in conversation