RDzh
Calcite | Level 5

Hello everyone!

As a practice exercise I've been writing a macro that substitutes the n-th occurrence of a substring within a string with another string. For example, given the string KingRobertKingGeorgeKingAdamKing, I would use the macro to substitute, say, the 2nd occurrence of King with Queen and get KingRobertQueenGeorgeKingAdamKing.


Now I've written a fairly successful macro to do this, except that it shows some behaviour it is not supposed to. Here is the code (but don't read it line by line; just focus on where the problem arises, as flagged in the note inside the code and explained after it). I provide the code as well as the data set I use it with, in case you want to run it yourself and see the issue first-hand. So...

%macro auxm(ds, col, str1, str2, n_occurr);
%let ds_id = %sysfunc(open(&ds));
%let colVarnum = %sysfunc(varnum(&ds_id, &col));
%let colVartype = %sysfunc(vartype(&ds_id, &colVarnum));
/* Check */ %put Check: vartype for the column is &colVartype;
%if &colVartype ne C %then %put The column is not character;
%else %do;
  %put The column is character;
  %let lstr1 = %length(&str1);
  /* Check */ %put Length of the string to be changed is &lstr1;
  data &ds.2;
    set &ds;
    temp = &col;
    if &n_occurr = 1 then do;
      if index(temp, "&str1") = 1 then temp = "&str2"||substr(temp, &lstr1 + 1);
      else temp = substr(temp, 1, index(temp, "&str1") - 1) || "&str2" || substr(temp, index(temp, "&str1") + &lstr1);
    end;
    else do;
      /* Problem arises here: the code in this else-do block is not supposed */
      /* to be hit when n_occurr = 1, but it does get hit! Why?              */
      do i = 1 to (&n_occurr - 1);
        if index(temp, "&str1") = 1 then temp = substr(temp, &lstr1 + 1);
        else temp = substr(temp, 1, index(temp, "&str1") - 1) || substr(temp, index(temp, "&str1") + &lstr1);
      end;
      drop i;
      pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
      col3 = &col;
      col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
      drop age temp pos;
    end;
  run;
%end;
%mend auxm;

In more detail, my problem is this: since I work with two cases, based on whether n_occurr = 1 or n_occurr > 1 (i.e., whether I want to change the first or any subsequent occurrence of a given substring), when I have n_occurr = 1 for some reason my code also hits the else do condition.

I test this code with the dataset

data test;
input Name $25. age;
datalines;
RadoslavRadossRado   12
IvanRadowwRadollRados         5
SimonRadoseRadosseqRadok       31
;
run;

and with

%auxm(test, Name, Rado, XXX, 1), which works as intended only when I comment out the else-do part above;

or

%auxm(test, Name, Rado, XXX, 2), which works fine.

Any idea will be much appreciated!

1 ACCEPTED SOLUTION

Tom
Super User

Sounds like you are confused between using macro logic (%IF ...%THEN ... ) to selectively generate code and data step logic (IF ... THEN ...) to selectively execute statements.

Consider this example.

data want ;
  set sashelp.class;
  if 0=1 then do;
    drop NAME ;
  end;
run;

Versus this example:

%macro example;
data want ;
  set sashelp.class ;
  %if 0=1 %then %do;
    drop name ;
  %end;
run;
%mend example;
%example;

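The first data step will drop the variable NAME because a DROP statement is evaluated when the data step is compiled and has no executable component that can be controlled by logic flow. The second will keep NAME, because the false %IF means the DROP statement is never generated, so the data step compiler never sees it.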
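The same compile-time effect would explain the symptom in the original macro. The sketch below is illustrative only: the data set name demo and the variable col3 are assumptions chosen to mirror the question, and it assumes the SASHELP.CLASS sample table is available. The branch never executes, yet the compiler still processes every statement inside it.

```sas
/* The IF condition is never true, so nothing in the DO block executes. */
/* The compiler still reads the block while building the data step.     */
data demo;
   set sashelp.class;
   if 0 = 1 then do;
      col3 = 'never assigned';   /* compile time: col3 is added to the output data set */
      drop age;                  /* compile time: AGE is flagged to be dropped          */
   end;
run;

/* List the variables that ended up in DEMO. */
proc contents data=demo varnum;
run;
```

In this sketch DEMO should contain a blank col3 and no AGE, even though the branch never ran. In the macro from the question, the analogous statements are the col3 assignment and the drop of temp inside the else-do block, which would make the n_occurr = 1 result look as if the else branch had executed.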

10 REPLIES
ballardw
Super User

Please explicitly indicate which code is not supposed to run when n_occurr=1. Highlight all the lines.

RDzh
Calcite | Level 5

Thanks for your feedback and interest, ballardw!

See, issue is: I work with two cases, when n_occurr = 1 and when it is > 1 (that is, when I want to change the first occurrence of the substring and when I want to change some further occurrence). Consequently, I use if &n_occurr = 1 then do    and then  else do ; but somehow the statements in the else-do block also get hit when n_occurr = 1. When I put them in /*  */ the macro works fine for n_occurr = 1.

Thanks and kind regards

ballardw
Super User

Change this:

  drop i;
  pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
  col3 = &col;
  col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
  drop age temp pos;
  end;

To:

  end;
  drop i;
  pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
  col3 = &col;
  col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
  drop age temp pos;

The reassignments were only in the "else do" block, which was not executing when n_occurr = 1.

RDzh
Calcite | Level 5

No, it seems you got that wrong, ballardw, or I get your explanation wrong :)

My problem is precisely that the code inside the else-do does execute when n_occurr = 1, and I don't want it to,  nor do I think it should. I'm interested also why it does; this seems to me like a bug or something about SAS I don't understand.

If you want, try my code in SAS to see what I mean.

Thanks and kind regards!

data_null__
Jade | Level 19

This example is adapted from the SAS documentation example for CALL PRXNEXT.

%let o=2; /*Which occurrence to replace.*/
%let t=rado; /*target*/
%let r=-Replacement-; /*String to replace it with*/
data test;
   retain ExpressionId;
   if _n_ eq 1 then ExpressionID = prxparse("/&t/i");
   input text :$25. age;
   retain r "&r";
   start = 1;
   stop  = length(text);
   i = 0;
   call prxnext(ExpressionID, start, stop, text, position, length);
      do while (position > 0);
         i + 1;
         found = substr(text, position, length);
         *put found= position= length=;
         if &o eq i then leave;
         call prxnext(ExpressionID, start, stop, text, position, length);
      end;
   length new $60;
   if &o eq i then new = cats(substrn(text,1,position-1),r,substrn(text,position+length));
   put;
   put text;
   put new;
   put;
   datalines;
RadoslavRadossRado   12
IvanRadowwRadollRados         5
SimonRadoseRadosseqRadok       31
SimonRados                     33
;;;;
   run;
proc print;
run;
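For comparison, here is a minimal sketch of the same idea without regular expressions, using the FIND function with a start position to step from one occurrence to the next. It is an illustration rather than part of the reply above: the data set name find_demo, the variables pos and k, and the macro variable n are assumptions. Unlike the PRX version, the search is case-sensitive, as in the original INDEX-based macro.

```sas
%let n = 2;   /* which occurrence to replace (assumed value) */

data find_demo;                      /* hypothetical data set name */
   input text :$25.;
   length new $60;
   pos = 0;
   do k = 1 to &n;
      /* Start each search one character past the previous hit. */
      pos = find(text, "Rado", pos + 1);
      if pos = 0 then leave;         /* fewer than &n occurrences: stop looking */
   end;
   /* Rebuild the string around the n-th hit; leave it unchanged if there is none. */
   if pos > 0 then
      new = cats(substrn(text, 1, pos - 1), "XXX", substrn(text, pos + length("Rado")));
   else new = text;
   drop k pos;
   datalines;
RadoslavRadossRado
IvanRadowwRadollRados
SimonRados
;
run;
```

With n set to 2, the first row should become RadoslavXXXssRado, and SimonRados is left unchanged because it has only one occurrence. Note that CATS strips leading and trailing blanks from each piece, so this sketch assumes the values contain no embedded spaces next to the match.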

RDzh
Calcite | Level 5

Thanks for this, data _null_; it will be interesting to study this solution.

But I'm still keen to understand what goes wrong with my own.

Kind regards

Patrick
Opal | Level 21

I can't replicate what you describe. Your code works as you want it to. Run the cut-down version of your macro below and you'll see.

%macro auxm(ds, col, str1, str2, n_occurr);
  %let ds_id = %sysfunc(open(&ds));
  %let colVarnum = %sysfunc(varnum(&ds_id, &col));
  %let colVartype = %sysfunc(vartype(&ds_id, &colVarnum));
  /* Check */
  %put Check: vartype for the column is &colVartype;
  %if &colVartype ne C %then
    %put The column is not character;
  %else
    %do;
      %put The column is character;
      %let lstr1 = %length(&str1);
      /* Check */
      %put Length of the string to be changed is &lstr1;
      data _NULL_;
        set &ds;
        temp = &col;
        if &n_occurr = 1 then
          do;
            put "XXXXXXX  part 1";
          end;
        else
          do;
            put "XXXXXXX  part 2";
          end;
      run;
    %end;
%mend auxm;

%auxm(sashelp.class, Name, Rado, XXX, 1)

You could of course check the value of "&n_occurr" at the macro level and then generate only the SAS code block you actually want to execute. This wouldn't change the result though.

RDzh
Calcite | Level 5

I started getting at something like this, because my code worked fine when I changed it this way:

  %if &n_occurr = 1 %then %do;
  data &ds.2;
    set &ds;
    temp = &col;
    if index(temp, "&str1") = 1 then temp = "&str2"||substr(temp, &lstr1 + 1);
    else temp = substr(temp, 1, index(temp, "&str1") - 1) || "&str2" || substr(temp, index(temp, "&str1") + &lstr1);
  run;
  %end;
  %else %do;
  data &ds.2;
    set &ds;
    temp = &col;
    do i = 1 to (&n_occurr - 1);
      if index(temp, "&str1") = 1 then temp = substr(temp, &lstr1 + 1);
      else temp = substr(temp, 1, index(temp, "&str1") - 1) || substr(temp, index(temp, "&str1") + &lstr1);
    end;
    drop i;
    pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
    col3 = &col;
    col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
    drop age temp pos;
  run;
  %end;

But thanks very much for revealing this aspect of SAS to me!

Also, thanks to the rest of you for your participation!

RDzh
Calcite | Level 5

What is a good place to read more about this distinction, between what counts as code generation and what can/cannot be controlled by logic flow in a data step?

Thanks and kind regards
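One hands-on way to explore the distinction is to turn on the MPRINT system option, which writes the statements a macro actually generates to the log. The sketch below is illustrative only: the macro name branch_demo and the output data set names demo_1 and demo_2 are hypothetical, and it assumes the SASHELP.CLASS sample table is available.

```sas
options mprint;   /* show the generated data step statements in the log */

%macro branch_demo(n_occurr);   /* hypothetical macro, for illustration only */
   data demo_&n_occurr;
      set sashelp.class(obs=3);
      %if &n_occurr = 1 %then %do;
         put "generated: first-occurrence branch";
      %end;
      %else %do;
         put "generated: later-occurrence branch";
         drop age;   /* reaches the compiler only when this branch is generated */
      %end;
   run;
%mend branch_demo;

%branch_demo(1)   /* MPRINT log shows no DROP statement, so demo_1 keeps AGE */
%branch_demo(2)   /* MPRINT log shows the DROP statement, so demo_2 loses AGE */
```

Comparing the two MPRINT listings shows that %IF decides which statements exist at all, whereas a data step IF only decides which executable statements run for each observation.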
