Hello everyone!
As a practice exercise I've been writing a macro that substitutes the n-th occurrence of a substring within a string with another string. For example, given the string KingRobertKingGeorgeKingAdamKing, I could use the macro to substitute the 2nd occurrence of King with Queen and get KingRobertQueenGeorgeKingAdamKing.
Now I've written a fairly successful macro to do this, except that it shows some behaviour it is not supposed to. Here is the code (but don't read it line by line; just focus on where the problem arises, as explained in the note inside the code and after the code). I provide the code as well as the dataset I use it with, in case you want to run it yourself and see the issue first-hand. So...
%macro auxm(ds, col, str1, str2, n_occurr);
%let ds_id = %sysfunc(open(&ds));
%let colVarnum = %sysfunc(varnum(&ds_id, &col));
%let colVartype = %sysfunc(vartype(&ds_id, &colVarnum));
/* Check */ %put Check: vartype for the column is &colVartype;
%if &colVartype ne C %then %put The column is not character;
%else %do;
%put The column is character;
%let lstr1 = %length(&str1);
/* Check */ %put Length of the string to be changed is &lstr1;
data &ds.2;
set &ds;
temp = &col;
if &n_occurr = 1 then do;
if index(temp, "&str1") = 1 then temp = "&str2"||substr(temp, &lstr1 + 1);
else temp = substr(temp, 1, index(temp, "&str1") - 1) || "&str2" || substr(temp, index(temp, "&str1") + &lstr1);
end;
else do; /* Problem arises here: the code below, up to the matching end, is not supposed to be hit when n_occurr = 1, but it does get hit! Why? */
do i = 1 to (&n_occurr - 1);
if index(temp, "&str1") = 1 then temp = substr(temp, &lstr1 + 1);
else temp = substr(temp, 1, index(temp, "&str1") - 1) || substr(temp, index(temp, "&str1") + &lstr1);
end;
drop i;
pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
col3 = &col;
col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
drop age temp pos;
end;
run;
%end;
%mend auxm;
In more detail, my problem is this: since I work with two cases, based on whether n_occurr = 1 or n_occurr > 1 (i.e., whether I want to change the first or any subsequent occurrence of a given substring), when I have n_occurr = 1 for some reason my code also hits the else do condition.
I test this code with the dataset
data test;
input Name $25. age;
datalines;
RadoslavRadossRado 12
IvanRadowwRadollRados 5
SimonRadoseRadosseqRadok 31
;
run;
and with
%auxm(test, Name, Rado, XXX, 1), which works as intended only when I comment out the else-do part above;
or
%auxm(test, Name, Rado, XXX, 2), which works fine.
Any idea will be much appreciated!
It sounds like you are confusing macro logic (%IF ... %THEN ...), which selectively generates code, with data step logic (IF ... THEN ...), which selectively executes statements.
Consider this example.
data want ;
set sashelp.class;
if 0=1 then do;
drop NAME ;
end;
run;
Versus this example:
%macro example;
data want ;
set sashelp.class ;
%if 0=1 %then %do;
drop name ;
%end;
run;
%mend example;
%example;
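The first data step will drop the variable NAME because a DROP statement is evaluated when the data step is compiled and has no executable component that can be controlled by logic flow. The second data step will keep NAME, because the macro %IF condition is false and the DROP statement is never generated.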
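One way to see this distinction for yourself is the MPRINT system option, which writes the statements a macro generates to the log. The sketch below is a parameterized variant of the macro above; the parameter name flag is an assumption added for illustration.

```sas
options mprint;  /* write the SAS statements generated by macros to the log */

%macro example(flag);  /* flag is a hypothetical parameter, not from the thread */
  data want;
    set sashelp.class;
    /* Macro logic: decides whether the DROP statement is generated at all */
    %if &flag = 1 %then %do;
      drop name;
    %end;
  run;
%mend example;

%example(0)  /* MPRINT lines in the log should contain no DROP statement */
%example(1)  /* MPRINT lines in the log should include: drop name; */

options nomprint;
```

Comparing the two sets of MPRINT lines shows that the data step compiler only ever sees what the macro processor has already generated.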
Please explicitly indicate which code is not supposed to run when n_occurr=1. Highlight all the lines.
Thanks for your feedback and interest, ballardw!
See, the issue is this: I work with two cases, when n_occurr = 1 and when it is > 1 (that is, when I want to change the first occurrence of the substring and when I want to change some later occurrence). Consequently, I use if &n_occurr = 1 then do; and then else do; but somehow the statements in the else-do block also get hit when n_occurr = 1. When I put them in /* */ the macro works fine for n_occurr = 1.
Thanks and kind regards
Change this:
drop i;
pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
col3 = &col;
col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
drop age temp pos;
end;
To:
end;
drop i;
pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
col3 = &col;
col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
drop age temp pos;
The reassignments were only in the "else do" block, which was not executing when n_occurr = 1.
No, it seems you got that wrong, ballardw, or I misunderstood your explanation.
My problem is precisely that the code inside the else-do does execute when n_occurr = 1, and I don't want it to, nor do I think it should. I'm interested also why it does; this seems to me like a bug or something about SAS I don't understand.
If you want, try my code in SAS to see what I mean.
Thanks and kind regards!
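A possible explanation for the symptom, sketched on sashelp.class rather than the original data: variables are created and DROP statements are applied when the data step is compiled, so a branch can leave traces in the output even if it never executes. The dataset name demo and the always-true condition 1 = 1 are placeholders for illustration.

```sas
data demo;                /* hypothetical dataset name */
  set sashelp.class;
  temp = name;
  if 1 = 1 then temp = 'changed';   /* this branch always executes */
  else do;
    col3 = name;   /* never executes, but col3 is still created (left blank) */
    drop temp;     /* compile-time statement: temp is dropped regardless */
  end;
run;

/* The variable list should include col3 and should not include temp */
proc contents data=demo;
run;
```

In the original macro the same thing would happen with drop age temp pos; and col3 inside the else-do block: for n_occurr = 1 the result stored in temp is dropped and an empty col3 appears, which looks as if the else branch had run.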
This example is adapted from the SAS documentation example for POSNEXT
Thanks for this, data _null_; it will be interesting to study this solution.
But I'm still keen to understand what goes wrong with my own.
Kind regards
I can't replicate what you describe. Your code works as you want it to. Run the cut-down version of your macro below and you'll see.
%macro auxm(ds, col, str1, str2, n_occurr);
%let ds_id = %sysfunc(open(&ds));
%let colVarnum = %sysfunc(varnum(&ds_id, &col));
%let colVartype = %sysfunc(vartype(&ds_id, &colVarnum));
/* Check */
%put Check: vartype for the column is &colVartype;
%if &colVartype ne C %then
%put The column is not character;
%else
%do;
%put The column is character;
%let lstr1 = %length(&str1);
/* Check */
%put Length of the string to be changed is &lstr1;
data _NULL_;
set &ds;
temp = &col;
if &n_occurr = 1 then
do;
put "XXXXXXX part 1";
end;
else
do;
put "XXXXXXX part 2";
end;
run;
%end;
%mend auxm;
%auxm(sashelp.class, Name, Rado, XXX, 1)
You could of course check the value of "&n_occurr" at the macro level and then only generate the SAS code block you actually want to execute. This wouldn't change the result though.
I started getting at something like this, because my code worked fine when I changed it this way:
%if &n_occurr = 1 %then %do;
data &ds.2;
set &ds;
temp = &col;
if index(temp, "&str1") = 1 then temp = "&str2"||substr(temp, &lstr1 + 1);
else temp = substr(temp, 1, index(temp, "&str1") - 1) || "&str2" || substr(temp, index(temp, "&str1") + &lstr1);
run;
%end;
%else %do;
data &ds.2;
set &ds;
temp = &col;
do i = 1 to (&n_occurr - 1);
if index(temp, "&str1") = 1 then temp = substr(temp, &lstr1 + 1);
else temp = substr(temp, 1, index(temp, "&str1") - 1) || substr(temp, index(temp, "&str1") + &lstr1);
end;
drop i;
pos = index(temp, "&str1") + (&n_occurr - 1)*(&lstr1);
col3 = &col;
col3 = substr(col3, 1, pos - 1)||"&str2"||substr(col3, pos + &lstr1);
drop age temp pos;
run;
%end;
But thanks very much for revealing this aspect of SAS to me!
Also, thanks to the rest of you for your participation!
What is a good place to read more about this distinction, between what counts as code generation and what can/cannot be controlled by logic flow in a data step?
Thanks and kind regards
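For comparison, here is a minimal sketch of a single data step that avoids the two-case split altogether. It is not the code from the thread: it swaps in the FIND function with a start position to walk to the n-th occurrence, and SUBSTRN so that a match at the very start or end of the string does not produce an invalid-argument note. The %let values, the output variable result, and the colon modifier on the input statement are assumptions made for illustration.

```sas
/* Illustrative values; inside the macro these would be the parameters */
%let str1     = Rado;
%let str2     = XXX;
%let n_occurr = 2;
%let lstr1    = %length(&str1);

/* Test data as in the thread; the colon modifier (:$25.) is an assumption
   added here so that Name stops at the blank before age */
data test;
input Name :$25. age;
datalines;
RadoslavRadossRado 12
IvanRadowwRadollRados 5
SimonRadoseRadosseqRadok 31
;
run;

data test2;
  set test;
  length result $ 60;    /* hypothetical output variable, sized generously */
  pos = 0;
  /* Each search starts one character after the previous hit
     (this would also count overlapping matches, unlike the original logic) */
  do i = 1 to &n_occurr;
    pos = find(Name, "&str1", pos + 1);
    if pos = 0 then leave;           /* fewer than n occurrences */
  end;
  if pos > 0 then
    result = substrn(Name, 1, pos - 1) || "&str2" || substrn(Name, pos + &lstr1);
  else result = Name;                /* leave the value unchanged */
  drop i pos;   /* compile-time statement, so it sits outside any IF/ELSE */
run;
```

Because both cases write to the same variable and the DROP statement sits at the step level, nothing in the output depends on which branch executed.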