Hi SAS people,
I'm wondering why TRANSTRN fails to remove a target substring when the target comes from a variable built with the SCAN function, yet succeeds when the exact same substring is assigned to a variable manually (i.e., without SCAN). Specifically, I use SCAN to extract a substring into a new variable and then pass that variable as the second argument (target) of TRANSTRN in a subsequent line of code. The replacement only works if I declare the exact length of the target variable beforehand...
For example, given this data set
data have;
infile datalines truncover;
input str $10.;
datalines;
U-FR/IT-AB
U-FRIT-AB
U-FREN-AB
U-CLST-AB
;
what I would like to see in a resulting data set is the "-AB" at the end of each value for str removed, resulting in "U-FR/IT", "U-FRIT", "U-FREN", and "U-CLST", respectively.
Here are three versions of the code I am using to accomplish this, each with the resulting table. In each version I also create a second variable (sub2) that holds the target substring ("-AB") as a hard-coded literal, and I set a flag variable called "diff" to 1 whenever sub1 and sub2 differ, to show that the two values compare as equal:
/* Version 1: no lengths declared for x, sub1, or sub2 */
data want1 (drop=x);
set have;
x=compbl(scan(str,3,"-"));
sub1=compbl(cats("-",compbl(x)));
sub2="-AB";
str1=transtrn(str,sub1,trimn(""));
str2=transtrn(str,sub2,trimn(""));
if sub1 ne sub2 then diff=1;
run;
/* Version 2: x, sub1, and sub2 declared as $4 */
data want2 (drop=x);
set have;
length x $4; format x $4.;
length sub1 $4; format sub1 $4.;
length sub2 $4; format sub2 $4.;
x=compbl(scan(str,3,"-"));
sub1=compbl(cats("-",compbl(x)));
sub2="-AB";
str1=transtrn(str,sub1,trimn(""));
str2=transtrn(str,sub2,trimn(""));
if sub1 ne sub2 then diff=1;
run;
/* Version 3: x, sub1, and sub2 declared as $3 */
data want3 (drop=x);
set have;
length x $3; format x $3.;
length sub1 $3; format sub1 $3.;
length sub2 $3; format sub2 $3.;
x=compbl(scan(str,3,"-"));
sub1=compbl(cats("-",compbl(x)));
sub2="-AB";
str1=transtrn(str,sub1,trimn(""));
str2=transtrn(str,sub2,trimn(""));
if sub1 ne sub2 then diff=1;
run;
So my question is: do I seriously need to specify lengths (and maybe formats) of the variable resulting from the SCAN function if I want to use it properly in a subsequent TRANSTRN function? What if the lengths are variable? Do I need to extract the length of each result and then create new variables for each length? Also, why the heck is the length of $4 in the WANT2 dataset working for the three values of str without a forward slash, but not for the one with the forward slash in row 1?
Any insight into this strange issue (strange to me) would be much appreciated!
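For reference, here is a minimal sketch of one possible workaround that avoids declaring exact lengths. It rests on an assumption: the mismatch comes from trailing blanks, because a character variable is stored padded to its full length and TRANSTRN does not strip trailing blanks from its target argument. Wrapping the target in TRIM() passes only the non-blank part. The data set name want4 is hypothetical and not part of the code above.

```sas
/* Sample data from the question */
data have;
  infile datalines truncover;
  input str $10.;
  datalines;
U-FR/IT-AB
U-FRIT-AB
U-FREN-AB
U-CLST-AB
;

/* want4 is a hypothetical name; no LENGTH statements are used */
data want4 (drop=x);
  set have;
  x    = scan(str, 3, "-");   /* third "-"-delimited word, e.g. AB */
  sub1 = cats("-", x);        /* CATS strips blanks, but sub1 is stored blank-padded */
  /* Assumption: TRIM() drops the padding so the target passed in is just "-AB" */
  str1 = transtrn(str, trim(sub1), trimn(""));
run;

proc print data=want4;
run;
```

If the assumption holds, str1 should contain the values with the trailing "-AB" removed for all four rows, including the one with the forward slash. Note that TRANSTRN replaces every occurrence of the target, so a value containing "-AB" more than once would lose all of them.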