Good afternoon everyone. I have two questions about the substring function. I'm looking at a Base SAS certification study guide and there are two questions I got wrong. Additionally, this particular study guide does not explain the answers. The first question pertains to the following code:
data test1;
first="Ipswich, England";
City=substr(first,1,7);
City_Country=City!!", "!!"England";
run;
Why wouldn't the value for City_Country be "Ipswich, England."? I thought the substr was taking out 7 characters from the first observation which would be Ipswich. However, the value ends up being "Ipswich , England"; notice the space after "Ipswich."
My second question pertains to the following code:
data test1;
first="Ipswich, England";
City_Country=substr(first,1,7)!!", "!!"England";
run;
Why wouldn't the length of City_Country be 16? In the previous example, the result of City_Country is "Ipswich , England" which is 16 characters. Shouldn't the value of City_Country be the same in this example. The length ends up being 25 in this example.
If you look carefully at the results of your first test, there is more than one space after Ipswich..
Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25.
City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'
In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'
Art, CEO, AnalystFinder.com
If you look carefully at the results of your first test, there is more than one space after Ipswich..
Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25.
City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'
In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'
Art, CEO, AnalystFinder.com
Once again you completely answered my question. Thank you Art!
Art, after playing around with concatenation more, I noticed some strange results. For the following code, the value of concat1 has a lot of spaces between first and second. Upon further inspection, I found that the length of first is 4 and the length of second is 12. Thus, second is the culprit. I added the line "length second 4." before "second=1001" and still the length ends up being 12. The only way I found to fix this is using the "strip" function around second within the concat1 assignment statement. Why does it set the defualt of this numeric as 12? I thought numerics have a default of 8 bytes. Additionally, why can't I change the length with a length statement?
data concat2;
first="1568";
X=length(first);
second=1001;
Y=length(second);
concat1=first||second;
Z=length(concat1);
run;
Not strange. In your latest example you're concatenating a string with a length of 4 with a number. When SAS converts a number to a character string it uses 12 bytes to represent the number. If the number has more than 12 characters, it is converted to an E form to fit into the 12 characters.
If you concatenated second with second it would result in the variable having a length of 24.
Art, CEO, AnalystFinder.com
This makes sense. Thank you!
The variable SECOND in your program is a number. So it's default length would by 8, but since it just contains an integer cutting the length in half to 4 would not change how the number is stored.
The reason that it ends up taking 12 characters in your generated string is because SAS will use the BEST12. format to convert the number SECOND into a string. It needs to do this since you used in a place where a character value is expected in the LENGHT() function and with the concatenate operator.
If you wanted to convert it to a four character string then you could use the PUT() function.
concat2=first||put(second,4.);
Z2=length(concat2));
The default length of a variable depends on its type and SAS platform;
As you did not define length for CITY, it is probably greater then 7, therfore is the space between preceeding the comma.
In order to get the format 'city, country' use the strip function or catx function:
data test1;
first="Ipswich, England";
City=substr(first,1,7);
City_Country=strip(City) !! ", " !! "England";
run;
/***** OR ****/
data test1;
first="Ipswich, England";
City=substr(first,1,7);
City_Country=catx(', ',City, "England");
run;
Thank you for showing me the strip function 😄
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.