BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smw10
Fluorite | Level 6

Good afternoon everyone. I have two questions about the substring function. I'm looking at a Base SAS certification study guide and there are two questions I got wrong. Additionally, this particular study guide does not explain the answers. The first question pertains to the following code:

 

data test1;

    first="Ipswich, England";

    City=substr(first,1,7);

    City_Country=City!!", "!!"England";

run;

 

Why wouldn't the value for City_Country be "Ipswich, England."? I thought the substr was taking out 7 characters from the first observation which would be Ipswich. However, the value ends up being "Ipswich , England"; notice the space after "Ipswich."

 

My second question pertains to the following code:

 

data test1;

    first="Ipswich, England";

    City_Country=substr(first,1,7)!!", "!!"England";

run;

 

Why wouldn't the length of City_Country be 16? In the previous example, the result of City_Country is "Ipswich , England" which is 16 characters. Shouldn't the value of City_Country be the same in this example. The length ends up being 25 in this example.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

If you look carefully at the results of your first test, there is more than one space after Ipswich..

 

Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25. 

 

City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'

 

In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'

  

Art, CEO, AnalystFinder.com

View solution in original post

8 REPLIES 8
art297
Opal | Level 21

If you look carefully at the results of your first test, there is more than one space after Ipswich..

 

Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25. 

 

City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'

 

In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'

  

Art, CEO, AnalystFinder.com

smw10
Fluorite | Level 6

Once again you completely answered my question. Thank you Art!

smw10
Fluorite | Level 6

Art, after playing around with concatenation more, I noticed some strange results. For the following code, the value of concat1 has a lot of spaces between first and second. Upon further inspection, I found that the length of first is 4 and the length of second is 12. Thus, second is the culprit. I added the line "length second 4." before "second=1001" and still the length ends up being 12. The only way I found to fix this is using the "strip" function around second within the concat1 assignment statement. Why does it set the defualt of this numeric as 12? I thought numerics have a default of 8 bytes. Additionally, why can't I change the length with a length statement?

 

data concat2;
first="1568";
X=length(first);
second=1001;
Y=length(second);
concat1=first||second;
Z=length(concat1);
run;

art297
Opal | Level 21

Not strange. In your latest example you're concatenating a string with a length of 4 with a number. When SAS converts a number to a character string it uses 12 bytes to represent the number. If the number has more than 12 characters, it is converted to an E form to fit into the 12 characters.

 

If you concatenated second with second it would result in the variable having a length of 24.

 

Art, CEO, AnalystFinder.com

 

smw10
Fluorite | Level 6

This makes sense. Thank you!

Tom
Super User Tom
Super User

The variable SECOND in your program is a number. So it's default length would by 8, but since it just contains an integer cutting the length in half to 4 would not change how the number is stored.

 

The reason that it ends up taking 12 characters in your generated string is because SAS will use the BEST12. format to convert the number SECOND into a string. It needs to do this since you used in a place where a character value is expected in the LENGHT() function and with the concatenate operator.  

 

If you wanted to convert it to a four character string then you could use the PUT() function.

 

concat2=first||put(second,4.);
Z2=length(concat2));
Shmuel
Garnet | Level 18

The default length of a variable depends on its type and SAS platform;

As you did not define length for CITY, it is probably greater then 7, therfore is the space between preceeding the comma.

In order to get the format 'city, country' use the strip function or catx function:

data test1;
    first="Ipswich, England";
    City=substr(first,1,7);
    City_Country=strip(City) !! ", " !! "England";
run;
/***** OR ****/
data test1;
    first="Ipswich, England";
    City=substr(first,1,7);
    City_Country=catx(', ',City, "England");
run;

 

smw10
Fluorite | Level 6

Thank you for showing me the strip function 😄

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3430 views
  • 1 like
  • 4 in conversation