DATA Step, Macro, Functions and more

substr function question

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

substr function question

Good afternoon everyone. I have two questions about the substring function. I'm looking at a Base SAS certification study guide and there are two questions I got wrong. Additionally, this particular study guide does not explain the answers. The first question pertains to the following code:

 

data test1;

    first="Ipswich, England";

    City=substr(first,1,7);

    City_Country=City!!", "!!"England";

run;

 

Why wouldn't the value for City_Country be "Ipswich, England."? I thought the substr was taking out 7 characters from the first observation which would be Ipswich. However, the value ends up being "Ipswich , England"; notice the space after "Ipswich."

 

My second question pertains to the following code:

 

data test1;

    first="Ipswich, England";

    City_Country=substr(first,1,7)!!", "!!"England";

run;

 

Why wouldn't the length of City_Country be 16? In the previous example, the result of City_Country is "Ipswich , England" which is 16 characters. Shouldn't the value of City_Country be the same in this example. The length ends up being 25 in this example.


Accepted Solutions
Solution
‎03-04-2017 02:31 PM
PROC Star
Posts: 7,363

Re: substr function question

If you look carefully at the results of your first test, there is more than one space after Ipswich..

 

Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25. 

 

City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'

 

In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'

  

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
Solution
‎03-04-2017 02:31 PM
PROC Star
Posts: 7,363

Re: substr function question

If you look carefully at the results of your first test, there is more than one space after Ipswich..

 

Since you didn't specify a length for City, it takes the length from First (i.e., 16). Then, when you create City_Country, it takes the length of City (16)+2 (the comma and space)+7 (the length of 'England")=25. 

 

City_Country gets a value of Ipswich + the 9 spaces that exist to the right of the value Ipswich + the comma and space + the value 'England'

 

In your second example, it use same logic to determine the length, however it only concatenates the first seven characters from first, adds the comma and space, followed by 'England'

  

Art, CEO, AnalystFinder.com

Contributor
Posts: 27

Re: substr function question

Once again you completely answered my question. Thank you Art!

Contributor
Posts: 27

Re: substr function question

Art, after playing around with concatenation more, I noticed some strange results. For the following code, the value of concat1 has a lot of spaces between first and second. Upon further inspection, I found that the length of first is 4 and the length of second is 12. Thus, second is the culprit. I added the line "length second 4." before "second=1001" and still the length ends up being 12. The only way I found to fix this is using the "strip" function around second within the concat1 assignment statement. Why does it set the defualt of this numeric as 12? I thought numerics have a default of 8 bytes. Additionally, why can't I change the length with a length statement?

 

data concat2;
first="1568";
X=length(first);
second=1001;
Y=length(second);
concat1=first||second;
Z=length(concat1);
run;

PROC Star
Posts: 7,363

Re: substr function question

Not strange. In your latest example you're concatenating a string with a length of 4 with a number. When SAS converts a number to a character string it uses 12 bytes to represent the number. If the number has more than 12 characters, it is converted to an E form to fit into the 12 characters.

 

If you concatenated second with second it would result in the variable having a length of 24.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 27

Re: substr function question

This makes sense. Thank you!

Super User
Super User
Posts: 6,502

Re: substr function question

The variable SECOND in your program is a number. So it's default length would by 8, but since it just contains an integer cutting the length in half to 4 would not change how the number is stored.

 

The reason that it ends up taking 12 characters in your generated string is because SAS will use the BEST12. format to convert the number SECOND into a string. It needs to do this since you used in a place where a character value is expected in the LENGHT() function and with the concatenate operator.  

 

If you wanted to convert it to a four character string then you could use the PUT() function.

 

concat2=first||put(second,4.);
Z2=length(concat2));
Trusted Advisor
Posts: 1,395

Re: substr function question

The default length of a variable depends on its type and SAS platform;

As you did not define length for CITY, it is probably greater then 7, therfore is the space between preceeding the comma.

In order to get the format 'city, country' use the strip function or catx function:

data test1;
    first="Ipswich, England";
    City=substr(first,1,7);
    City_Country=strip(City) !! ", " !! "England";
run;
/***** OR ****/
data test1;
    first="Ipswich, England";
    City=substr(first,1,7);
    City_Country=catx(', ',City, "England");
run;

 

Contributor
Posts: 27

Re: substr function question

Thank you for showing me the strip function Smiley Very Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 171 views
  • 0 likes
  • 4 in conversation