- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I have a string that have 3 words with two commas between them.
I want to calculated 3 fields- continent, city,country.
Can anyone explain why City=scan(X,-2,","); result with city?
In this case there are 2 commas, to which comma this code refer to? to first or second?
As I understand since -2 is negative then it search for the word in right side of the comma?
data have;
input X $20.;
cards;
EU,Athens,Greece
Asia,Beijing,China
;
run;
data want;
set have;
continent=scan(X,1,",");/**EU**/
City=scan(X,-2,",");/**Athens**/
Country=scan(X,-1,",");/**Greece**/
Run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You don't have the doc?
It's on www as well.
- If count is positive, SCAN counts words from left to right in the character string.
- If count is negative, SCAN counts words from right to left in the character string.
Functions and CALL Routines
SCAN Function
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My question was what happens if there are multiple delimiters? Then which delimiter is the reference? The first one from left side?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If there are multiple delimiters, they are all reduced to a common delimiter (also in the strings to scan) and then the scan is performed on that common delimiter. I don't know if that's really how it happens in the background (probably not), but that's how to think about it.
data have;
input X $20.;
cards;
EU,Athens#Greece
Asia#Beijing,China
;
run;
data want;
set have;
continent = scan(X, 1,",#");/**EU**/
City = scan(X,-2,",#");/**Athens**/
Country = scan(X,-1,",#");/**Greece**/
run;
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Then SAS will take them as ONE delimiter.
Except you specify the fourth argument ('m') in SCAN() :
continent=scan(X,1,"," , 'm');
You could make seveal examples to see how they are different.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ah oké, that's what @Ronein means with multiple delimiters.
Indeed ... answer provided by @Ksharp in that case.
Here's the doc about that m (or M) modifier:
m or M | specifies that multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero. If the M modifier is not specified, then multiple consecutive delimiters are treated as one delimiter, and delimiters at the beginning or end of the string argument are ignored. |
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My question was what happens if there are multiple delimiters? Then which delimiter is the reference? The first one from left side?
Try it and see.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ronein wrote:
My question was what happens if there are multiple delimiters? Then which delimiter is the reference? The first one from left side?
It is not referencing the delimiters. It is referencing the words that are formed by splitting the string at the delimiters. Consider a string with 5 words. Here are some example 5 word strings when the comma is the delimiter.
one,two,three,four,five
This sentence,is divided,into,five,parts
You can either count from left to right but using positive indexes. Or count from right to left by using negative indexes.
Word Positive Negative one 1 -5 two 2 -4 three 3 -3 four 4 -2 five 5 -1