Desktop productivity for business analysts and programmers

SCAN(SUBSTRING)

Accepted Solution Solved
Reply
Contributor
Posts: 52
Accepted Solution

SCAN(SUBSTRING)

How do I write a function that will search for a character within a string then pull the 6 digits after the character including the character?

Ex:

0000M123450000

11M12345898989

222222m1234500

0M12345547

Each data line is different in length, but the output (work) should be

M12345

M12345

M12345

M12345

Keep in mind the character can be lowercase or uppercase.

In this situation the value is "M" and I'm looking for 6 digits including the M.

Thanks in advance.....



Accepted Solutions
Solution
‎04-15-2014 01:38 PM
Grand Advisor
Posts: 17,396

Re: SCAN(SUBSTRING)

Use a combination of find and substring.

You may need to play with the -1 and or 6 to get what you want, but this should be close (untested).

There's also Perl Expressions, but I avoid those Smiley Happy

WANT = substr(word,findc(word, "M", 'TI')-1, 6);

View solution in original post


All Replies
Valued Guide
Posts: 2,174

Re: SCAN(SUBSTRING)

Substr( inputstring, verify( inputstring, '012345679''), 6)

That VERIFY() function returns the position of the first character of the first parameter that is not in the second parameter. So in the example above it points to any non-numeric which means that M.

it is just about the most efficient of these string hunting functiins

(I am told)

Valued Guide
Posts: 2,174

Re: SCAN(SUBSTRING)

It shouldn't need conditional logic like IF or CASE when you are using FINDc or VERIFY.

Just extend the strings like

substr( inputstring !! "      ", verify( inputstring !! "M", '1234567890'), 6 )

Or,

Substr( t1.'PACKAGE ID'n !! '          ', findc(  t1.'PACKAGE ID'n !! 'M', 'ti' ), 9 )

by appending enough blanks and an M to the input string you can guarantee therr enough characters to extract from   and an M will always be found in the dearch    even if it is after the original 'package ID'

Nor should you need that -0 .

Was it 9 or 6 characters you wanted to extract?

Solution
‎04-15-2014 01:38 PM
Grand Advisor
Posts: 17,396

Re: SCAN(SUBSTRING)

Use a combination of find and substring.

You may need to play with the -1 and or 6 to get what you want, but this should be close (untested).

There's also Perl Expressions, but I avoid those Smiley Happy

WANT = substr(word,findc(word, "M", 'TI')-1, 6);

Contributor
Posts: 52

Re: SCAN(SUBSTRING)


Adjusting the value to -0 will account for the first position.

substr(t1.'PACKAGE ID'n,findc(t1.'PACKAGE ID'n, "M", 'TI')-0, 9)

Thanks Everyone - very Helpful Smiley Wink

Grand Advisor
Posts: 9,584

Re: SCAN(SUBSTRING)

But I like Perl Regular Expression.

For every pure programmer like Java , C++ ,they all use it to process string type variables . I recommend you to learn it, you will get a powerful tool .

data have;
input x : $40.;
cards;
0000M123450000
11M12345898989
222222m1234500
0M12345547
;
run;
data want;
 set have;
 xx=prxchange('s/.*([Mm]\d{5}).*/$1/',-1,x);
run;

Xia Keshan

Contributor
Posts: 52

Re: SCAN(SUBSTRING)

Thanks again Gentlemen - i love learning new methods to parse data...

Keshan - I'll take a look at it, but the last time I touched Java was back in my college days.

Keep in mind - I'm new to this SAS thing, but love to learn..............

Grand Advisor
Posts: 9,584

Re: SCAN(SUBSTRING)

You are going to love Perl Regular Expression.

Super Contributor
Posts: 578

Re: SCAN(SUBSTRING)

newcol=upper(substring(oldcol,find(oldcol,'m',i),6));

Super User
Super User
Posts: 6,364

Re: SCAN(SUBSTRING)

You might need to use IF/THEN logic to handle cases that do not meet the pattern.

For this simple case of only one letter you could use the INDEXC() function to handle matching case.

loc = indexc(STRING,'mM');

if 0 < loc <= length(string)-5 then NEW = substr(STRING,loc,6) ;

Super Contributor
Posts: 275

Re: SCAN(SUBSTRING)

It seems that 'M/m' is only alpha character in string, so:

substr(string,anyalpha(string),6)

Respected Advisor
Posts: 4,995

Re: SCAN(SUBSTRING)

I hate to add to answers that are already plentiful and working, but this approach seemed interesting:

length newvar $ 6;

newvar = 'M' || scan(word, 2, 'Mm');

It will, however, return "M" with no digits following if the incoming word doesn't contain any M/M.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 502 views
  • 6 likes
  • 8 in conversation