How do I write a function that will search for a character within a string then pull the 6 digits after the character including the character?
Ex:
0000M123450000
11M12345898989
222222m1234500
0M12345547
Each data line is different in length, but the output (work) should be
M12345
M12345
M12345
M12345
Keep in mind the character can be lowercase or uppercase.
In this situation the value is "M" and I'm looking for 6 digits including the M.
Thanks in advance.....
Use a combination of find and substring.
You may need to play with the -1 and or 6 to get what you want, but this should be close (untested).
There's also Perl Expressions, but I avoid those
WANT = substr(word,findc(word, "M", 'TI')-1, 6);
Substr( inputstring, verify( inputstring, '012345679''), 6)
That VERIFY() function returns the position of the first character of the first parameter that is not in the second parameter. So in the example above it points to any non-numeric which means that M.
it is just about the most efficient of these string hunting functiins
(I am told)
It shouldn't need conditional logic like IF or CASE when you are using FINDc or VERIFY.
Just extend the strings like
substr( inputstring !! " ", verify( inputstring !! "M", '1234567890'), 6 )
Or,
Substr( t1.'PACKAGE ID'n !! ' ', findc( t1.'PACKAGE ID'n !! 'M', 'ti' ), 9 )
by appending enough blanks and an M to the input string you can guarantee therr enough characters to extract from and an M will always be found in the dearch even if it is after the original 'package ID'
Nor should you need that -0 .
Was it 9 or 6 characters you wanted to extract?
Use a combination of find and substring.
You may need to play with the -1 and or 6 to get what you want, but this should be close (untested).
There's also Perl Expressions, but I avoid those
WANT = substr(word,findc(word, "M", 'TI')-1, 6);
Adjusting the value to -0 will account for the first position.
substr(t1.'PACKAGE ID'n,findc(t1.'PACKAGE ID'n, "M", 'TI')-0, 9)
Thanks Everyone - very Helpful
But I like Perl Regular Expression.
For every pure programmer like Java , C++ ,they all use it to process string type variables . I recommend you to learn it, you will get a powerful tool .
data have; input x : $40.; cards; 0000M123450000 11M12345898989 222222m1234500 0M12345547 ; run; data want; set have; xx=prxchange('s/.*([Mm]\d{5}).*/$1/',-1,x); run;
Xia Keshan
Thanks again Gentlemen - i love learning new methods to parse data...
Keshan - I'll take a look at it, but the last time I touched Java was back in my college days.
Keep in mind - I'm new to this SAS thing, but love to learn..............
You are going to love Perl Regular Expression.
newcol=upper(substring(oldcol,find(oldcol,'m',i),6));
You might need to use IF/THEN logic to handle cases that do not meet the pattern.
For this simple case of only one letter you could use the INDEXC() function to handle matching case.
loc = indexc(STRING,'mM');
if 0 < loc <= length(string)-5 then NEW = substr(STRING,loc,6) ;
It seems that 'M/m' is only alpha character in string, so:
substr(string,anyalpha(string),6)
I hate to add to answers that are already plentiful and working, but this approach seemed interesting:
length newvar $ 6;
newvar = 'M' || scan(word, 2, 'Mm');
It will, however, return "M" with no digits following if the incoming word doesn't contain any M/M.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.