Since you are not going to have a consistent number of "tokens", I suggest you parse the name with the SCAN function and then, based on the known possible "last" values (MD, Jr, Sr and so on), decide how to identify a FIRST, a LAST and maybe a MIDDLE. Again, for this work-objective, your friends will be the DATA step and the SCAN function, with some number of token-substring variables (declared as maybe $20 each).
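Here is a minimal sketch of that SCAN-based idea. The data set names (NAMES, SCANNED), the limit of six tokens, and the title/suffix lists are my own assumptions for illustration, so adjust them to your data. It keeps only the first middle token and has only been thought through for names shaped like the samples.

```sas
/* Sample input; the data set name NAMES and the values are illustrative */
data names;
   length name $30;
   infile datalines truncover;
   input name $30.;
   datalines;
Dr. Smith T. Bauer MD
Samuel I Rodriguez M.D.
Will Glader MD
Dr. Casey
;
run;

data scanned;
   set names;
   length clean $30 first middle last $20;
   array tok{6} $20 tok1-tok6;            /* token-substring variables */

   clean = compress(name, ',.');          /* drop commas and periods */
   ntok  = countw(clean, ' ');            /* number of blank-delimited tokens */
   do i = 1 to min(ntok, dim(tok));
      tok{i} = scan(clean, i, ' ');
   end;

   lo = 1;                                /* index of first usable token */
   hi = min(ntok, dim(tok));              /* index of last usable token */

   /* skip a leading title */
   if upcase(tok{lo}) = 'DR' then lo = 2;

   /* skip trailing tokens that are on the assumed suffix list */
   do while (hi >= lo);
      if upcase(tok{hi}) in ('MD', 'JR', 'SR', 'II', 'III') then hi = hi - 1;
      else leave;
   end;

   /* whatever remains is FIRST, an optional MIDDLE, and LAST */
   if hi >= lo then do;
      last = tok{hi};
      if hi > lo then first = tok{lo};
      if hi > lo + 1 then middle = tok{lo + 1};
   end;
   drop i lo hi ntok;
run;

proc print data=scanned;
   var name first middle last;
run;
```

A name with a single remaining token, such as Dr. Casey, ends up with LAST filled in and FIRST left blank.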
As elegant as the PRX-only solution is, I agree with Scott that sometimes a more verbose or step-wise solution might be easier to maintain. For example, compare the 7 statements of the PRXCHANGE/REVERSE/SCAN solution with the 5 statements in the PRX-only solution.
If I am a PRX newbie and my ever-inventive data entry folks throw a few curve balls, like "Dr. Casey" (no first name), "Dr. First Last, Sr. MD" or "Dr. First Last, III MD", I have a better chance of -successfully- adjusting my code using the hybrid approach. If I'm not a PRX newbie, then adjusting the regex will probably be easy.
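** read the sample names (the data set name NAMES is assumed here);
data names;
length name $30;
infile datalines dsd dlm=',';
input idnum name $;
datalines;
1,"Dr. Smith T. Bauer MD"
2,"Samuel I Rodriguez M.D."
3,"Will Glader MD"
4,"Dr. Greg House"
5,"Dr Drake Morgan"
6,"DR Donnie Darko, Sr. MD"
;
run;
data parsename;
set names;
length first last z revname zz $30;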
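** get rid of a leading Dr (with or without a period), if present;
** the period is escaped so that it cannot match an arbitrary character;
z = left(prxchange('s/^(Dr|DR)\.? / /',1, name));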
** get rid of periods and commas;
z = compress(z,',.');
** reverse the string so that MD, Jr, Sr is first, if present;
revname = reverse(z);
** change MD, Jr, Sr (in reverse) to spaces;
zz = left(prxchange('s/(DM |rJ |rS )/ /',2, revname ));
** now first name is always the first chunk of the string;
first = scan(z,1,' ');
** and last name is always the first chunk of the reversed string;
** but the scanned string has to be reversed again to be correct;
last = reverse(scan(zz,1,' '));
** If first name and last name are the same (Dr. Casey), then set ;
** spaces/missing for first name;
if first = last then first = ' ';
** entire PRX solution;
* prepare reg. expression for text parsing;
* split parsed text into desired variables;
proc print data=parsename;
var first last name z revname zz alt_first alt_last;
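For comparison, a PRX-only parse could be sketched along the following lines. The pattern, the data set names (NAMES, PRXONLY) and the "Dr. Casey" test row are my own assumptions for illustration. This is not the expression from the PRX-only post, and it has only been reasoned through for names shaped like the samples (a bare "Last MD" with no first name, for instance, would be misread).

```sas
/* Sample input, same layout as the hybrid example plus one no-first-name row */
data names;
   length name $30;
   infile datalines dsd dlm=',';
   input idnum name $;
   datalines;
1,"Dr. Smith T. Bauer MD"
2,"Samuel I Rodriguez M.D."
3,"Will Glader MD"
4,"Dr. Greg House"
5,"Dr Drake Morgan"
6,"DR Donnie Darko, Sr. MD"
7,"Dr. Casey"
;
run;

data prxonly;
   set names;
   length alt_first alt_last $30;
   retain re;
   * prepare reg. expression for text parsing (assumed pattern);
   * optional title, optional first name, optional initials, last name,;
   * optional Jr/Sr/II/III, optional MD;
   if _n_ = 1 then
      re = prxparse('/^\s*(?:dr\.?\s+)?(?:([a-z]+)\s+)?(?:[a-z]\.?\s+)*([a-z]+)(?:,?\s+(?:jr|sr|ii|iii)\.?)?(?:,?\s+m\.?d\.?)?\s*$/i');
   * split parsed text into desired variables;
   if prxmatch(re, name) then do;
      alt_first = prxposn(re, 1, name);   /* capture buffer 1: first name, may be empty */
      alt_last  = prxposn(re, 2, name);   /* capture buffer 2: last name */
   end;
   drop re;
run;

proc print data=prxonly;
   var name alt_first alt_last;
run;
```

All of the parsing logic lives in the one pattern, which is exactly the trade-off being discussed: fewer statements, but a new curve ball means editing the expression rather than adding a step.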
OK, regular expressions may not be something you learn on the fly, but they are actually very logical and worth spending some time learning (and I must say, my knowledge in the field is pretty average, and I still managed to produce one for the above question).
As I see it, string manipulation functions (scan, index, reverse, substr, etc.) are quite good and will do the job for most needs. But in some cases, where the transformation gets more complicated, they will produce very "messy" code that is difficult to understand and maintain. Regular expressions will give you the same number of lines, and actually the same code, whether you want to do a simple or a more complex transformation. You just need to code the right expression. And they have been around for years (at least in the UNIX world), so there's a lot of documentation, just one Google search away!
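To illustrate the "same number of lines" point, here is a small sketch with two PRXCHANGE calls, one simple and one more involved. Both patterns are my own assumptions written for this one sample value, not something taken from the posts above.

```sas
data _null_;
   length name simple complex $40;
   name = 'DR Donnie Darko, Sr. MD';

   /* simple: strip a leading Dr, with or without a period */
   simple = prxchange('s/^dr\.?\s+//i', 1, name);

   /* more involved: drop title, initials and suffixes, reorder to "Last, First" */
   complex = prxchange('s/^(?:dr\.?\s+)?(\w+)\s+(?:\w\.?\s+)*(\w+)(?:,?\s+[\w.]+)*\s*$/$2, $1/i', 1, name);

   put simple= / complex=;
run;
```

Each transformation is still a single statement; only the expression changes. The second pattern treats every token after the second name as a suffix, so a spelled-out middle name would need a different expression.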
Regular expressions are, to my eyes, one of the greatest new features of the SAS 9 platform, one that really adds value to the SAS language.
So, do not be afraid of the "infamous" PRX functions, they won't bite you.
But, then again, I believe any solution is valid if the result is the one expected.
In principle, I agree with you. However, figuring out PRX expressions gives me a headache .. even with Google ... and reminds me of my old symbolic logic class, which also gave me headaches. I passed the course, but it was the hardest class I ever took.
To give Perl regular expressions their due: I -love- PRXCHANGE and if anything is going to lead me to understand the rest of Perl Regular Expressions, PRXCHANGE will get me there....but not today!
I still take the view that a public "text definition standard", which I presume Perl regular expressions provide, gives our SAS languages something on which to build the "inPicture" equivalent of "inValue" (see the PROC FORMAT documentation for inValue at http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a002473466.htm).
Although I know a clever little birdie, in his "developers sandpit", has toyed with and demonstrated techniques using Perl regular expressions in the "start" string for an enhanced inValue statement, there does not seem to be enough "customer demand" behind this concept of better integration of regular expressions within the SAS languages to create the priority or budget for development and implementation of this logical extension of SAS language inFormats.