Adding one more reply to the mix...

data one;
   length line $100;
   infile cards dlm="|";
   input line;
   cards;
John Q. Matthew 40 $90,000
George Wilson 28 $50,000
Robert Nicolas 30 $60,000
Leo Thomas 35 $70,000
Thurston Howell The 3rd 65 $1,000,000
Thurston Howell The 3 rd 65 $1,000,000
;
run;

data two;
   length name age inc $40;
   set one;
   rx=prxparse("/^(.*?)(\d+)\s+(.*)$/");
   if prxmatch(rx,line) then do;
      name=prxposn(rx,1,line);
      * age=input(prxposn(rx,2,line),best.);
      age=prxposn(rx,2,line);
      * inc=input(prxposn(rx,3,line),dollar18.);
      inc=prxposn(rx,3,line);
      inc2=input(inc,dollar18.);
   end;
   format inc2 dollar18.;
run;

The regular expression in prxparse works like this:

1) (.*?) is a lazy capture of all characters, anchored at the start of the line, up to the first run of one or more digits that is followed by one or more spaces. That's why the first "Thurston Howell" line worked but the second one didn't: in "3rd" the 3 is not followed by a space, so the match carries on to "65", whereas in "3 rd" the 3 is followed by a space, so 3 is taken as the age and "rd 65 $1,000,000" as the income.

2) (\d+)\s+ captures the first encounter of one or more digits followed by one or more spaces (the spaces are matched but not captured). The lazy capture (.*?) ensures you stop at the first such encounter. Since dot (.) matches ANY character, a greedy capture would run on to the last encounter of a digit plus a space. Probably not an issue with this data, but still, I think a lazy capture is best here.

3) (.*)$ is a greedy capture of all characters after the digits and spaces, up to the end of the line.

I left name, age, and inc as character values for debugging. Once you're happy with the parsing, you can change the data type and nest the prxposn calls inside the input calls.

HTH,
Scott

P.S.: Can someone explain how you embed the SAS formatted code (color, etc.) into these messages?
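For anyone following along, the last step mentioned above (switching to numeric types and nesting the prxposn calls inside input) might look roughly like the sketch below. It is only an illustration built from the commented-out lines in data two: the data set name three and the best12. width are assumptions, not part of the original code.

```sas
/* Sketch only: same regular expression as data two, but age and inc   */
/* are numeric. Assumes data set ONE from the post above already       */
/* exists; the name THREE and the best12. width are hypothetical.      */
data three;
   length name $40;
   set one;
   rx = prxparse("/^(.*?)(\d+)\s+(.*)$/");
   if prxmatch(rx, line) then do;
      name = prxposn(rx, 1, line);                    /* text before the age      */
      age  = input(prxposn(rx, 2, line), best12.);    /* first digits + space(s)  */
      inc  = input(prxposn(rx, 3, line), dollar18.);  /* remainder of the line    */
   end;
   format inc dollar18.;
   drop rx;   /* the pattern id is not needed in the output */
run;
```

With this version the "Thurston Howell The 3 rd" line should still mis-parse, since the third capture is "rd 65 $1,000,000", which the dollar18. informat cannot read. Expect inc to be missing for that row along with an invalid-argument note in the log. If the note is unwanted, the ?? modifier of the INPUT function, as in input(..., ?? dollar18.), is one way to suppress it.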