DATA Step, Macro, Functions and more

Parse a string into different variables

Reply
N/A
Posts: 0

Parse a string into different variables

Hi,
I have one variable (example listed below) needs to be parsed into multiple variables. For example, the first listed (F1) 172/233 needs to be separated as 3 variables: F1, 172 and 233. The last listed M(850)84 F(832)9/358 needs to be separated as 6 variables: M850, 84, F832, 9 and 358. The length is not fixed. Even I use index function to locate the ( ) it still hard to extract the characters within ( ). Any help will be highly appreciated. Thanks,


(F1) 172/233
(M20) 141
(M279) 256
(F125) 9/87
M(603)90 F(666)10/170
M(850)84 F(832)9/358
Super Contributor
Super Contributor
Posts: 3,174

Re: Parse a string into different variables

Posted in reply to deleted_user
A combination of SCAN or INDEX (to determine length), along with the SUBSTR function is appropriate. Also, I see you will need to use COMPRESS to remove the parenthesis characters, possibly. Remember the SCAN function accepts one or more delimiter characters as the second argument; also scan accepts a negative value to work from the right-most string location - not well documented, I don't believe. And there are times where you may need to make multiple passes of the string/substring to extract the desired pieces.

Also, if you speak Perl, there are some supported Perl-like functions now supported with current SAS.

Scott Barry
SBBWorks, Inc.
Respected Advisor
Posts: 4,173

Re: Parse a string into different variables

Hi yy1234

Something like that:

data _null_;
infile datalines missover;
input String $50.;
put String=;
array StringPart {100} $100.;
i=1;
do while(scan(String,i,') /') ne '');
StringPart{i}=scan(String,i,') /');
StringPart{i}=compress(StringPart{i},'(');
put StringPart{i}=;
i+1;
end;
datalines;
(F1) 172/233
(M20) 141
(M279) 256
(F125) 9/87
M(603)90 F(666)10/170
M(850)84 F(832)9/358
;

You could also first pass the data set and determine the necessary number of array elements before reading the "words" into separate variables.
I've chosen 100 array elements.
What could work to do it more dynamic is: length() of string - length() of string after compression of all delimiters - and then keeping the max value of this difference (call symput).
Using this value (in macrovar) in array definition..... just an idea.

And yes: There would also be some Perl RegEx which could be helpful - but I think in your case they wouldn't provide a simpler solution.

Cheers
Patrick
N/A
Posts: 0

Re: Parse a string into different variables

Posted in reply to deleted_user
Hi Scott & Patrick,
Thanks for all the suggestions. I tried the following. It works. gen_orig is the original variable I listed before.

gentry=tranwrd(gen_orig, "F(", "(F");
gentry2=tranwrd(gentry, "M(", "(M");
genvar1=scan(gentry2, 1 );
genvar2=scan(gentry2, 2);
genvar3=scan(gentry2, 3);
genvar4=scan(gentry2, 4);
genvar5=scan(gentry2, 5);

Best regards,
Ying
Ask a Question
Discussion stats
  • 3 replies
  • 175 views
  • 0 likes
  • 3 in conversation