Re: data manipulation - string concatenation

Miracle · Posted 04-20-2013 06:21 AM

Hi everyone.

May I please ask how to I get from dataset A to B?

Your help is greatly appreciated!

A:

id string

1 8"A",1,2,8"C"

2 9"Q",3

3 4,5,9"P",9"Z",8"3/7"

B:

id string comment8 comment9

1 8"A",1,2,8"C" A. C

2 9"Q",3 Q

3 4,5,9"P",9"Z",8"3/7" 3/7 P. Z

TomKari · Posted 04-20-2013 12:18 PM

Hi, K.C.

This should get you started.

Tom

data have;
length String $255;
input ID String;
cards;
1     8"A",1,2,8"C"
2     9"Q",3
3     4,5,9"P",9"Z",8"3/7"
run;

data inter1(drop = _:);
length _word CommentText $255;
set have;

do _i = 1 to 255 until(missing(_word));
_word = scan(String, _i, ",");

if find(_word, '"') then
   do;
    CommentNumber = input(scan(_word, 1, '"'), best15.);
    CommentText = scan(_word, 2, '"');
    output;
   end;
end;
run;

proc sort data=inter1 out=inter2;
by ID CommentNumber;
run;

data inter3;
length ConsolidatedCommentText $255;
retain ConsolidatedCommentText;
set inter2;
by ID CommentNumber;

if first.CommentNumber then
call missing(ConsolidatedCommentText);

if missing(ConsolidatedCommentText) then
ConsolidatedCommentText = CommentText;
else ConsolidatedCommentText = catx(".", ConsolidatedCommentText, CommentText);

if last.CommentNumber then
output;
run;

proc transpose data=inter3 out=want(drop = _NAME_) prefix=comment;
by ID String;
id CommentNumber;
var ConsolidatedCommentText;
run;

PGStats · Posted 04-20-2013 12:52 PM

Using PRX string matching functions :

data have;
length string $64;
input id string &;
datalines;
1     8"A",1,2,8"C"
2     9"Q",3
3     4,5,9"P",9"Z",8"3/7"
;

data want;
set have;
retain prx8 prx9;
length comment8 comment9 $32;
if _n_=1 then do;
     prx8 = prxparse('/\b8"[^"]+"/');
     prx9 = prxparse('/\b9"[^"]+"/');
     end;
start = 1;
do until (position=0);
     call prxnext(prx8, start, -1, string, position, length);
     comment8 = catx(". ", comment8, substrn(string, position+2, max(0,length-3)));
     end;
start = 1;
do until (position=0);
     call prxnext(prx9, start, -1, string, position, length);
     comment9 = catx(". ", comment9, substrn(string, position+2, max(0,length-3)));
     end;
drop start position length prx8 prx9;
run;

proc print; run;

PG

Tom · Posted 04-20-2013 01:47 PM

You can scan the string using comma as a delimiter. You can then scan the individual values (or "words") to find index number and the comment value.

The code below will work when the prefix on the words are single digit numbers. If the values could be larger than 9 then you will need to change the array and also the method used to convert the prefix on the word to an index value. If you are positive that prefix values 0 to 7 could never have any comments then it would probably just be easiest to add a drop statement for those variables.

data want ;

set have ;

array comment (0:9) $40 comment0-comment9 ;

do i=1 to countw(string,',');

word = scan(string,i,',');

index= input(word,1.);

comment(index) = catx('. ',comment(index),scan(word,2,'"'));

end;

drop i word index ;

run;

Ksharp · Posted 04-21-2013 12:11 AM

How about this:

data have;
length String $255;
input ID String;
cards;
1     8"A",1,2,8"C"
2     9"Q",3
3     4,5,9"P",9"Z",8"3/7"
;
run;

data temp(keep=id a b );
 set have;
  do i=1 to countw(string,',');
   value=scan(string,i,',');
   a=scan(value,1,'"');
   b=scan(value,2,'"');
   if find(value,'"') then output;
  end;
run;
proc sort data=temp;
 by id a;
run;
data x(keep=id a x);
 set temp;
 by id a;
 length x $ 40;
 retain x;
 x=catx('.',x,b);
 if last.a then do;
                  output;
                      call missing(x);
                    end;
run;
proc transpose data=x out=want(drop=_name_) prefix=comment;
 by id;
 id a;
 var x;
run;

Ksharp

Miracle · Posted 04-21-2013 05:26 AM

Thanks a lot everyone for your help!

Great to know there are many different ways to achieve this.

data manipulation - string concatenation