Hi everyone.
May I please ask how to I get from dataset A to B?
Your help is greatly appreciated!
A:
id string
1 8"A",1,2,8"C"
2 9"Q",3
3 4,5,9"P",9"Z",8"3/7"
B:
id string comment8 comment9
1 8"A",1,2,8"C" A. C
2 9"Q",3 Q
3 4,5,9"P",9"Z",8"3/7" 3/7 P. Z
Hi, K.C.
This should get you started.
Tom
data have;
length String $255;
input ID String;
cards;
1 8"A",1,2,8"C"
2 9"Q",3
3 4,5,9"P",9"Z",8"3/7"
run;
data inter1(drop = _:);
length _word CommentText $255;
set have;
do _i = 1 to 255 until(missing(_word));
_word = scan(String, _i, ",");
if find(_word, '"') then
do;
CommentNumber = input(scan(_word, 1, '"'), best15.);
CommentText = scan(_word, 2, '"');
output;
end;
end;
run;
proc sort data=inter1 out=inter2;
by ID CommentNumber;
run;
data inter3;
length ConsolidatedCommentText $255;
retain ConsolidatedCommentText;
set inter2;
by ID CommentNumber;
if first.CommentNumber then
call missing(ConsolidatedCommentText);
if missing(ConsolidatedCommentText) then
ConsolidatedCommentText = CommentText;
else ConsolidatedCommentText = catx(".", ConsolidatedCommentText, CommentText);
if last.CommentNumber then
output;
run;
proc transpose data=inter3 out=want(drop = _NAME_) prefix=comment;
by ID String;
id CommentNumber;
var ConsolidatedCommentText;
run;
Using PRX string matching functions :
data have;
length string $64;
input id string &;
datalines;
1 8"A",1,2,8"C"
2 9"Q",3
3 4,5,9"P",9"Z",8"3/7"
;
data want;
set have;
retain prx8 prx9;
length comment8 comment9 $32;
if _n_=1 then do;
prx8 = prxparse('/\b8"[^"]+"/');
prx9 = prxparse('/\b9"[^"]+"/');
end;
start = 1;
do until (position=0);
call prxnext(prx8, start, -1, string, position, length);
comment8 = catx(". ", comment8, substrn(string, position+2, max(0,length-3)));
end;
start = 1;
do until (position=0);
call prxnext(prx9, start, -1, string, position, length);
comment9 = catx(". ", comment9, substrn(string, position+2, max(0,length-3)));
end;
drop start position length prx8 prx9;
run;
proc print; run;
PG
You can scan the string using comma as a delimiter. You can then scan the individual values (or "words") to find index number and the comment value.
The code below will work when the prefix on the words are single digit numbers. If the values could be larger than 9 then you will need to change the array and also the method used to convert the prefix on the word to an index value. If you are positive that prefix values 0 to 7 could never have any comments then it would probably just be easiest to add a drop statement for those variables.
data want ;
set have ;
array comment (0:9) $40 comment0-comment9 ;
do i=1 to countw(string,',');
word = scan(string,i,',');
index= input(word,1.);
comment(index) = catx('. ',comment(index),scan(word,2,'"'));
end;
drop i word index ;
run;
How about this:
data have; length String $255; input ID String; cards; 1 8"A",1,2,8"C" 2 9"Q",3 3 4,5,9"P",9"Z",8"3/7" ; run; data temp(keep=id a b ); set have; do i=1 to countw(string,','); value=scan(string,i,','); a=scan(value,1,'"'); b=scan(value,2,'"'); if find(value,'"') then output; end; run; proc sort data=temp; by id a; run; data x(keep=id a x); set temp; by id a; length x $ 40; retain x; x=catx('.',x,b); if last.a then do; output; call missing(x); end; run; proc transpose data=x out=want(drop=_name_) prefix=comment; by id; id a; var x; run;
Ksharp
Thanks a lot everyone for your help!
Great to know there are many different ways to achieve this.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.