Hi everyone.
Can I please ask your help on this because I have racked my brain for hours to solve this but sadly I still can't figure out how
Your help would be very much appreciated
I wish to produce data 'Want' based on variable 'Comment' and 'ID' from data 'Have'.
Data 'Have'
ID=1, Comment='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs'
ID=2, Comment='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo'
Data 'Want'
ID Question Comment
1 qwertyuiop
1 Q2 asdfgh.
1 Q35 jkl
1 Q88 zxcvbnm
1 Q151 plmkjh
1 Q66 rfevs
2 Q98 ertyu
2 Q77 asdfgh.
2 Q88 zvbq
2 Q111 mjq
2 Q42 rvs
2 Q1 plo
I couldn't come up with something simpler than in below code and it will eventually be a bit hard to digest in case you're not familiar with Regular Expressions ( http://support.sas.com/resources/papers/proceedings16/2480-2016.pdf )
data Have;
ID=1;
text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs';
output;
ID=2;
text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo';
output;
text=' ';
output;
run;
data want(drop=_:);
if _n_=1 then
do;
retain _prxid_q;
_prxid_q=prxparse('/q\d+/oi');
end;
set have;
if prxmatch('/^\s*q\d/i',text) ne 1 and lengthn(text) > 0 then _text='Q9999'||text;
else _text=text;
_start=1;
_stop=length(_text);
length question $5 answer $20;
call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
do while (_position > 0);
question = substr(_text, _position, _length);
_start_answer=sum(_position,lengthn(question));
call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
if _position>0 then
do;
answer=substrn(_text,_start_answer,_position-_start_answer);
end;
else
do;
answer=substrn(_text,_start_answer);
end;
output;
end;
run;
Well, the problem you are facing is that there doesn't appear to be any logical pattern to the comment. For instance, you can get some way there by using scan, see example below, however Q isn't constant, it is sometimes, and that could also be elsewhere in the text. Othertimes its not even present. This is why most programmers will refuse to process comments data, you may be quicker just manually going through the data yourself.
Data Have; ID=1; Comment='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs'; output; ID=2; Comment='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo'; output; run; data want; set have; do i=1 to countw(comment,"Q"); section=scan(comment,i,"Q"); output; end; run;
So you can use a capital Q as the separator?
In the set statement of the data step, rename comment to oldcomment.
Define a suitable length for comment
For the first record, I'd set question to blank, and comment to scan(oldcomment,1,'Q')
Then remove the first part by setting oldcomment to substr(oldcomment,index(oldcomment,'Q')+1)
Then in do a while loop (check for the presence of Q in the condition):
- scan for no-number characters at the beginning of the string with notdigit(); this lets you extract the question number
- then, substr() everything after the number up to the next Q
- output
- do substr(oldcomment,index(oldcomment,'Q')+1) once again
- after the loop has finished, process the remainder of oldcomment just like in the loop
- keep only id, question and comment
I couldn't come up with something simpler than in below code and it will eventually be a bit hard to digest in case you're not familiar with Regular Expressions ( http://support.sas.com/resources/papers/proceedings16/2480-2016.pdf )
data Have;
ID=1;
text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs';
output;
ID=2;
text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo';
output;
text=' ';
output;
run;
data want(drop=_:);
if _n_=1 then
do;
retain _prxid_q;
_prxid_q=prxparse('/q\d+/oi');
end;
set have;
if prxmatch('/^\s*q\d/i',text) ne 1 and lengthn(text) > 0 then _text='Q9999'||text;
else _text=text;
_start=1;
_stop=length(_text);
length question $5 answer $20;
call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
do while (_position > 0);
question = substr(_text, _position, _length);
_start_answer=sum(_position,lengthn(question));
call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
if _position>0 then
do;
answer=substrn(_text,_start_answer,_position-_start_answer);
end;
else
do;
answer=substrn(_text,_start_answer);
end;
output;
end;
run;
Dear @RW9 @Kurt_Bremser @Patrick.
Thank you very much for your help.
Yes, Patrick. PERL is indeed completely new to me but I will definately try to self-learn now.
Have an awesome weekend!
I love this question. data Have; ID=1; text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs'; output; ID=2; text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo'; output; run; data temp; set have; length temp $ 400; n=0;group=0; pid=prxparse('/q\d+:?/io'); s=1; e=length(text); call prxnext(pid,s,e,text,p,l); do while(p gt 0); n+1; if n=1 then do; if p ne 1 then do;group+1;temp=' ';output;temp=substr(text,1,p-1);output;end; group+1;temp=compress(substr(text,p,l),':');output; end; else do; temp=substr(text,lag_p+lag_l,p-lag_p-lag_l);output; group+1;temp=compress(substr(text,p,l),':');output; end; lag_p=p;lag_l=l; call prxnext(pid,s,e,text,p,l); if p le 0 then do;temp=substr(text,lag_p+lag_l);output;end; end; keep id temp group; run; proc transpose data=temp out=want; by id group; var temp; run;
And a completely different approach:
Go to who ever created that mess a see what options for other data formats there may have been. This appears to be exported from some sort of data entry or survey software and suspect some one was lazy or did not pay attention to options when extracting data.
@ballardw, best advice of the thread. In a proper process you could go back to the agreed upon Data Transfer Document, but it seems like nobody bothers with silliness such as documented, validated, repeatable procedures anymore. Just dump any old garbage into Excel and jobs a good one.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.