BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Miracle
Barite | Level 11

Hi everyone.

Can I please ask your help on this because I have racked my brain for hours to solve this but sadly I still can't figure out how Smiley Sad

Your help would be very much appreciated Smiley Happy

I wish to produce data 'Want' based on variable 'Comment' and 'ID' from data 'Have'.

Data 'Have'
ID=1, Comment='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs'
ID=2, Comment='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo'

Data 'Want'
ID Question   Comment
1             qwertyuiop
1  Q2         asdfgh.
1  Q35        jkl 
1  Q88        zxcvbnm
1  Q151       plmkjh
1  Q66	      rfevs	
2  Q98        ertyu
2  Q77        asdfgh. 
2  Q88        zvbq
2  Q111       mjq
2  Q42	      rvs 
2  Q1	      plo

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

I couldn't come up with something simpler than in below code and it will eventually be a bit hard to digest in case you're not familiar with Regular Expressions ( http://support.sas.com/resources/papers/proceedings16/2480-2016.pdf )

data Have;
  ID=1;
  text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs';
  output;
  ID=2;
  text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo';
  output;
  text='  ';
  output;
run;

data want(drop=_:);
  if _n_=1 then
    do;
      retain _prxid_q;
      _prxid_q=prxparse('/q\d+/oi');
    end;

  set have;

  if prxmatch('/^\s*q\d/i',text) ne 1 and lengthn(text) > 0 then _text='Q9999'||text;
  else _text=text;

  _start=1;
  _stop=length(_text);
  length question $5 answer $20;

  call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
  do while (_position > 0);
    question = substr(_text, _position, _length);
    _start_answer=sum(_position,lengthn(question));

    call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
    if _position>0 then
      do;
        answer=substrn(_text,_start_answer,_position-_start_answer);
      end;
      else
      do;
        answer=substrn(_text,_start_answer);
      end;
    output;
  end;

run;

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, the problem you are facing is that there doesn't appear to be any logical pattern to the comment.  For instance, you can get some way there by using scan, see example below, however Q isn't constant, it is sometimes, and that could also be elsewhere in the text.  Othertimes its not even present.  This is why most programmers will refuse to process comments data, you may be quicker just manually going through the data yourself.

Data Have;
ID=1; Comment='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs'; output;
ID=2; Comment='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo'; output;
run;

data want;
  set have;
  do i=1 to countw(comment,"Q");
    section=scan(comment,i,"Q");
    output;
  end;
run;
Kurt_Bremser
Super User

So you can use a capital Q as the separator?

In the set statement of the data step, rename comment to oldcomment.

Define a suitable length for comment

For the first record, I'd set question to blank, and comment to scan(oldcomment,1,'Q')

Then remove the first part by setting oldcomment to substr(oldcomment,index(oldcomment,'Q')+1)

Then in do a while loop (check for the presence of Q in the condition):

- scan for no-number characters at the beginning of the string with notdigit(); this lets you extract the question number

- then, substr() everything after the number up to the next Q

- output

- do substr(oldcomment,index(oldcomment,'Q')+1) once again

- after the loop has finished, process the remainder of oldcomment just like in the loop

- keep only id, question and comment

Patrick
Opal | Level 21

I couldn't come up with something simpler than in below code and it will eventually be a bit hard to digest in case you're not familiar with Regular Expressions ( http://support.sas.com/resources/papers/proceedings16/2480-2016.pdf )

data Have;
  ID=1;
  text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs';
  output;
  ID=2;
  text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo';
  output;
  text='  ';
  output;
run;

data want(drop=_:);
  if _n_=1 then
    do;
      retain _prxid_q;
      _prxid_q=prxparse('/q\d+/oi');
    end;

  set have;

  if prxmatch('/^\s*q\d/i',text) ne 1 and lengthn(text) > 0 then _text='Q9999'||text;
  else _text=text;

  _start=1;
  _stop=length(_text);
  length question $5 answer $20;

  call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
  do while (_position > 0);
    question = substr(_text, _position, _length);
    _start_answer=sum(_position,lengthn(question));

    call prxnext(_prxid_q, _start, _stop, _text, _position, _length);
    if _position>0 then
      do;
        answer=substrn(_text,_start_answer,_position-_start_answer);
      end;
      else
      do;
        answer=substrn(_text,_start_answer);
      end;
    output;
  end;

run;
Miracle
Barite | Level 11

Dear @RW9 @Kurt_Bremser @Patrick

Thank you very much for your help.

Yes, Patrick. PERL is indeed completely new to me but I will definately try to self-learn now.

Have an awesome weekend!

Ksharp
Super User
I love this question.


data Have;
  ID=1;
  text='qwertyuiopQ2asdfgh.Q35 jkl Q88zxcvbnm Q151: plmkjhq66 rfevs';
  output;
  ID=2;
  text='Q98ertyu Q77: asdfgh.Q88zvbq q111: mjqq42 rvsq1plo';
  output;
run;

data temp;
 set have;
 length temp $ 400;
 n=0;group=0;
 pid=prxparse('/q\d+:?/io');
 s=1;
 e=length(text);
 call prxnext(pid,s,e,text,p,l);
 do while(p gt 0);
 n+1;
  if n=1 then do;
    if p ne 1 then do;group+1;temp=' ';output;temp=substr(text,1,p-1);output;end;
    group+1;temp=compress(substr(text,p,l),':');output;
  end;
  else do;
   temp=substr(text,lag_p+lag_l,p-lag_p-lag_l);output;
   group+1;temp=compress(substr(text,p,l),':');output;
  end;
  
  lag_p=p;lag_l=l;
  call prxnext(pid,s,e,text,p,l);
 
  if p le 0 then do;temp=substr(text,lag_p+lag_l);output;end;
 
 end;
keep id temp group;
run;
proc transpose data=temp out=want;
 by id group;
 var temp;
run;
  

ballardw
Super User

And a completely different approach:

Go to who ever created that mess a see what options for other data formats there may have been. This appears to be exported from some sort of data entry or survey software and suspect some one was lazy or did not pay attention to options when extracting data.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

@ballardw, best advice of the thread.  In a proper process you could go back to the agreed upon Data Transfer Document, but it seems like nobody bothers with silliness such as documented, validated, repeatable procedures anymore.  Just dump any old garbage into Excel and jobs a good one.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1328 views
  • 6 likes
  • 6 in conversation