Solved: Re: Scan a string to find word after a specific word

upadhi · Posted 09-17-2020 04:30 AM

string: my aim is to find every word after BANK.upa for every bank.xx in this line bank.ff

output: upa xx ff

How can I achieve this?

Ksharp · Posted 09-17-2020 07:31 AM

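data want;
  string = "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff";
  length want $ 80;
  /* Match the word that follows "bank_beg."; the lookbehind keeps the prefix out of the match. */
  pid = prxparse('/(?<=bank_beg\.)\w+/i');
  s = 1; e = length(string);
  call prxnext(pid, s, e, string, p, l);
  /* Collect every match into one space-separated value. */
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;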

yabwon · Posted 09-17-2020 05:30 AM

Maybe with findw, substr + scan?

data _null_;
  start = 1;
  word = "bank";
  string = "my aim is to find every word after BANK.upa for every bank.xx in this line bank.ff";
  length tmp $ 20 output $ 200;

  do until (pos = 0);
    pos = findw(string, word, ,"SPI", start);
    if pos = 0 then leave;
    tmp = scan(substr(string, pos), 2, ,"SPI");
    start = pos + 1;
    output = catx(" ", output, tmp);
  end;

  put _all_;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

upadhi · Posted 09-17-2020 07:42 AM

Hello Bart,

The code works as expected. However, the next challenge I faced is that the word I am searching for contains an underscore ("_").

Example: word="bank_beg"

string: "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff"

The output I get is: beg

yabwon · Posted 09-17-2020 07:46 AM

data _null_;
  start = 1;
  word = "bank_beg";
  string = "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff";
  length tmp $ 20 output $ 200;

  do until (pos = 0);
    pos = findw(string, word, ".","SI", start);
    if pos = 0 then leave;
    tmp = scan(substr(string, pos), 2, ".","SI");
    start = pos + 1;
    output = catx(" ", output, tmp);
  end;

  put _all_;
run;
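
A minimal sketch of why the delimiter change matters, assuming the "P" modifier counts the underscore as a punctuation delimiter (which is consistent with the "beg" output reported above). The data set name scan_demo and the variable names are made up for illustration:

```sas
data scan_demo;
  fragment = "bank_beg.upa for every";
  /* "SP": whitespace and punctuation delimit words, so "_" splits bank_beg */
  with_punct = scan(fragment, 2, , "SP");
  /* "." plus "S": only the dot and whitespace delimit words */
  dot_space = scan(fragment, 2, ".", "S");
  put with_punct= dot_space=;
run;
```

With the first call the second word is the "beg" half of bank_beg; with the second call it is the word after the dot.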

upadhi · Posted 09-17-2020 08:56 AM

Thank you so much for your help. Apologies if I am asking too much, but I also need to find out whether each table we detected is an input or an output table. For example:

Here, for row 1, UPA11 is an output table; for row 7, FSK_HEADER is an input table; for row 8, FSK_Scenario is an input table; and so on.

Is there a possible way to find this out?

Ksharp · Posted 09-17-2020 09:18 AM

That is very complicated. You need to define many rules.
For example, out=xxx is an output table; change the PRX pattern to:
pid=prxparse('/(?<=out\=bank_beg\.)\w+/i');

set xxx; is an input table; change the PRX pattern to:
pid=prxparse('/(?<=\bset bank_beg\.)\w+/i');
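
A rough sketch of how the two rules above could be applied together, tagging each match with a role. The sample line, the data set name roles, and the variables role, table, and rule are hypothetical, and real programs would need many more rules than these two:

```sas
data roles;
  length role $ 6 table $ 32;
  /* Hypothetical line of SAS code to classify */
  line = "proc sort data=x out=bank_beg.upa11; run; data y; set bank_beg.fsk_header; run;";
  out_id = prxparse('/(?<=out\=bank_beg\.)\w+/i');   /* rule 1: out=  means output */
  set_id = prxparse('/(?<=\bset bank_beg\.)\w+/i');  /* rule 2: set   means input  */
  do rule = 1 to 2;
    pid = ifn(rule = 1, out_id, set_id);
    role = ifc(rule = 1, 'output', 'input');
    /* Restart the scan from the beginning of the line for each rule */
    s = 1; e = length(line);
    call prxnext(pid, s, e, line, p, l);
    do while (p > 0);
      table = substr(line, p, l);
      output;                        /* one row per detected table */
      call prxnext(pid, s, e, line, p, l);
    end;
  end;
  keep role table;
run;
```

For this sample line the result should be one output row for upa11 and one input row for fsk_header.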

upadhi · Posted 09-17-2020 12:19 PM

Hey. Yes, we thought so; hence, we are dropping that idea for now.

One last thing: can I get the output in the following form?

string : find all words before and after dot.like this; select lib.tab , lib1.tab1;

Output: dot.like lib.tab, lib1.tab1

Ksharp · Posted 09-19-2020 07:37 AM

Sure.

data want;
  string = " find all words before and after dot.like this; select lib.tab , lib1.tab1;";
  length want $ 80;
  /* Match word.word pairs, i.e. the word on each side of a dot. */
  pid = prxparse('/\w+\.\w+/');
  s = 1; e = length(string);
  call prxnext(pid, s, e, string, p, l);
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;

ballardw · Posted 09-17-2020 03:23 PM

Look into Proc SCAPROC.

It can direct job-related information to a text file you specify, and that file contains the specific types of information you are looking for.
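
A minimal sketch of turning the recording on and off, assuming the RECORD and WRITE statements of PROC SCAPROC; the file name record.txt and the small job in the middle are placeholders:

```sas
/* Start recording; record.txt is a placeholder file name. */
proc scaproc;
  record 'record.txt';
run;

/* The job to be analyzed goes here. */
data a;
  do i = 1 to 100;
    output;
  end;
run;

proc print data=a(obs=25);
run;

proc means data=a;
run;

/* Write the collected information to the file. */
proc scaproc;
  write;
run;
```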

Some example output copied from the example in the documentation:

/* JOBSPLIT: ITEMSTOR INPUT SASUSER.TEMPLAT */
/* JOBSPLIT: ITEMSTOR INPUT SASHELP.TMPLMST */
/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
/* JOBSPLIT: LIBNAME WORK ENGINE V9 PHYS C:\DOCUME~1\userid\LOCALS~1\Temp\SAS
Temporary Files\_TD1252 */
/* JOBSPLIT: ELAPSED 5187  */
/* JOBSPLIT: PROCNAME PRINT */
/* JOBSPLIT: STEP SOURCE FOLLOWS */
proc print data=a(obs=25);
run;


/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
/* JOBSPLIT: LIBNAME WORK ENGINE V9 PHYS C:\DOCUME~1\userid\LOCALS~1\Temp\SAS
Temporary Files\_TD1252 */
/* JOBSPLIT: FILE OUTPUT C:\winnt\profiles\userid\record.txt */
/* JOBSPLIT: SYMBOL GET SYSSUMTRACE */
/* JOBSPLIT: ELAPSED 2750  */
/* JOBSPLIT: PROCNAME MEANS */
/* JOBSPLIT: STEP SOURCE FOLLOWS */
proc means data=a;
run;

The output before the code indicates input data sets such as

/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */

So you can parse the file and know that Work.A was the data set used as input to the Proc Print, AND the file location of the library. That is important for libraries whose names get reused with different locations.

I have data cleaning projects that are done monthly. The library changes to reflect the data. The code doesn't, other than one setup program that establishes the paths and library. So even having a library name does not tell you the actual location just from the code.
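
A small sketch of parsing such a file, assuming the DATASET lines always follow the layout of the sample above (direction keyword in the fourth blank-separated field, data set name in the sixth). The data set name dataset_refs and the variables direction and member are made up, and the sample lines are read from datalines here rather than from the real record file:

```sas
data dataset_refs;
  infile datalines truncover;
  input line $char120.;
  length direction $ 6 member $ 80;
  /* Keep only the JOBSPLIT lines that describe a data set */
  if scan(line, 2, ' ') = 'JOBSPLIT:' and scan(line, 3, ' ') = 'DATASET';
  direction = scan(line, 4, ' ');   /* INPUT in the sample */
  member    = scan(line, 6, ' ');   /* e.g. WORK.A.DATA */
  keep direction member;
  datalines;
/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
/* JOBSPLIT: ELAPSED 5187  */
/* JOBSPLIT: PROCNAME PRINT */
;
run;
```

To read the real file instead, the INFILE statement would point at the path given to PROC SCAPROC.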

upadhi · Posted 09-21-2020 03:56 AM

Hello, the code is working as expected. However, I have some exceptions, e.g. reading _ and & as part of want.

example:

 string = "my aim is to find every word after bank_beg.upa_&kok for every bank_beg.&Kok in this line bank_beg.upa_&dd._kok(keep=x)";

want= upa_&kok &Kok upa_&dd._kok

Basically, the delimiters are limited to " ;)"; all other characters should be read as part of want.

Ksharp · Posted 09-21-2020 06:30 AM

OK. Try this one.

pid=prxparse('/\.[^ ;()]+/');
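
Note that this pattern matches the dot itself, so each extracted piece would start with "." and any dot in the string qualifies, not only the one after bank_beg. A possible variant, as a sketch only, combines it with the earlier lookbehind. The single-quoted string is an assumption added here so that the macro processor does not try to resolve the ampersand references:

```sas
data want;
  /* Single quotes keep the macro processor away from the ampersand references */
  string = 'my aim is to find every word after bank_beg.upa_&kok for every bank_beg.&Kok in this line bank_beg.upa_&dd._kok(keep=x)';
  length want $ 80;
  /* Lookbehind keeps "bank_beg." out of the match;
     the class stops at a space, semicolon, or parenthesis */
  pid = prxparse('/(?<=bank_beg\.)[^ ;()]+/i');
  s = 1; e = length(string);
  call prxnext(pid, s, e, string, p, l);
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;
```

With this variant, want should come out as upa_&kok &Kok upa_&dd._kok for the sample string.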
