upadhi
Quartz | Level 8

I want to extract the word that follows every occurrence of "bank." (in any letter case) in a string.

string: my aim is to find every word after BANK.upa for every bank.xx in this line bank.ff

output: upa xx ff

How can I achieve this?

Accepted Solution
Ksharp
Super User
data want;
  string = "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff";
  length want $ 80;
  /* Match a word only when "bank_beg." precedes it (look-behind, case-insensitive) */
  pid = prxparse('/(?<=bank_beg\.)\w+/i');
  s = 1;
  e = length(string);
  /* Step through every match: p = position, l = length */
  call prxnext(pid, s, e, string, p, l);
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;

yabwon
Onyx | Level 15

Maybe with findw, substr, and scan?

data _null_;
  start = 1;
  word = "bank";
  string = "my aim is to find every word after BANK.upa for every bank.xx in this line bank.ff";
  length tmp $ 20 output $ 200;

  do until (pos = 0);
    /* Find the next occurrence of word: S = blanks and P = punctuation are delimiters, I = ignore case */
    pos = findw(string, word, ,"SPI", start);
    if pos = 0 then leave;
    /* Counting from pos, the second word is the one that follows "bank." */
    tmp = scan(substr(string, pos), 2, ,"SPI");
    start = pos + 1;
    output = catx(" ", output, tmp);
  end;

  put _all_;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



upadhi
Quartz | Level 8

Hello Bart,

The code works as expected. However, the next challenge I faced is that the word I am searching for contains an underscore ("_").

Example: word="bank_beg"

string: "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff"

The output I get is: beg

yabwon
Onyx | Level 15
data _null_;
  start = 1;
  word = "bank_beg";
  string = "my aim is to find every word after bank_beg.upa for every bank_beg.xx in this line bank_beg.ff";
  length tmp $ 20 output $ 200;

  do until (pos = 0);
    /* Only "." and blanks (S modifier) delimit words now, so "_" stays part of the word */
    pos = findw(string, word, ".","SI", start);
    if pos = 0 then leave;
    tmp = scan(substr(string, pos), 2, ".","SI");
    start = pos + 1;
    output = catx(" ", output, tmp);
  end;

  put _all_;
run;
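
The difference between the two versions comes down to the delimiter set. As a minimal sketch on one fragment of the sample text: with the P modifier every punctuation mark, including "_", splits words, so the second word of "bank_beg.upa" should come out as "beg"; with "." as the only explicit delimiter plus the S modifier, the underscore stays inside the word and the second word should be "upa". The variable names with_p and dot_only are made up for this illustration.

```sas
data _null_;
  s = "bank_beg.upa";
  /* S + P: blanks and all punctuation (including "_" and ".") act as delimiters */
  with_p   = scan(s, 2, , "SPI");
  /* "." plus S: only periods and blanks act as delimiters */
  dot_only = scan(s, 2, ".", "SI");
  /* Write both results to the log for comparison */
  put with_p= dot_only=;
run;
```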
upadhi
Quartz | Level 8

Thank you so much for your help. Apologies if I am asking too much, but I also need to find out whether each table we detected is an input or an output table. For example:

[Screenshot: upadhi_0-1600347305518.png]

Here, in row 1, UPA11 is an output table; in row 7, FSK_HEADER is an input table; in row 8, FSK_Scenario is an input table; and so on.

Is there a way to determine this?

Ksharp
Super User
That is very complicated. You need to define many rules.
For example, out=xxx marks an output table, so change the regex to:
pid=prxparse('/(?<=out\=bank_beg\.)\w+/i');

set xxx; marks an input table, so change the regex to:
pid=prxparse('/(?<=\bset bank_beg\.)\w+/i');
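
To show how such rules could be combined, here is a rough sketch that runs both patterns over one line of code and records each hit with a role. The sample line, the data set name roles, and the variables rx_out, rx_in, role, and table are all invented for the illustration. Only the two rules above are covered, so a table passed via data= (bank_beg.raw here) is not detected, which is exactly why many more rules would be needed in practice.

```sas
data roles;
  length role $ 6 table $ 32;
  /* Hypothetical line of SAS code to be analyzed */
  line = 'proc sort data=bank_beg.raw out=bank_beg.upa11; run; data x; set bank_beg.fsk_header; run;';

  /* Rule 1: out=bank_beg.xxx is an output table; rule 2: set bank_beg.xxx is an input table */
  rx_out = prxparse('/(?<=out\=bank_beg\.)\w+/i');
  rx_in  = prxparse('/(?<=\bset bank_beg\.)\w+/i');
  array rx[2] rx_out rx_in;
  array lbl[2] $ 6 _temporary_ ('output' 'input');

  do k = 1 to 2;
    s = 1;
    e = length(line);
    /* Collect every match of the current rule */
    call prxnext(rx[k], s, e, line, p, l);
    do while (p > 0);
      role  = lbl[k];
      table = substr(line, p, l);
      output;
      call prxnext(rx[k], s, e, line, p, l);
    end;
  end;

  keep role table;
run;
```

Because a look-behind has to be fixed-width, the second rule only matches when exactly one blank separates set from the library name.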
upadhi
Quartz | Level 8

Hey. Yes, we thought so; hence, we are dropping that idea for now.

One last thing: can I get the output in the following form?

string : find all words before and after dot.like this; select lib.tab , lib1.tab1;

Output: dot.like lib.tab, lib1.tab1

Ksharp
Super User

Sure.

data want;
  string = " find all words before and after dot.like this; select lib.tab , lib1.tab1;";
  length want $ 80;
  /* Match word.word, i.e. the words on both sides of a dot */
  pid = prxparse('/\w+\.\w+/');
  s = 1;
  e = length(string);
  call prxnext(pid, s, e, string, p, l);
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;
ballardw
Super User

Look into Proc SCAPROC.

It can write job-related information, including the specific kind you are looking for, to a text file that you specify.

Some example output copied from the example in the documentation:

/* JOBSPLIT: ITEMSTOR INPUT SASUSER.TEMPLAT */
/* JOBSPLIT: ITEMSTOR INPUT SASHELP.TMPLMST */
/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
/* JOBSPLIT: LIBNAME WORK ENGINE V9 PHYS C:\DOCUME~1\userid\LOCALS~1\Temp\SAS
Temporary Files\_TD1252 */
/* JOBSPLIT: ELAPSED 5187  */
/* JOBSPLIT: PROCNAME PRINT */
/* JOBSPLIT: STEP SOURCE FOLLOWS */
proc print data=a(obs=25);
run;


/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
/* JOBSPLIT: LIBNAME WORK ENGINE V9 PHYS C:\DOCUME~1\userid\LOCALS~1\Temp\SAS
Temporary Files\_TD1252 */
/* JOBSPLIT: FILE OUTPUT C:\winnt\profiles\userid\record.txt */
/* JOBSPLIT: SYMBOL GET SYSSUMTRACE */
/* JOBSPLIT: ELAPSED 2750  */
/* JOBSPLIT: PROCNAME MEANS */
/* JOBSPLIT: STEP SOURCE FOLLOWS */
proc means data=a;
run;

The comments before each step's code indicate its input data sets, such as

/* JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */

So you can parse the file and know that WORK.A was the data set used as input to the PROC PRINT, and also the file location of the library. That is important for libraries whose names get reused with different locations.

I have data cleaning projects that run monthly. The library changes to reflect the data. The code doesn't change, other than one setup program that establishes the paths and library. So even having a library name does not tell you the actual location just from the code.
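
A minimal sketch of that workflow, assuming a writable file named record.txt (a placeholder path) and two throwaway steps to analyze: recording is switched on with the RECORD statement, the steps run, the WRITE statement flushes the file, and a DATA step then pulls the direction and table name out of the DATASET lines. The data set name tables, the variables direction and table, and the regular expression are illustrative choices, not part of PROC SCAPROC itself.

```sas
/* Start recording; 'record.txt' is a placeholder path */
proc scaproc;
  record 'record.txt';
run;

/* Example steps whose inputs and outputs should be captured */
data a;
  do i = 1 to 10;
    output;
  end;
run;

proc print data=a(obs=5);
run;

/* Stop recording and write the collected information */
proc scaproc;
  write;
run;

/* Parse lines such as: JOBSPLIT: DATASET INPUT SEQ WORK.A.DATA */
data tables;
  infile 'record.txt' truncover lrecl=32767;
  length line $ 1000 direction $ 6 table $ 41;
  input line $char1000.;
  retain rx;
  if _n_ = 1 then
    rx = prxparse('/JOBSPLIT: DATASET (INPUT|OUTPUT|UPDATE) \w+ (\w+\.\w+)/');
  if prxmatch(rx, line) then do;
    direction = prxposn(rx, 1, line);  /* INPUT, OUTPUT, or UPDATE */
    table     = prxposn(rx, 2, line);  /* libref.member, e.g. WORK.A */
    output;
  end;
  keep direction table;
run;
```

The LIBNAME lines in the same file could be parsed the same way to recover the physical location behind each libref.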

upadhi
Quartz | Level 8

Hello, the code is working as expected. However, I have some exceptions, e.g. reading _ and & as part of want.

Example:

 string = "my aim is to find every word after bank_beg.upa_&kok for every bank_beg.&Kok in this line bank_beg.upa_&dd._kok(keep=x)";

want= upa_&kok &Kok upa_&dd._kok

Basically, the delimiters are limited to " ;)"; all other characters should be read as part of want.

Ksharp
Super User
OK. Try this one .

pid=prxparse('/\.[^ ;()]+/'); 
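
Note that this pattern matches after any dot and that each match includes the leading dot itself. As a hedged variant, the sketch below keeps the look-behind on "bank_beg." from the earlier solution and swaps \w+ for the new character class, so only the text after the prefix is captured. The string is written in single quotes here (a change from the post above) so that SAS does not try to resolve &kok and &dd as macro variables.

```sas
data want;
  /* Single quotes keep the ampersands literal instead of triggering macro resolution */
  string = 'my aim is to find every word after bank_beg.upa_&kok for every bank_beg.&Kok in this line bank_beg.upa_&dd._kok(keep=x)';
  length want $ 80;
  /* After "bank_beg.", take everything up to the next blank, semicolon, or parenthesis */
  pid = prxparse('/(?<=bank_beg\.)[^ ;()]+/i');
  s = 1;
  e = length(string);
  call prxnext(pid, s, e, string, p, l);
  do while (p > 0);
    want = catx(' ', want, substr(string, p, l));
    call prxnext(pid, s, e, string, p, l);
  end;
  drop s e p l pid;
run;
```

With the sample string this should collect upa_&kok, &Kok, and upa_&dd._kok, separated by blanks.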
