Solved: Re: Parse prxmatch and remove unwanted text

Jeff_DOC · Posted 12-21-2022 01:03 PM

Good morning.

I'm using the attached code to document all tables in my code (thanks to others as I'd have never figured this out on my own). However, I'd like the result to be a bit cleaner.

For instance, you can see that a number of entries contains the leading words SEQ and MULTI as well as the trailing words.DATA and .VIEW. I'd like to remove those before the Excel file creation. I've never used this type of code so I'd appreciate any help someone could give me.

proc scaproc;
write;
run;

filename sca_rec "&documentation.\&documentation_name..txt";
filename sca_rec '<<SCAPROC TEXT FILE';

data myinfo;

infile sca_rec truncover;
input @1 scaline $256.;
length type myinfo $100;
keep type myinfo;
retain prxifile prxidata;

if _n_ = 1 then do;
	prxifile = prxparse("!\bJOBSPLIT: FILE INPUT SEQ (\b.*\b) \*/!");
	prxidata = prxparse("!\bJOBSPLIT: DATASET INPUT (\b.*\b) \*/!");
end;

	if prxmatch (prxidata,scaline) > 0 then do;
		myinfo = prxposn(prxidata,1,scaline);
		type = 'Input File';
		output myinfo;
	end;
run;

proc export data = myinfo
	outfile = '<<outfile path>>'
/*	putnames = no;*/
	dbms = xls replace;
run;

SASJedi · Posted 12-21-2022 03:52 PM

One approach woud be to re-write the regular expression to capture just the text you want, but that can be complex if you are not familiar with REGEX writing. Another approach would be to process the result using SAS functions. So, in your program, the result of the expression:

myinfo = prxposn(prxidata,1,scaline);

Yields:

MULTI VIEWSDT.FILE1.VIEW

To get rid of all the text before that first space, you could add SCAN:

myinfo = scan(prxposn(prxidata,1,scaline),-1,' ');

Which now yields:

VIEWSDT.FILE1.VIEW

If you want to also remove that last "word" after the period, you re-process the result using SUBSTR and FIND like this:

myinfo = SUBSTR(myinfo,1,FIND(myinfo,'.',-9999)-1);

Which now yields:

VIEWSDT.FILE1

Just be aware that, when the value extracted is something like "SEQ LIST.DATA", the end result will be just "LIST". Is that what you were envisioning?

Check out my Jedi SAS Tricks for SAS Users

View solution in original post

SASJedi · Posted 12-21-2022 03:52 PM

One approach woud be to re-write the regular expression to capture just the text you want, but that can be complex if you are not familiar with REGEX writing. Another approach would be to process the result using SAS functions. So, in your program, the result of the expression:

myinfo = prxposn(prxidata,1,scaline);

Yields:

MULTI VIEWSDT.FILE1.VIEW

To get rid of all the text before that first space, you could add SCAN:

myinfo = scan(prxposn(prxidata,1,scaline),-1,' ');

Which now yields:

VIEWSDT.FILE1.VIEW

If you want to also remove that last "word" after the period, you re-process the result using SUBSTR and FIND like this:

myinfo = SUBSTR(myinfo,1,FIND(myinfo,'.',-9999)-1);

Which now yields:

VIEWSDT.FILE1

Just be aware that, when the value extracted is something like "SEQ LIST.DATA", the end result will be just "LIST". Is that what you were envisioning?

Check out my Jedi SAS Tricks for SAS Users

Jeff_DOC · Posted 12-21-2022 03:59 PM

I think so. At least it gets me way close. Thanks very much for your help.

Jeff_DOC · Posted 12-21-2022 04:01 PM

Well there's a newbie error. I accepted my own post as a solution, sigh. Sorry, that should have been awarded to you.

SASJedi · Posted 12-22-2022 10:13 AM

LOL! No problem - that's fixable, for sure. Glad you're back on track 😊

Check out my Jedi SAS Tricks for SAS Users

Parse prxmatch and remove unwanted text

Re: Parse prxmatch and remove unwanted text

Re: Parse prxmatch and remove unwanted text

Re: Parse prxmatch and remove unwanted text

Re: Parse prxmatch and remove unwanted text

Re: Parse prxmatch and remove unwanted text

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!