- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Good morning.
I'm using the attached code to document all tables in my code (thanks to others as I'd have never figured this out on my own). However, I'd like the result to be a bit cleaner.
For instance, you can see that a number of entries contains the leading words SEQ and MULTI as well as the trailing words.DATA and .VIEW. I'd like to remove those before the Excel file creation. I've never used this type of code so I'd appreciate any help someone could give me.
proc scaproc;
write;
run;
filename sca_rec "&documentation.\&documentation_name..txt";
filename sca_rec '<<SCAPROC TEXT FILE';
data myinfo;
infile sca_rec truncover;
input @1 scaline $256.;
length type myinfo $100;
keep type myinfo;
retain prxifile prxidata;
if _n_ = 1 then do;
prxifile = prxparse("!\bJOBSPLIT: FILE INPUT SEQ (\b.*\b) \*/!");
prxidata = prxparse("!\bJOBSPLIT: DATASET INPUT (\b.*\b) \*/!");
end;
if prxmatch (prxidata,scaline) > 0 then do;
myinfo = prxposn(prxidata,1,scaline);
type = 'Input File';
output myinfo;
end;
run;
proc export data = myinfo
outfile = '<<outfile path>>'
/* putnames = no;*/
dbms = xls replace;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One approach woud be to re-write the regular expression to capture just the text you want, but that can be complex if you are not familiar with REGEX writing. Another approach would be to process the result using SAS functions. So, in your program, the result of the expression:
myinfo = prxposn(prxidata,1,scaline);
Yields:
MULTI VIEWSDT.FILE1.VIEW
To get rid of all the text before that first space, you could add SCAN:
myinfo = scan(prxposn(prxidata,1,scaline),-1,' ');
Which now yields:
VIEWSDT.FILE1.VIEW
If you want to also remove that last "word" after the period, you re-process the result using SUBSTR and FIND like this:
myinfo = SUBSTR(myinfo,1,FIND(myinfo,'.',-9999)-1);
Which now yields:
VIEWSDT.FILE1
Just be aware that, when the value extracted is something like "SEQ LIST.DATA", the end result will be just "LIST". Is that what you were envisioning?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One approach woud be to re-write the regular expression to capture just the text you want, but that can be complex if you are not familiar with REGEX writing. Another approach would be to process the result using SAS functions. So, in your program, the result of the expression:
myinfo = prxposn(prxidata,1,scaline);
Yields:
MULTI VIEWSDT.FILE1.VIEW
To get rid of all the text before that first space, you could add SCAN:
myinfo = scan(prxposn(prxidata,1,scaline),-1,' ');
Which now yields:
VIEWSDT.FILE1.VIEW
If you want to also remove that last "word" after the period, you re-process the result using SUBSTR and FIND like this:
myinfo = SUBSTR(myinfo,1,FIND(myinfo,'.',-9999)-1);
Which now yields:
VIEWSDT.FILE1
Just be aware that, when the value extracted is something like "SEQ LIST.DATA", the end result will be just "LIST". Is that what you were envisioning?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think so. At least it gets me way close. Thanks very much for your help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well there's a newbie error. I accepted my own post as a solution, sigh. Sorry, that should have been awarded to you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
LOL! No problem - that's fixable, for sure. Glad you're back on track 😊