Hi,
I'm trying to read a large non-delimited text file into a dataset. I'm using PC SAS 9.4.
The file is the document.xml part of a docx file. It does not seem possible to read it into one variable and one row of a dataset, since it is too large (>300 KB). I was thinking that one way of doing this is to pre-process the file by reading it character by character and adding a CR (carriage return) every time I see a '>', then output it and re-read it using proc import with CR ('0D0A'x) as a delimiter.
Can this be done? If so, how?
BR
Jan
P.S. Note that reading the file with an XML libname is not useful here.
Accepted Solutions
Not sure if it will help but here is how to do what you asked.
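* Point filerefs at the original file and the pre-processed copy;
filename in 'document.xml';
filename out 'document_fixed.xml';
data _null_;
  * RECFM=N treats both files as byte streams with no record boundaries;
  infile in recfm=n;
  file out recfm=n;
  * Copy one byte at a time;
  input char $char1. ;
  put char $char1. ;
  * Append a carriage return after every closing angle bracket;
  if char='>' then put '0D'x;
run;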
I don't think that UTF-8 (or other multibyte character sets) would make any difference.
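To complete the round trip, the pre-processed file can then be read back with one record per tag. This is a minimal sketch, assuming the output was written as document_fixed.xml; the fileref fixed, the dataset name tags, and the variable tag are illustrative names, not from the original post. TERMSTR=CR is used because the step writes a bare '0D'x rather than '0D0A'x.

```sas
* Hypothetical names: fileref FIXED, dataset TAGS, variable TAG;
filename fixed 'document_fixed.xml';

data tags;
  * TERMSTR=CR splits records on the bare carriage return added after each tag;
  * TRUNCOVER keeps short records from flowing onto the next line;
  infile fixed termstr=cr lrecl=32767 truncover;
  * Keep leading blanks and read up to 32767 bytes per record;
  input tag $char32767.;
run;
```

Any line feed that was already in the file should stay in the data as an ordinary character, and a single record longer than 32767 bytes would be truncated, so it is worth checking the log after the step runs.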
data test;
  * Reads one record per observation and truncates anything beyond 32000 bytes;
  infile 'path to xml' lrecl=32000;
  length x $32000;
  input;
  x=_infile_;
run;
Sorry, I was too fast. I of course meant to write Tom.
Thanks Tom!!!
😉
BR
J
@Tom wrote:
Not sure if it will help but here is how to do what you asked.
filename in 'document.xml';
filename out 'document_fixed.xml';
data _null_;
infile in recfm=n;
file out recfm=n;
input char $char1. ;
put char $char1. ;
if char='>' then put '0D'x;
run;
I don't think that UTF-8 (or other multibyte character sets) would make any difference.
I wonder if it would be faster to read the file as if it were fixed length and apply the TRANSLATE function to _INFILE_.
Hi Reeza,
that certainly did the trick. Thanks a lot!
Yes. I was considering Python as an option but prefer to keep it all in SAS.
BR
Jan