Hi,
I'm trying to read a large non-delimited text file into a dataset. I'm using PC SAS 9.4.
The file is the document.xml part of a .docx file. It does not seem possible to read it into one variable in a single row of a dataset, since it is too large (>300 KB, while a SAS character variable holds at most 32,767 bytes). One way around this might be to pre-process the file by reading it character by character and adding a CR (carriage return) every time I see a '>'. Then I would write it out and re-read it with PROC IMPORT, using CR ('0D'x) as the delimiter.
Can this be done? If so, how?
BR
Jan
P.S. Note that reading the file with the XML LIBNAME engine is not useful here.
Not sure if it will help, but here is how to do what you asked.
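filename in 'document.xml';
filename out 'document_fixed.xml';
data _null_;
  /* RECFM=N: treat both files as byte streams with no record boundaries */
  infile in recfm=n;
  file out recfm=n;
  /* copy the file one byte at a time */
  input char $char1. ;
  put char $char1. ;
  /* after every '>', also write a carriage return */
  if char='>' then put '0D'x;
run;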
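I don't think that UTF-8 (or other multibyte character sets) would make any difference: the step copies the file byte by byte, and in UTF-8 the '>' byte never occurs inside a multibyte character.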
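For the second half of the question (re-reading the fixed file), a DATA step with TERMSTR=CR may be simpler than PROC IMPORT. This is a minimal sketch, assuming document_fixed.xml was written by the step above; the fileref FIXED, the dataset WORK.XML_PARTS and the variable FRAGMENT are hypothetical names chosen for illustration.

```sas
/* Hypothetical names: fileref FIXED, dataset WORK.XML_PARTS, variable FRAGMENT */
filename fixed 'document_fixed.xml';

/* COMPRESS=YES because FRAGMENT is declared very long but is usually short */
data work.xml_parts(compress=yes);
  /* TERMSTR=CR: a lone carriage return ends each record.             */
  /* LRECL=32767 allows long records; LENGTH= puts each record's      */
  /* actual length into LEN (LEN is not written to the dataset).      */
  infile fixed termstr=cr lrecl=32767 length=len truncover;
  length fragment $32767;
  /* $VARYING reads exactly LEN bytes and keeps leading blanks */
  input fragment $varying32767. len;
run;
```

Each row should then hold one fragment ending in '>'. If the original file already contained line breaks (for example CRLF after the XML declaration), some rows may be empty or start with a stray LF, and any single fragment longer than 32,767 bytes would be truncated.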
Sorry, I was too fast. Of course I meant to write Tom.
Thanks Tom!!!
😉
BR
J
I wonder if it would be faster to read the file as if it were fixed-length and apply the TRANSLATE function to _INFILE_.
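One possible reading of that suggestion is sketched below. TRANSLATE(source, to, from) swaps one byte for another, so each '>' is replaced by a CR rather than followed by one; the '>' characters are lost. It also assumes that, with RECFM=F, SAS reports the shorter final block's true size in LEN; check the log on your own file before relying on it.

```sas
/* Sketch only: block-wise variant using TRANSLATE.                 */
/* Each '>' is REPLACED by a CR ('0D'x), not followed by one.       */
filename in 'document.xml';
filename out 'document_fixed.xml';

data _null_;
  /* Read 32767-byte blocks instead of one byte at a time.          */
  /* LEN receives the bytes actually read (assumed shorter for the  */
  /* final block).                                                  */
  infile in recfm=f lrecl=32767 length=len;
  file out recfm=n;
  input;
  /* TRANSLATE(source, to, from): every '>' becomes a CR */
  _infile_ = translate(_infile_, '0D'x, '>');
  /* Write exactly LEN bytes so no padding reaches the output */
  put _infile_ $varying32767. len;
run;
```

The output can be re-read with TERMSTR=CR as before; if the closing '>' is needed, it can be appended to each record after reading.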
Hi Reeza,
that certainly did the trick. Thanks a lot!
Yes. I was considering Python as an option but prefer to keep it all in SAS.
BR
Jan