@ismahero1 wrote:
Hi, I have a txt file that contains millions of records from an XML file. I am trying to load them into SAS, but some random records are longer than 32767. When I run the code, the records load successfully, except that I have problems with the long records in the file. The txt file itself is very simple to read and to write loading code for; it is only the presence of records longer than 32767 that causes problems when loading the data. Does anyone know how to handle records that are longer than 32767?
Show more details on how you are reading the file. SAS can handle RECORD lengths much larger than 32K. You just have to tell it that.
infile in lrecl=1000000000 ...;
Or are you actually getting stuck on individual values that are longer than 32K? SAS character variables cannot be longer than 32,767 bytes.
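To tell which of the two cases applies, it may help to measure the record lengths first. The following is only a minimal sketch: the fileref `in` is taken from the INFILE line above, while the path `long_records.txt` and the variable names `reclen`, `maxlen`, and `nlong` are made up for illustration.

```sas
/* Hypothetical path: point the fileref at your own file. */
filename in 'long_records.txt';

data _null_;
  /* LENGTH= names a variable that SAS sets to the length of the current record. */
  /* END= names a flag that becomes 1 when the last record has been read.        */
  infile in lrecl=1000000 length=reclen end=eof;
  input;                               /* read one record into the input buffer  */
  retain maxlen 0;
  maxlen = max(maxlen, reclen);        /* longest record seen so far             */
  if reclen > 32767 then nlong + 1;    /* count records longer than a variable   */
  if eof then put maxlen= nlong=;      /* write the totals to the log            */
run;
```

If `maxlen` comes back equal to the LRECL= value, the records are probably being truncated and LRECL= should be raised.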
You cannot use _INFILE_ if the record length is larger than the size of a variable.
Try something like this instead.
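DATA TOC1 ;
  /* RECIN is the fileref for the file being loaded. */
  /* COLUMN= tracks the current pointer position; LENGTH= holds the length of the current record. */
  INFILE RECIN LRECL = 1000000 column=current_col length=line_size;
  LENGTH UNQ_KEY $18 RETVER $20 num_docs 8 test_length 8 next_word $11;
  /* @'string' moves the pointer to the first column after the string. */
  /* The trailing @ holds the record for the INPUT statement in the loop below. */
  INPUT @ ' ' +17 UNQ_KEY $CHAR18.
        @ 'Version=' +15 RETVER $CHAR20.
        @
  ;
  IF FIND(RETVER, '"') > 0 THEN RETVER=SUBSTR(RETVER,1,(FIND(RETVER,'"')-1));
  /* Count documentId= tokens by reading the rest of the record one word at a time. */
  /* next_word is $11, so each word is truncated to the length of 'documentId='. */
  num_docs=0;
  do while(line_size > current_col);
    input next_word @;
    if next_word = 'documentId=' then num_docs=num_docs+1;
  end;
  TEST_LENGTH= line_size;
RUN;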
_INFILE_ is a variable, so it is limited by the same rules as any other variable.
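A quick way to see this limit in practice is to compare the record length reported by the INFILE statement with the length of what _INFILE_ holds. This is a rough sketch, assuming the fileref RECIN is already assigned to the file; `line_size` and `infile_len` are illustrative names.

```sas
/* Sketch only: RECIN is assumed to be a fileref already assigned to the file. */
data _null_;
  infile recin lrecl=1000000 length=line_size;
  input;                              /* load the record into the input buffer   */
  infile_len = lengthn(_infile_);     /* length of what _INFILE_ actually holds  */
  /* Report only the records that are longer than a variable can be. */
  if line_size > 32767 then put _n_= line_size= infile_len=;
run;
```

For the long records I would expect `infile_len` to stay at or below 32,767 while `line_size` shows the full record length, which is why functions such as COUNT() or LENGTH() applied to _INFILE_ miss the tail of those records.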
When posting code or log entries copied from your editor or SAS log, it works best on the forum to open a code box using either the {I} or "running man" icon at the top of the message window. That prevents the forum software from reformatting the text, so you get something like:
DATA TOC1 ;
  INFILE RECIN LRECL = 1000000;
  LENGTH UNQ_KEY $18. RETVER $20.;
  INPUT @(INDEX(_INFILE_,'')+17) UNQ_KEY $CHAR18.
        @(INDEX(_INFILE_,'Version=')+15) RETVER $CHAR20.
  ;
  IF FIND(RETVER, '"') > 0 THEN RETVER=SUBSTR(RETVER,1,(FIND(RETVER,'"')-1));
  NUM_DOCS=COUNT(_INFILE_,' documentId=') ;
  TEST_LENGTH= LENGTH(_INFILE_);
RUN;
(If that is close to what your code looked like in the editor).
The two code boxes do things slightly differently. The {} is pretty much plain text while the "running man" will attempt to use some syntax highlighting rules.