BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ea33
Fluorite | Level 6

I need to parse an XML that is stored in a database column (the whole XML in 1 column). To get around the 32K string limit, I've split up the xml to multiple parts in a query and then tried to place them in one file for processing. Code below:

 

CREATE TABLE work.table AS
SELECT * FROM connection to src(
SELECT date, market, 
	substring(content, 1, 30000) as xml_1,
	substring(content, 30001, 30000) as xml_2,
	substring(content, 60001, 30000) as xml_3,
	substring(content, 90001, 30000) as xml_4,
	substring(content, 120001, 30000) as xml_5,
	substring(content, 150001, 30000) as xml_6,
	substring(content, 180001, 30000) as xml_7,
	substring(content, 210001, 30000) as xml_8,
	substring(content, 240001, 30000) as xml_9
FROM table); disconnect from src; Quit;

filename xml_file 'path/xml.xml';

data _null_;
set table_from_query;
output;
file xml_file;
put xml_1 xml_2 xml_3 xml_4 ... xml_9;
run;

It all works as I expect it, but when I look at the xml created "xml_file" In the places that they were joined (xml_i ends and xml_i+1 starts) they are not placed on the same line. Some breaks do not cause errors, but some do as they split the actual tag. Example below:

 

<Interval NUM=""33"">
<IntervalEnding>08:15</IntervalEndin          <----- Split causing error >
g>
<IntervalValue>63.25</IntervalValue>
</Interval>
<Interval NUM=""34"">
<IntervalEnding>08:30</IntervalEnding>
<IntervalValue>50.84</IntervalValue>
</Interval>

 

How can I start the new input on the same line as before? I've tried @ after put, but it is the same result.

 

Any help is appreciated. Side note: put xml_: doesn't work instead of naming all xml_1 through xml_9.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Two issues.

First the default logical record length (LRECL) is 32,767 bytes.  You are trying to write 9*30,000 bytes.

Second if you use list mode output then the PUT statement will insert a space between the values.  

 

So set a longer record length and use formatted output.

Simple version. Not sure why you had the FILENAME statement but I removed it make example shorter.  Use the $CHAR format to preserve any leading spaces on any of the sub strings.

data _null_;
  set table_from_query;
  file 'path/xml.xml' lrecl=1000000;
  put (xml_1-xml_9) ($char30000.) ;
run;

SAS can normally write lines at least as long a 1,000,000 bytes.  The actual limit probably varies by version and operating system and possibly even computer memory available.  If you need longer use RECFM=N instead.  That should also remove the line breaks between observations.  Which you might or might not want to do.

 

If you are actually including the line breaks between observations then you can probably remove the trailing spaces using $VARYING format, but you would need to figure out where the string actually ends.  For example by find the last non-empty string.

data _null_;
  set table_from_query;
  file 'path/xml.xml' lrecl=1000000;
  array xml_ [9] ;
  do last=dim(xml_) to 1 by -1 while(missing(xml_[last])); end;
  if last then do;
    do index=1 to last-1 ;
      put xml_[index] $char30000. @;
    end;
    length=lengthn(xml_[last]);
    put xml_[last] $varying30000. length @;
  end;
  put;
run;

 

 

 

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

Two issues.

First the default logical record length (LRECL) is 32,767 bytes.  You are trying to write 9*30,000 bytes.

Second if you use list mode output then the PUT statement will insert a space between the values.  

 

So set a longer record length and use formatted output.

Simple version. Not sure why you had the FILENAME statement but I removed it make example shorter.  Use the $CHAR format to preserve any leading spaces on any of the sub strings.

data _null_;
  set table_from_query;
  file 'path/xml.xml' lrecl=1000000;
  put (xml_1-xml_9) ($char30000.) ;
run;

SAS can normally write lines at least as long a 1,000,000 bytes.  The actual limit probably varies by version and operating system and possibly even computer memory available.  If you need longer use RECFM=N instead.  That should also remove the line breaks between observations.  Which you might or might not want to do.

 

If you are actually including the line breaks between observations then you can probably remove the trailing spaces using $VARYING format, but you would need to figure out where the string actually ends.  For example by find the last non-empty string.

data _null_;
  set table_from_query;
  file 'path/xml.xml' lrecl=1000000;
  array xml_ [9] ;
  do last=dim(xml_) to 1 by -1 while(missing(xml_[last])); end;
  if last then do;
    do index=1 to last-1 ;
      put xml_[index] $char30000. @;
    end;
    length=lengthn(xml_[last]);
    put xml_[last] $varying30000. length @;
  end;
  put;
run;

 

 

 

ea33
Fluorite | Level 6
Thank you, Tom. This did exactly what I wanted it to do.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1227 views
  • 1 like
  • 2 in conversation