☑ This topic is solved.
Lars_Beck
Calcite | Level 5

Hi,

 

I am trying to read an XML file into a SAS dataset, but I am struggling to get all of the XML code into one single cell. Instead, I get one row for every "line" in the XML document.

I've been using the INFILE statement. Perhaps there is an option I am missing?

 

data test;
infile "\\myfile.xml";
format line $2000.;
input;
line = _infile_;
run;

 

Cheers

Accepted Solution
ballardw
Super User

It doesn't make sense to me why you would want the entire contents of a file in one observation and a single variable (SAS data sets do not have "cells").

You can try setting the TERMSTR= option on the INFILE statement to something other than what is valid for your operating system. That option is what tells SAS it has reached the "end of a line" in a text file.

If your operating system is Windows, the default end-of-line characters are carriage return and line feed (CRLF); if it is a Unix derivative, the end of line is a line feed (LF).

So you might try TERMSTR=CR, which is unlikely to be the actual end-of-line character unless the file started on an older (pre-OS X) Apple computer.

data test;
infile "\\myfile.xml"  Termstr=CR ;
format line $2000.;
input;
line = _infile_;
run;

How sure are you that "all of the file" will fit in 2000 characters?

You may have to increase the LRECL= value on the INFILE statement as well.
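A minimal sketch of the same step with a larger LRECL=, assuming the whole file is at most 32,767 bytes (the maximum length of a SAS character variable). It uses a LENGTH statement, the more usual way to size a variable than FORMAT. One caveat: a file with Windows CRLF line endings still contains a CR at the end of every line, so TERMSTR=CR only gives one observation when the file uses LF-only line endings. The LENGTH= variable name `reclen` is illustrative, not from the thread.

```sas
/* Sketch: one record per file, assuming the file contains no CR      */
/* characters (e.g. LF-only line endings) and is no longer than       */
/* 32,767 bytes, the longest a character variable can be.             */
data test;
  infile "\\myfile.xml" termstr=cr lrecl=32767 length=reclen;
  length line $32767;
  input;
  line = _infile_;
  /* reclen (illustrative name) holds the number of bytes read */
  if reclen >= 32767 then
    put "WARNING: Record reached the LRECL= limit and may be truncated.";
run;
```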

 

What will you do with the resulting variable, which is pretty sure to be moderately ugly?


Lars_Beck
Calcite | Level 5

Thank you. It seems to work.

 

I might extend the length of the variable if I need to.

 

The purpose is to read several files in this way and use regex to find the relevant information. That way I will get one row for each XML file.
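A rough sketch of that approach: a wildcard on the INFILE statement reads several files, and the PRX functions pull one value out of each. The folder `\\myfolder\`, the tag `<InvoiceNumber>`, and the variable names are hypothetical placeholders. It relies on TERMSTR=CR as in the accepted solution, so it assumes each file has no CR characters and fits in 32,767 bytes; only then does each file become exactly one observation.

```sas
/* Hypothetical folder, tag, and variable names for illustration only */
data xml_files;
  length fname source $260 line $32767 invoice $100;
  retain re;
  if _n_ = 1 then
    /* Capture the text between <InvoiceNumber> and </InvoiceNumber> */
    re = prxparse('/<InvoiceNumber>([^<]*)<\/InvoiceNumber>/');
  infile "\\myfolder\*.xml" filename=fname termstr=cr lrecl=32767;
  input;
  line = _infile_;
  source = fname;   /* the FILENAME= variable is not kept, so copy it */
  if prxmatch(re, line) then invoice = prxposn(re, 1, line);
  drop re;
run;
```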

ballardw
Super User

I might be tempted to start with the XML LIBNAME engine instead:

 

Libname xmlin xml '\\myfile.xml';

 

Then take a look at that library. If the file is clean enough, you should find a data set of some sort in it.

Then

data work.somedata;
    set xmlin.somedata;
run;

would copy that data set from the XML library to the WORK library (and could easily be changed to write to a different library).

That might be a lot easier than writing lots of regex code to parse the XML.
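If the plain XML engine does not turn the file into a usable table on its own, the XMLV2 engine can generate an XML map automatically. A minimal sketch, assuming the XMLV2 engine is available in your SAS release; the fileref `xmlmap` is an illustrative name. PROC DATASETS then lists the tables the engine found, which is one way to take a look at that library.

```sas
/* Temporary file to receive the generated XML map */
filename xmlmap temp;

/* AUTOMAP=REPLACE asks the engine to build a map from the file's structure */
libname xmlin xmlv2 "\\myfile.xml" automap=replace xmlmap=xmlmap;

/* List the data sets the engine exposes in the library */
proc datasets library=xmlin;
quit;
```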

 

Tom
Super User

Read the file as BINARY instead of as LINES of TEXT.

That is, use RECFM=F or RECFM=N on the INFILE statement.
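A minimal sketch of the RECFM=F approach: the file is read in fixed-size blocks of bytes, so end-of-line characters are simply kept as data and a file of up to 32,767 bytes lands in a single observation. The names `len` and `chunk` are illustrative. Longer files are split across several observations, and the final block may be shorter than LRECL=, which the LENGTH= variable is meant to account for.

```sas
/* Read raw bytes in blocks of 32,767 rather than as text lines */
data chunks;
  infile "\\myfile.xml" recfm=f lrecl=32767 length=len;
  length chunk $32767;
  /* $VARYING reads only the LEN bytes actually present in this block */
  input chunk $varying32767. len;
run;
```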
