BookmarkSubscribeRSS Feed
dhingra_sa
Calcite | Level 5

Hi All,

I understand there is an abundance of tutorials and guides for SAS, so I apologise for this question as I'm sure if I kept digging that I'd find a solution.

Unfortunately, I am a first-time user of SAS on a very tight timeline, and I would really appreciate some feedback on this request:

I have 75,000 .xml files located in a few folders and subfolders. Nothing is zipped.

Each file has been created/structured in the same way – a large number of columns (100+) and a lot of rows (500+). Each file only uses 1 sheet in Excel.

I need to use SAS to sort the data into something more understandable. Right now, it’s all chopped up into these 75,000 files so I can’t isolate, for example, one column to see what’s happening with it because that would mean opening 75,000 files independently.

I have no knowledge in SAS, so even importing the files appears challenging. I have SAS 9.4 installed.

After attempting to import even one file, SAS doesn’t seem to accept .xml files (through the File, Import method). I then changed the extension of a file to .xls and that also didn’t work (because I assume that I actually didn’t change the file type by simply changing the extension).

Do you have any ideas as to how I can import data from 75,000 files that are in .xml format without individually importing each one?

After reading, I realised this is useful for importing a single .xml file: SAS(R) 9.4 XML LIBNAME Engine: User's Guid

However, I obviously cannot do that for 75,000 files in the short time I have.

I would really appreciate some help in hopefully simple terms!

Cheers!

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

So they definately are XML files yes, not Excel.  Have you the code ready for the import of one file?  If so then you can just wrap that into a macro and call it once per file:

%macro import_xml (filename=,ds=);

...  /* Put your import code here, noting to replace the relevant bits with the parameters */

%mend import_xml;

filename tmp pipe 'dir "c:\dir_with_your_files" /b';  /* To get list of files */

data _null_;

     infile tmp;

     length buff $200;

     input buff $;

     call execute('%import_xml (filename='||strip(buff)||',ds=out_dataset'||strip(put(_n_,best.))||');');

run;

The above is just a quick example I typed in so not tested.  Basically get a listing from the OS, then create a macro call for each filename, and run the import.  You will then have a lot of files in your work directory which you can put together.   Sorry, its a bit quick as am just leaving.

dhingra_sa
Calcite | Level 5

Thanks for this!

Would the 75,000 files have to be in the same folder for this to work?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, that would be easiest.  Otherwise you will need to create a dataset, which contains all the paths/filenames of each subdirectory.  So from a fixed point, dir list all folders within there.  Then for each returned directory from that dir list all etc. to get everything.

dhingra_sa
Calcite | Level 5

Hi RW9,

The file name is 38(2013-02-28), and its located in D:\Users\z3466666\Desktop\all files

This is what I tried, but it didn't work:

%macro import_xml (filename=,ds=);

filename 38(2013-02-28) 'D:\Users\z3466666\Desktop\all_files';

filename map 'D:\Users\z3466666\Desktop\all_files\38(2013-02-28)generate.map';

libname 38(2013-02-28) xmlv2 automap=replace xmlmap=map; 3

proc print data=38(2013-02-28).team;

%mend import_xml;

filename tmp pipe 'D:\Users\z3466666\Desktop\all_files'; 

data _null_;

     infile tmp;

     length buff $200;

     input buff $;

     call execute('%import_xml (filename='||strip(buff)||',ds=out_dataset'||strip(put(_n_,best.))||');');

run;

The error is:

1    %macro import_xml (filename=,ds=);

2    filename 38(2013-02-28) 'D:\Users\z3466666\Desktop\all files';

3    filename map 'D:\Users\z3466666\Desktop\all files\38(2013-02-28)generate.map';

4

5    libname 38(2013-02-28) xmlv2 automap=replace xmlmap=map; 3

6

7    proc print data=nhl.team;

8    %mend import_xml;

9

10   filename tmp pipe 'D:\Users\z3466666\Desktop\all files';

11

12   data _null_;

13        infile tmp;

14        length buff $200;

15        input buff $;

16        call execute('%import_xml

16 ! (filename='||strip(buff)||',ds=out_dataset'||strip(put(_n_,best.))||');');

17   run;

NOTE: The infile TMP is:

      Unnamed Pipe Access Device,

      PROCESS=D:\Users\z3466666\Desktop\all files,

      RECFM=V,LRECL=32767

Stderr output:

'D:\Users\z3466666\Desktop\all' is not recognized as an internal or external command,

operable program or batch file.

NOTE: 0 records were read from the infile TMP.

NOTE: DATA statement used (Total process time):

      real time           0.44 seconds

      cpu time            0.11 seconds

Any further suggestions?

Appreciate your help.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well several things.

First, naming your files 38(2013-02-28).xml is not recommended.  In fact, don't use that as a reference anywhere in your code other than reffering to the filename.

Your error comes from the fact you are passing the string "D:\Users\z3466666\Desktop\all_files" to your OS, this doesn't do anything.  You need to understand how your operating systems works and what commands do what.  To get a directory listing you would type:

dir <pathname>

So we need to create that string to be passed to the OS.

filename tmp pipe 'dir "c:\abc" /b -d';

This will return a list of filenames in that directory.

Your code should look more like this:

%macro import_xml (filename=,ds=);

  filename inxml "D:\Users\z3466666\Desktop\all_files\&filename..xml"; 

  filename map 'D:\Users\z3466666\Desktop\all_files\38(2013-02-28)generate.map';

  libname mylibxml xmlv2 automap=replace xmlmap=map;

  proc print data=mylibxml.team;

  filename inxml clear;

  filename map clear;

  libname mylibxml clear;

%mend import_xml;

filename tmp pipe 'dir "D:\Users\z3466666\Desktop\all_files"';

data _null_;

     infile tmp;

     length buff $200;

     input buff $;

     call execute('%import_xml (filename='||strip(buff)||',ds=out_dataset'||strip(put(_n_,best.))||');');

run;

soniamonga
Calcite | Level 5

Hi there, 

 

I tried the code as below:

 

%macro import_xml (filename=,ds=);
filename inxml "C:\Users\ASHU\Desktop\Test\&filename..xml";
filename map 'C:\Users\ASHU\Desktop\Test\ashugenerate.map';
libname mylibxml xmlv2 automap=replace xmlmap=map;
proc print data=mylibxml.team;
filename inxml clear;
filename map clear;
libname mylibxml clear;
%mend import_xml;
filename tmp pipe 'dir "C:\Users\ASHU\Desktop\Test"';
data _null_;
infile tmp;
length buff $200;
input buff $;
call execute('%import_xml (filename='||strip(buff)||',ds=out_dataset'||strip(put(_n_,best.))||');');
run;

 

however I am getting the error as below:

 

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

 

27 +
proc print data=mylibxml.team; filename inxml clear; filename map clear; libname
ERROR: Libname MYLIBXML is not assigned.
NOTE: Fileref INXML has been deassigned.
NOTE: Fileref MAP has been deassigned.
28 + mylibxml clear;
WARNING: Libname MYLIBXML is not assigned.
28 + ;
29 + filename inxml "C:\Users\ASHU\Desktop\Test\2.xml"; filename map
'C:\Users\ASHU\Desktop\Test\ashugenerate.map'; libname mylibxml xmlv2 automap=replace xmlmap=map;
ERROR: The XMLV2 engine cannot be found.
ERROR: Error in the LIBNAME statement.

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

29 +
proc print data=mylibxml.team; filename inxml clear; filename map clear; libname
ERROR: Libname MYLIBXML is not assigned.
NOTE: Fileref INXML has been deassigned.
NOTE: Fileref MAP has been deassigned.
30 + mylibxml clear;
WARNING: Libname MYLIBXML is not assigned.

 

I understand that the Libname need to be assigned however I tried assigning a Libname but it's still giving me an error. Can you please help and let me know as in what's going on.

 

Thank you so much for your help.

 

Sonia

soniamonga
Calcite | Level 5

Can anyone please help 😞

 

Sonia !!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2775 views
  • 9 likes
  • 3 in conversation