Import zipped XML files

Occasional Contributor
Posts: 6

All,

 

I get multiple zipped folders that contain XML files. I want to "point" SAS into the zipped folder and parse out the XML files into records in a SAS dataset. 

 

Here's a diagram of what I'm dealing with:

data.zip
|__ sub_folder
    |__ file1.xml
    |__ file2.xml
    ...
    |__ file500.xml

 

I know I can use a FILENAME statement with the ZIP access method and then read the contents of a zipped folder:

filename inzip ZIP "C:\input_data\data.zip";

data folder_contents;
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  /*Scans and gives only the name of the xml file*/
  memname=scan(dread(fid,i), 2, '\');
  /*check for trailing / in folder name */
  output;
 end;
 rc=dclose(fid);
run;

title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;

And I know I can use a LIBNAME statement with an XML engine (in this case XMLV2) and an XML map (made with the XML Mapper) to read XML files into a dataset:

filename SXLEMAP "C:\xml_map\my_map2.map";
libname my_xml_file XMLV2 "C:\XML_files\test.xml" xmlmap = SXLEMAP;
libname out "C:\output";
data out.xml_contents;
 set my_xml_file.test;
run;

How do I get a LIBNAME pointing "inside" a zipped folder, if I have to use a FILENAME statement to look inside said folder?

 

Right now, I believe the ZIP access method is available only on the FILENAME statement, not on LIBNAME.

 

I want to avoid unzipping, since it takes a while to unzip all of the XML files, and IT at my agency has strong restrictions on using X commands. Otherwise, I'd use X commands to move the XML files from the zipped folder to a staging folder, then use a macro to import them.


Accepted Solutions
Solution
‎01-05-2017 12:37 PM
Occasional Contributor
Posts: 6

Re: Import zipped XML files

Got it! 

 

Here is what I got: 

 

filename inzip ZIP "C:\input_data\data.zip";

data folder_contents;
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
	/*Scans and gives only the name of the xml file*/
  	memname=scan(dread(fid,i), 2, '\');
  	output;
 end;
 rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;

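/* identify a temp file in the WORK directory to hold the extracted XML */
filename xl "%sysfunc(getoption(work))/file1.xml" ;
 
/* hat tip: "data _null_" on SAS-L */
/* copy the ZIP member to the temp file, one 256-byte block at a time */
data _null_;
   /* using member syntax here: fileref(member name inside the archive) */
   infile inzip(sub_folder\file1.xml) 
       lrecl=256 
       recfm=F 
       length=length   /* bytes actually read, so a short final block is handled */
       eof=eof unbuf;
   /* recfm=N writes a byte stream with no record separators added */
   file   xl lrecl=256 recfm=N;
   input;
   /* write only the bytes that were read in this block */
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;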

filename SXLEMAP "C:\xml_map\my_map2.map";
libname my_xml XMLV2 "%sysfunc(getoption(work))/file1.xml" xmlmap=sxlemap access=readonly;
libname out "C:\Output";
data out.xml_contents;
 set my_xml.test;
run;

It's the intermediate data _null_ step I'm a little shaky on. So that DATA step basically tells SAS to pluck the XML file from inside the zipped folder and write a copy of it to the temporary WORK directory? That then lets me point a LIBNAME at the copy in WORK and use the XML engine to parse the XML?


All Replies
Community Manager
Posts: 2,692

Re: Import zipped XML files

You're almost there. You just need to add a middle step in which you use a DATA step to read the XML file out of the zip archive and write it to a temp space in your session. Then you can use LIBNAME XMLV2 to read the XML as data.

 

I have a similar example with an Excel file in this blog post.

Community Manager
Posts: 2,692

Re: Import zipped XML files

@athomson - Exactly!  Good job!

Occasional Contributor
Posts: 6

Re: Import zipped XML files

Hi Chris (and anyone else reading this post)!

 

We've made some pretty good progress so far in trying to "macro-tize" the above so that it will read multiple zipped folders and parse the XML files inside them into records in a dataset.

 

See the post below to offer any thoughts how we can overcome a couple of problems, and even make the macro run more efficiently!

 

https://communities.sas.com/t5/General-SAS-Programming/Improving-a-macro-that-parses-out-multiple-XM...

 

Thanks! 
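
For anyone wondering what "macro-tizing" the accepted solution might look like, here is a minimal sketch, not the code from the linked thread. It assumes the INZIP and SXLEMAP filerefs, the OUT libref, and the FOLDER_CONTENTS data set from the earlier posts already exist, that every XML file sits in sub_folder, and that the map produces a table named TEST. The macro name import_member and the data set work._one are hypothetical names made up for this illustration.

```sas
/* Hypothetical helper: extract one XML member from the ZIP and append it */
%macro import_member(member);
  /* temp copy of the member in the WORK directory */
  filename xl "%sysfunc(getoption(work))/&member";

  /* same block-wise copy as in the accepted solution */
  data _null_;
    infile inzip(sub_folder\&member)
        lrecl=256 recfm=F length=length eof=eof unbuf;
    file xl lrecl=256 recfm=N;
    input;
    put _infile_ $varying256. length;
    return;
   eof:
    stop;
  run;

  /* parse the extracted copy with the XML map (assumes the map defines TEST) */
  libname my_xml XMLV2 "%sysfunc(getoption(work))/&member"
      xmlmap=sxlemap access=readonly;

  data work._one;
    set my_xml.test;
  run;

  /* accumulate the records from every file in one output data set */
  proc append base=out.xml_contents data=work._one force;
  run;

  libname my_xml clear;
  filename xl clear;
%mend import_member;

/* call the macro once per XML member listed in FOLDER_CONTENTS */
data _null_;
  set folder_contents;
  if lowcase(scan(memname, -1, '.')) = 'xml';
  call execute('%nrstr(%import_member(' || strip(memname) || '))');
run;
```

The %NRSTR wrapper delays each macro call until the driving DATA step has finished, so the generated steps run one after another. The extracted copies stay in WORK until the session ends; with several hundred files it may be worth deleting each copy after it has been appended.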
