athomson
Fluorite | Level 6

All,

 

I get multiple zipped folders that contain XML files. I want to "point" SAS into the zipped folder and parse out the XML files into records in a SAS dataset. 

 

Here's a diagram of what I'm dealing with:

data.zip
|__sub_folder
   |__file1.xml
   |__file2.xml
   ...
   |__file500.xml

 

I know I can use a FILENAME with the ZIP option then read in the contents of a zipped folder:

filename inzip ZIP "C:\input_data\data.zip";

data folder_contents;
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  /* Scans and gives only the name of the xml file */
  memname=scan(dread(fid,i), 2, '\');
  /* check for trailing / in folder name */
  output;
 end;
 rc=dclose(fid);
run;

title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;

And I know I can use the LIBNAME engine with the XML option (in this case XMLV2) with an XML map (made from the XML Mapper) to read in XML files into a dataset: 

filename SXLEMAP "C:\xml_map\my_map2.map";
libname my_xml_file XMLV2 "C:\XML_files\test.xml" xmlmap = SXLEMAP;
libname out "C:\output";

data out.xml_contents;
 set my_xml_file.test;
run;

How do I get a LIBNAME pointing "inside" a zipped folder, if I have to use the FILENAME engine to look inside said folder?

 

As far as I can tell, the ZIP access method is available only on the FILENAME statement, not on LIBNAME.

 

I want to avoid unzipping, since extracting all of the XML files takes a while. IT at my agency also has strong restrictions on using X commands; otherwise, I'd use X commands to move the XML files from the zipped folder to a staging folder, then use a macro to import them.

1 ACCEPTED SOLUTION

athomson
Fluorite | Level 6

Got it! 

 

Here is what I got: 

 

filename inzip ZIP "C:\input_data\data.zip";

data folder_contents;
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
	/*Scans and gives only the name of the xml file*/
  	memname=scan(dread(fid,i), 2, '\');
  	output;
 end;
 rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;

/* point a fileref at a temp copy of the XML file in the WORK directory */
filename xl "%sysfunc(getoption(work))/file1.xml" ;
 
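/* hat tip: "data _null_" on SAS-L */
data _null_;
   /* member syntax: read sub_folder\file1.xml from inside the ZIP
      in fixed 256-byte blocks; LENGTH= reports the bytes actually read */
   infile inzip(sub_folder\file1.xml)
       lrecl=256
       recfm=F
       length=length
       eof=eof unbuf;
   /* write to the WORK copy as a byte stream (no record separators) */
   file   xl lrecl=256 recfm=N;
   input;
   /* copy only the bytes read, so a short final block is not padded */
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;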

filename SXLEMAP "C:\xml_map\my_map2.map";
libname my_xml XMLV2 "%sysfunc(getoption(work))/file1.xml" xmlmap=sxlemap access=readonly;
libname out "C:\Output";

data out.xml_contents;
 set my_xml.test;
run;

It's the intermediate data _null_ step I'm a little shaky on. So the DATA step basically tells SAS to pluck the XML file out of the zipped folder and stick it in the temporary WORK directory? That then lets me point a LIBNAME at the copy in WORK and use the XML engine to parse it?


4 REPLIES
ChrisHemedinger
Community Manager

You're almost there.  You just need to add a middle step in which you use a DATA step to read the XML file out of the zip archive and write it to a temp space in your session.  Then you can use LIBNAME XMLV2 to read the XML as data.

 

I have a similar example with an Excel file in this blog post.
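
For readers who want to see the middle step in isolation, here is a minimal sketch of one way to do it. It uses the FCOPY function rather than a byte-by-byte DATA step, so it assumes SAS 9.4 or later. The ZIP path and member name are taken from the question; the filerefs zmember and xmlout are illustrative names, and the forward slash in the member path is an assumption (use whichever separator DREAD reports for your archive).

```sas
/* Sketch only: copy one ZIP member into WORK as a binary stream. */
/* Assumes SAS 9.4+; filerefs zmember/xmlout are made-up names.   */
filename zmember ZIP "C:\input_data\data.zip"
         member="sub_folder/file1.xml" recfm=n;
filename xmlout "%sysfunc(getoption(work))/file1.xml" recfm=n;

data _null_;
   /* FCOPY returns 0 on success */
   rc = fcopy('zmember', 'xmlout');
   if rc ne 0 then do;
      msg = sysmsg();
      putlog 'WARNING: FCOPY failed: ' msg;
   end;
run;

filename zmember clear;
filename xmlout clear;
```

Once the copy exists in WORK, the LIBNAME XMLV2 statement can point at that path exactly as it would for any XML file on disk.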

ChrisHemedinger
Community Manager

@athomson - Exactly!  Good job!

athomson
Fluorite | Level 6

Hi Chris (and anyone else reading this post)!

 

We've made some pretty good progress so far in trying to "macro-tize" the above so that it will read multiple zipped folders and then parse multiple XML files into records in a dataset.

 

See the post below to offer any thoughts on how we can overcome a couple of problems, and even make the macro run more efficiently!

 

https://communities.sas.com/t5/General-SAS-Programming/Improving-a-macro-that-parses-out-multiple-XM...

 

Thanks! 
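
As a rough illustration of what "macro-tizing" the steps in this thread could look like, here is a sketch. It is not the macro from the linked post. The macro name read_zip_xml, its parameters, the filerefs zmem and xtmp, and the work table _members are all hypothetical. It assumes SAS 9.4 or later (for FCOPY), that every XML file uses the same XML map and yields a table named TEST as in the code above, and that member names contain no macro-special characters such as & or %.

```sas
/* Hypothetical macro: loops over the XML members of one ZIP file. */
%macro read_zip_xml(zip=, map=, out=work.xml_contents);
   %local i n member copy_rc;

   filename inzip ZIP "&zip";
   filename sxlemap "&map";

   /* List members with their full paths; keep only *.xml entries */
   data _members;
      length memname $200;
      fid = dopen("inzip");
      if fid = 0 then stop;
      do i = 1 to dnum(fid);
         memname = dread(fid, i);
         if lowcase(scan(memname, -1, '.')) = 'xml' then output;
      end;
      rc = dclose(fid);
      keep memname;
   run;

   proc sql noprint;
      select count(*) into :n trimmed from _members;
   quit;

   %do i = 1 %to &n;
      /* Fetch the i-th member name into a local macro variable */
      data _null_;
         set _members(firstobs=&i obs=&i);
         call symputx('member', memname, 'L');
      run;

      filename zmem ZIP "&zip" member="&member" recfm=n;
      filename xtmp "%sysfunc(getoption(work))/_tmp.xml" recfm=n;

      /* Binary copy of the member into WORK; 0 means success */
      data _null_;
         rc = fcopy('zmem', 'xtmp');
         call symputx('copy_rc', rc, 'L');
         if rc ne 0 then putlog "WARNING: could not copy &member";
      run;

      %if &copy_rc = 0 %then %do;
         libname onexml XMLV2 "%sysfunc(getoption(work))/_tmp.xml"
                 xmlmap=sxlemap access=readonly;

         proc append base=&out data=onexml.test force;
         run;

         libname onexml clear;
      %end;

      filename zmem clear;
      filename xtmp clear;
   %end;

   filename sxlemap clear;
   filename inzip clear;
%mend read_zip_xml;
```

With a libref such as out already assigned, a call might look like %read_zip_xml(zip=C:\input_data\data.zip, map=C:\xml_map\my_map2.map, out=out.xml_contents). Handling several ZIP files would mean calling the macro once per archive.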
