All,
I get multiple zipped folders that contain XML files. I want to "point" SAS into the zipped folder and parse out the XML files into records in a SAS dataset.
Here's a diagram of what I'm dealing with:
data.zip
|__sub_folder |__ file1.xml |__ file2.xml
...
|__file500.xml
I know I can use a FILENAME with the ZIP option then read in the contents of a zipped folder:
filename inzip ZIP "C:\input_data\data.zip";
data folder_contents;
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
/*Scans and gives only the name of the xml file*/
memname=scan(dread(fid,i), 2, '\');
/*check for trailing / in folder name */
output;
end;
rc=dclose(fid);
run;
title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;
And I know I can use the LIBNAME engine with the XML option (in this case XMLV2) with an XML map (made from the XML Mapper) to read in XML files into a dataset:
filename SXLEMAP "C:\xml_map\my_map2.map";
libname my_xml_file XMLV2 "C:\XML_files\test.xml" xmlmap = SXLEMAP;
libname out "C:\output";
data out.xml_contents;
set my_xml_file.test;
run;
How do I get a LIBNAME pointing "inside" a zipped folder, if I have to use the FILENAME engine to look inside said folder?
Right now, I believe we can only use the ZIP options in the FILENAME engine, not the LIBNAME.
I want to avoid unzipping since it takes a while to unzip all of the XML files (and IT at my agency has strong restrictions on using X commands. Otherwise, I'd use X commands to move the XML files from the zipped folder to a staging folder, then use a macro to import the XML files).
Got it!
Here is what I got:
filename inzip ZIP "C:\input_data\data.zip";
data folder_contents;
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
/*Scans and gives only the name of the xml file*/
memname=scan(dread(fid,i), 2, '\');
output;
end;
rc=dclose(fid);
run;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;
/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/file1.xml" ;
/* hat tip: "data _null_" on SAS-L */
data _null_;
/* using member syntax here */
infile inzip(sub_folder\file1.xml)
lrecl=256
recfm=F
length=length
eof=eof unbuf;
file xl lrecl=256 recfm=N;
input;
put _infile_ $varying256. length;
return;
eof:
stop;
run;
filename SXLEMAP "C:\xml_map\my_map2.map';
libname my_xml XMLV2 "%sysfunc(getoption(work))/file1.xml" xmlmap=sxlemap access=readonly;
libname out "C:\Output";
data out.xml_contents;
set my_xml.test;
run;
It's the intermediate data _null_ step I'm a little shaky on. So data step basically tells SAS to pluck the xml file from inside the zipped folder and stick it in a temporary work directory? This then allows me to set a LIBNAME on that file in the temporary work directory and leverage the XML engine to parse out the XML?
You're almost there. You just need to add a middle step in which you use DATA step to read the XML file out of the zip archive and then write it to a temp space in your session. Then you can use LIBNAME XML2 to read the XML as data.
I have a similar example with an Excel file in this blog post.
Got it!
Here is what I got:
filename inzip ZIP "C:\input_data\data.zip";
data folder_contents;
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
/*Scans and gives only the name of the xml file*/
memname=scan(dread(fid,i), 2, '\');
output;
end;
rc=dclose(fid);
run;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=folder_contents noobs N;
run;
/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/file1.xml" ;
/* hat tip: "data _null_" on SAS-L */
data _null_;
/* using member syntax here */
infile inzip(sub_folder\file1.xml)
lrecl=256
recfm=F
length=length
eof=eof unbuf;
file xl lrecl=256 recfm=N;
input;
put _infile_ $varying256. length;
return;
eof:
stop;
run;
filename SXLEMAP "C:\xml_map\my_map2.map';
libname my_xml XMLV2 "%sysfunc(getoption(work))/file1.xml" xmlmap=sxlemap access=readonly;
libname out "C:\Output";
data out.xml_contents;
set my_xml.test;
run;
It's the intermediate data _null_ step I'm a little shaky on. So data step basically tells SAS to pluck the xml file from inside the zipped folder and stick it in a temporary work directory? This then allows me to set a LIBNAME on that file in the temporary work directory and leverage the XML engine to parse out the XML?
@athomson - Exactly! Good job!
Hi Chris (and anyone else reading this post)!
We've made some pretty good progress so far in trying to "macro-tize" the above so that it will read into multiple zipped folders and then parse out multiple XML files into records in a dataset.
See the post below to offer any thoughts how we can overcome a couple of problems, and even make the macro run more efficiently!
Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.