Well, there is always room for improvement I think.
You could collapse the part that gathers the xml names into a single data step, which avoids all those temporary datasets.
Something like this, for example:
/* get zips and xmls into macro vars:
   &zipNobs          = total zip files in dir
   &&xmlNobs_&I      = total xml files in zip(&I)
   &&zipName_&I      = zip(&I) name
   &&xmlName_&I._&J  = xml(&J) name in zip(&I) */
data _null_;
  * open dir;
  _RC = filename('zips', "&dir");
  _FIDDIR = dopen('zips');
  if not _FIDDIR then stop;

  * cycle through dir elements;
  _DIRCOUNT = dnum(_FIDDIR);
  do _I = 1 to _DIRCOUNT;

    * get zip name into macro var (symputx trims blanks);
    _ZIPNAME = dread(_FIDDIR, _I);
    call symputx(cats('zipName_', _I), _ZIPNAME);

    /* open zip: dread returns only the member name, so the dir is prepended.
       The 'zip' device type is an assumption (SAS 9.4 or later). */
    _RC = filename('inzip', catx('/', "&dir", _ZIPNAME), 'zip');
    _FIDZIP = dopen('inzip');
    if not _FIDZIP then stop;

    * cycle through zip elements;
    _ZIPCOUNT = dnum(_FIDZIP);
    do _J = 1 to _ZIPCOUNT;
      * get xml name into macro var, assuming members look like folder\file.xml;
      _XMLNAME = scan(dread(_FIDZIP, _J), 2, '\');
      call symputx(catx('_', 'xmlName', _I, _J), _XMLNAME);
    end;

    * get total xml files in zip (after the loop _J is _ZIPCOUNT + 1, so use _ZIPCOUNT);
    call symputx(cats('xmlNobs_', _I), _ZIPCOUNT);
    _RC = dclose(_FIDZIP);
  end;

  * get total zip files in dir;
  call symputx('zipNobs', _DIRCOUNT);
  _RC = dclose(_FIDDIR);
run;
%put _all_;
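Once the macro variables exist, you can walk through them with a small macro. This is only a sketch: the macro name listXmls is made up for illustration, and it assumes the variables are named exactly as described in the header comment (zipNobs, zipName_I, xmlNobs_I, xmlName_I_J) and that the data step finished without errors.

```sas
%macro listXmls;  /* hypothetical helper name, not part of the original code */
  %local I J;
  %do I = 1 %to &zipNobs;
    /* &&zipName_&I resolves to &zipName_1, &zipName_2, ... on the second pass */
    %put NOTE: zip &I = &&zipName_&I (&&xmlNobs_&I xml files);
    %do J = 1 %to &&xmlNobs_&I;
      /* the dot after &I ends that reference so _&J can follow it */
      %put NOTE:   xml &J = &&xmlName_&I._&J;
    %end;
  %end;
%mend listXmls;

%listXmls
```

You would replace the %put lines with whatever has to happen for each xml file, for example a libname or a data step that reads it.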
But if you could provide us with the zipped xml files, that would help us give you a more accurate and optimized solution.
I understand it's confidential data, so there's no need to share the real files, but some kind of dummy xml files would help a lot.
Daniel Santos @ www.cgd.pt