Background
I'm running SAS 9.4 M6 on a Windows Server 2016 Standard (64 bit) machine. I'm trying to update and add some xml files embedded inside an Excel xlsx file (an Excel xlsx file is a Zip archive) in order to update a custom property, a label that marks the file as either "Internal" or "External." If I manually unzip the xlsx file and manually change the embedded xml files using Windows Notepad and then manually re-zip everything, I can a) open xlsx file with Excel, b) the data displays correctly, and c) the Internal/External label works correctly.
Problem
However, if I use the SAS Zip engine to do the very same thing, I a) cannot open the xlsx file with Excel, and b) if I try to use WinZip to read the files, I get CRC32 checksum errors. See screen print.
I suspect that SAS is updating the XML but not re-syncing the CRC32 check sums properly.
Questions
Has anyone encountered something like this before? Is there a way to manually cause SAS to resync all the CRC32 check sums in a Zip archive?
My program is below. You should be able to execute the program on any SAS system, both Unix type and Windows. I have tried running the four xml updates/adds in different order as well as individually one at a time to try to identify if one of the four is the problem, but no combination of changes work when all four are performed. However, if I run any one of the four by itself without running all four, I can open and read the results using WinZip. Something is getting corrupted when multiple operations are performed on a single xlsx file.
Jim
Specific xml files being updated or added
Detailed Error Messages
I notice that even files that I am not touching are getting corrupted. To me this would appear to indicate that the SAS Zip engine is not working correctly.
Extracting file: C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS\Class_3.ZIP Extracting to "C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS\Class_3\" Use Path: yes Overlay Files: no Extracting xl\worksheets\sheet1.xml Extracting xl\sharedStrings.xml Extracting docProps\core.xml Extracting xl\styles.xml Central and local directory mismatch for file "xl/styles.xml" (crc-32 - local: 1148BA33 hex central: 3E9D0B61 hex). Extracting docProps\app.xml Extracting _rels\.rels Central and local directory mismatch for file "_rels/.rels" (crc-32 - local: 1148BA33 hex central: 52D5847E hex). Extracting xl\_rels\workbook.xml.rels Unable to find the local header for xl\_rels\workbook.xml.rels. Extracting xl\workbook.xml Central and local directory mismatch for file "xl/workbook.xml" (crc-32 - local: 721EAEAE hex central: 4B8E0F38 hex). Severe Error: Local and central CRC values don't match.
Program
%Time_Stamp(START);
** Set slash to back slash for Windows or forward slash for LINUX/UNIX **;
%LET Slash = %SYSFUNC(IFC(%EVAL(&SYSSCP = WIN),\,/));
*%LET Iteration = 0;
%IF %SYMEXIST(Iteration) %THEN
%DO;
%LET Iteration = %EVAL(&Iteration + 1);
%END;
%ELSE
%DO;
%LET Iteration = 1;
%END;
%PUT &Nte1 Running Iteration &Iteration;
%LET Path = C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS;
*%LET Path = I:\commercial\user\jbarbou3\ODS_Excel;
%LET File_Name = &Path.&Slash.Class_&Iteration..xlsx;
** Create an xlsx file with Proc Export **;
PROC EXPORT DATA=sashelp.class FILE="&File_Name" REPLACE DBMS=xlsx;
RUN;
FILENAME XL_File ZIP "&File_Name";
**------------------------- Relationships --------------------------------**;
** Read in the .rels relationships file, add a reference to custom.xml, and write out a SAS data set containing the modified .rels xml **;
DATA WORK.Rels_XML;
DROP _:;
INFILE XL_File(_rels/.rels) LENGTH=SAS_Length;
INPUT Line $VARYING32767. SAS_Length;
_Position = FIND(Line, '</relationships>', 'i');
IF _Position THEN
DO;
PUTLOG "&Nte1 Adding custom.xml entry to spreadsheet iteration &Iteration";
SUBSTR(Line,_Position) = '<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/custom-properties" Target="docProps/custom.xml"/></Relationships>';
END;
RUN;
**------------------------- Content Types --------------------------------**;
** Read in the Content Types XML file, add a reference to custom.xml, and write out a SAS data set containing the modified Content Types XML. **;
DATA WORK.Content_Types_XML;
DROP _:;
INFILE XL_File([Content_Types].xml) LENGTH=SAS_Length;
INPUT Line $VARYING32767. SAS_Length;
_Position = FIND(Line, '</Types>', 'i');
IF _Position THEN
DO;
PUTLOG "&Nte1 Adding custom.xml entry to spreadsheet iteration &Iteration";
SUBSTR(Line,_Position) = '<Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/></Types>';
END;
RUN;
**--------------------------- Custom.XML --------------------------------**;
** Create a SAS data set that contains the custom XML that will set the label to either External or Internal **;
DATA WORK.Custom_XML;
INFILE DATALINES4 LENGTH=SAS_Length;
INPUT Line $VARYING32767. SAS_Length;
DATALINES4;
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Enabled"><vt:lpwstr>true</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_SetDate"><vt:lpwstr>2022-11-05T21:48:26Z</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="4" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Method"><vt:lpwstr>Privileged</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="5" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Name"><vt:lpwstr>External Label</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="6" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_SiteId"><vt:lpwstr>db05faca-c82a-4b9d-b9c5-0f64b6755421</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="7" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_ActionId"><vt:lpwstr>6d6ffa67-7505-4457-b8c1-4592fb83e3c9</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="8" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_ContentBits"><vt:lpwstr>0</vt:lpwstr></property></Properties>
;;;;
RUN;
**--------------------------- App.XML --------------------------------**;
** Create a SAS data set that contains the App.XML that will correspond to the rest of the XML **;
DATA WORK.App_XML;
INFILE DATALINES4 LENGTH=SAS_Length;
INPUT Line $VARYING32767. SAS_Length;
DATALINES4;
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><Application>Microsoft Excel</Application><DocSecurity>0</DocSecurity><ScaleCrop>false</ScaleCrop><HeadingPairs><vt:vector size="4" baseType="variant"><vt:variant><vt:lpstr>Worksheets</vt:lpstr></vt:variant><vt:variant><vt:i4>1</vt:i4></vt:variant><vt:variant><vt:lpstr>Named Ranges</vt:lpstr></vt:variant><vt:variant><vt:i4>1</vt:i4></vt:variant></vt:vector></HeadingPairs><TitlesOfParts><vt:vector size="2" baseType="lpstr"><vt:lpstr>Class_25</vt:lpstr><vt:lpstr>Class_25</vt:lpstr></vt:vector></TitlesOfParts><Company>SAS Institute Inc</Company><LinksUpToDate>false</LinksUpToDate><SharedDoc>false</SharedDoc><HyperlinksChanged>false</HyperlinksChanged><AppVersion>16.0300</AppVersion></Properties>
;;;;
RUN;
** Update/add the XML files to the xlsx file which is a Zip archive **;
DATA _NULL_;
LENGTH Mem_Name $128;
INFILE DATALINES4;
INPUT Mem_Name $;
FILE XL_File MEMVAR=Mem_Name;
PUTLOG "&Nte1 1. " _N_= Mem_Name=;
IF Mem_Name = 'docProps/custom.xml' THEN
DO;
** Create the custom.xml file from the SAS data set that contains the custom XML that will set the label to either External or Internal **;
PUTLOG "&Nte1 Processing " Mem_Name;
DO UNTIL (End_Custom);
SET WORK.CUSTOM_XML END=End_Custom;
Leng = length(line);
PUT Line $Varying32767. Leng;
END;
END;
IF Mem_Name = 'docProps/app.xml' THEN
DO;
** Create the app.xml file from the SAS data set that contains the App XML **;
PUTLOG "&Nte1 Processing " Mem_Name;
DO UNTIL (End_App);
SET WORK.App_XML END=End_App;
Leng = length(line);
PUT Line $Varying32767. Leng;
END;
END;
IF Mem_Name = '_rels/.rels' THEN
DO;
** Write the modified .rels xml to the .rels file **;
PUTLOG "&Nte1 Processing " Mem_Name;
DO UNTIL (End_Rels);
SET WORK.Rels_XML END=End_Rels;
Leng = length(line);
PUT Line $Varying32767. Leng;
END;
END;
IF Mem_Name = '[Content_Types].xml' THEN
DO;
** Write the modified Content Types XML to the Content Types XML file **;
PUTLOG "&Nte1 Processing " Mem_Name;
DO UNTIL (End_Content);
SET WORK.Content_Types_XML END=End_Content;
Leng = length(line);
PUT Line $Varying32767. Leng;
END;
END;
DATALINES4;
_rels/.rels
docProps/custom.xml
docProps/app.xml
[Content_Types].xml
;;;;
RUN;
%Time_Stamp(STOP);
Excel error message.
When I try to open the xlsx file with Excel, I receive the following message. Any recovery attempt(s) fail.
Can you recreate the issue with a simpler program?
@Tom wrote:
Can you recreate the issue with a simpler program?
Hmm. Good question. Maybe. I need to have a coordinated set of changes to an Excel file. Of the four updates/adds I'm making, all have to be in place in order for things to work. Just adding, say, the custom.xml does nothing since the Content Types and Rels xml files don't have any pointers to the file. In addition, when I've done an update to just one xml file, it works just fine in terms of the CRC32 check sums. It's something about doing multiple updates to a single Zip archive that is throwing the check sums off (from what I can tell). But this is a good idea. Let me play with it a bit.
Jim
What if you copy the new files into the archive instead?
If that doesn't fix it then what if you DELETE the old file first?
If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?
@Tom wrote:
What if you copy the new files into the archive instead?
That does work when I copy them in by hand. I haven't tried that using a SAS program. I'm trying to avoid any OS level commands since my program needs to work in both Windows and UNIX type environments.
@Tom wrote:
If that doesn't fix it then what if you DELETE the old file first?
When I've tried deleting the old file first, then SAS gives me a message that "file is not a zip file," and I can't proceed. I could play with this some more though.
@Tom wrote:
If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?
Yeah, that's what I've kind of resigned myself to.
As I dig a little further, I see that subtle changes are made to every xml file inside an xlsx Zip archive when this Internal/External label is set. When Proc Export with just default settings creates an xlsx file, 10 xml files are created inside. I want to add a custom.xml file, for a total of 11 files. Hard coding all of these xml changes/adds seems like it may not be practical, particularly if the Excel VB code ever changes. I'm going to have to talk to my internal client and see if they can just set the label manually. Given the number of xml files and SAS' seeming difficulty with CRC32 check sums, my approach may not really be workable.
I've learned one interesting thing here: The use of a MemVar which does essentially for Zip files what a FileVar would do for unzipped files. One should however beware of potential CRC32 check sum errors.
Jim
@Tom, @Kurt_Bremser, and @ballardw,
I was able to create a work-around that a) created the custom Excel properties I originally sought and b) doesn't have the CRC-32 check sum errors that using the SAS Zip engine causes. In the event this is of interest, read on. I'll eventually post SAS code for those who might be interested -- once I clean things up a bit and package it into a macro.
@Tom wrote:If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?
Yeah, that's what I've kind of resigned myself to:
- Extract everything from an xlsx file
- Update all files in unzipped form
Delete the original xlsx file- Create a new xlsx file with the same name as the original xlsx file.
I was able work-around the Zip engine invalid CRC-32 problem by doing steps 1, 2, and 4 of the steps listed above. Rather than delete the original xlsx file, I just create a new xlsx file using ODS Package. I presume that ODS Package overlays the original xlsx file. By using ODS Package, I can now open the resultant xlsx file with Excel, the contents display correctly, and the Internal/External label custom property is set.
I'm going to leave this topic unsolved for a bit longer in case @ChrisHemedinger has a solution for the Zip engine problem so others don't have to resort to my somewhat kludgey work-around. It's a bit of coding to get the work-around in place.
Once I clean up my spaghetti and turn it into a macro, I'll post the SAS code in the original thread, https://communities.sas.com/t5/ODS-and-Base-Reporting/Excel-with-Custom-Property-Names/m-p/842420#M2...
Thanks for all the help and ideas,
Jim
I don't have a better approach. Glad you got it solved.
If you can generalize the issue to "using the FILENAME ZIP to update/replace a member will corrupt the zip file", then I'd pursue with Tech Support. If it's specific to updating XLSX files only, then I suspect the answer will be "well, only Microsoft apps can be expected to understand all of the mysteries of editing XSLX files in place..."
Hi, Chris,
Thanks. I'll have to see if I can replicate the problem. It does however seem to be a Zip engine issue inasmuch as I'm just making text based edits to a zip archive. WinZip, which presumably knows nothing of Microsoft Excel's inner mysteries, handles things just fine.
Right now, I've got to get a Production version of this working. Hopefully I can experiment more later this week. It's a very odd problem that the files that I'm not changing are the ones that are getting corrupted. I suspect it's the inner mysteries of multi-directory Zip archives more than it is the inner mysteries of Excel, but we'll see.
Thanks for taking the time to respond,
Jim
If it were of interest, I've added code (bundled up into a macro) for my work around for the CRC-32 check sum errors I encountered when using the Zip engine in the original thread: https://communities.sas.com/t5/ODS-and-Base-Reporting/Excel-with-Custom-Property-Names/td-p/842420/p...
Jim
Further research:
Pretty stuck here. I wonder if @ChrisHemedinger might be able to help a poor soul out.
Synopsis: I'm trying to update/add the three above listed xml files in an xlsx file. The add works fine, but either or both of the two updates causes corruption in other xml files inside the xlsx, files that I haven't touched. Any tips here?
Jim
7Zip seems to give a clue as to what is happening when I attempt to extract the contents of the modified file.
You seem to have two copies of the same file in the ZIP file.
@Tom wrote:
7Zip seems to give a clue as to what is happening when I attempt to extract the contents of the modified file.
You seem to have two copies of the same file in the ZIP file.
Interesting. WinZip doesn't show any such thing. I tried deleting the xml components prior to updating them, but then SAS gave me a message that "file is not a zip archive." Let me play with it a bit.
Jim
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.