SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
jimbarbour
Meteorite | Level 14

Background

I'm running SAS 9.4 M6 on a Windows Server 2016 Standard (64 bit) machine.  I'm trying to update and add some xml files embedded inside an Excel xlsx file (an Excel xlsx file is a Zip archive) in order to update a custom property, a label that marks the file as either "Internal" or "External."  If I manually unzip the xlsx file and manually change the embedded xml files using Windows Notepad and then manually re-zip everything, I can a) open xlsx file with Excel, b) the data displays correctly, and c) the Internal/External label works correctly. 

 

Problem

However, if I use the SAS Zip engine to do the very same thing, I a) cannot open the xlsx file with Excel, and b) if I try to use WinZip to read the files, I get CRC32 checksum errors.  See screen print.

jimbarbour_0-1667839732962.png

 

I suspect that SAS is updating the XML but not re-syncing the CRC32 check sums properly.  

 

Questions

Has anyone encountered something like this before?  Is there a way to manually cause SAS to resync all the CRC32 check sums in a Zip archive?

 

My program is below.  You should be able to execute the program on any SAS system, both Unix type and Windows.  I have tried running the four xml updates/adds in different order as well as individually one at a time to try to identify if one of the four is the problem, but no combination of changes work when all four are performed.  However, if I run any one of the four by itself without running all four, I can open and read the results using WinZip.  Something is getting corrupted when multiple operations are performed on a single xlsx file.

 

Jim

 

Specific xml files being updated or added

  • _rels/.rels (update)
  • docProps/app.xml (update)
  • docProps/custom.xml (add)
  • [Content_Types].xml (update)

Detailed Error Messages

I notice that even files that I am not touching are getting corrupted.  To me this would appear to indicate that the SAS Zip engine is not working correctly.

Extracting file: C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS\Class_3.ZIP
Extracting to "C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS\Class_3\"
Use Path: yes   Overlay Files: no
Extracting xl\worksheets\sheet1.xml
Extracting xl\sharedStrings.xml
Extracting docProps\core.xml
Extracting xl\styles.xml
Central and local directory mismatch for file "xl/styles.xml" (crc-32 - local: 1148BA33 hex  central: 3E9D0B61 hex).
Extracting docProps\app.xml
Extracting _rels\.rels
Central and local directory mismatch for file "_rels/.rels" (crc-32 - local: 1148BA33 hex  central: 52D5847E hex).
Extracting xl\_rels\workbook.xml.rels
Unable to find the local header for xl\_rels\workbook.xml.rels.
Extracting xl\workbook.xml
Central and local directory mismatch for file "xl/workbook.xml" (crc-32 - local: 721EAEAE hex  central: 4B8E0F38 hex).
Severe Error: Local and central CRC values don't match.

 

Program

%Time_Stamp(START);

**	Set slash to back slash for Windows or forward slash for LINUX/UNIX	**;
%LET	Slash		=	%SYSFUNC(IFC(%EVAL(&SYSSCP = WIN),\,/));

*%LET	Iteration	=	0;

%IF	%SYMEXIST(Iteration)	%THEN
	%DO;
		%LET	Iteration	=	%EVAL(&Iteration + 1);
	%END;
%ELSE
	%DO;
		%LET	Iteration	=	1;
	%END;

%PUT	&Nte1  Running Iteration &Iteration;

%LET	Path		=	C:\Users\jbarbou6\Documents\SAS\Pgm\Training\Excel_Properties_ODS;
*%LET	Path		=	I:\commercial\user\jbarbou3\ODS_Excel;
%LET	File_Name	=	&Path.&Slash.Class_&Iteration..xlsx;

**	Create an xlsx file with Proc Export	**;
PROC	EXPORT	DATA=sashelp.class	FILE="&File_Name"	REPLACE	DBMS=xlsx;
RUN;

FILENAME	XL_File	ZIP	"&File_Name";

**------------------------- Relationships --------------------------------**;
**	Read in the .rels relationships file, add a reference to custom.xml, and write out a SAS data set containing the modified .rels xml	**;
DATA	WORK.Rels_XML;
	DROP	_:;
	INFILE	XL_File(_rels/.rels)	LENGTH=SAS_Length;
	INPUT	Line	$VARYING32767.	SAS_Length;
	_Position	=	FIND(Line, '</relationships>', 'i');
	IF	_Position	THEN
		DO;
			PUTLOG	"&Nte1  Adding custom.xml entry to spreadsheet iteration &Iteration";
			SUBSTR(Line,_Position)	=	'<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/custom-properties" Target="docProps/custom.xml"/></Relationships>';
		END;
RUN;

**------------------------- Content Types --------------------------------**;
**	Read in the Content Types XML file, add a reference to custom.xml, and write out a SAS data set containing the modified Content Types XML.	**;
DATA	WORK.Content_Types_XML;
	DROP	_:;
	INFILE	XL_File([Content_Types].xml)	LENGTH=SAS_Length;
	INPUT	Line	$VARYING32767.	SAS_Length;
	_Position	=	FIND(Line, '</Types>', 'i');
	IF	_Position	THEN
		DO;
			PUTLOG	"&Nte1  Adding custom.xml entry to spreadsheet iteration &Iteration";
			SUBSTR(Line,_Position)	=	'<Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/></Types>';
		END;
RUN;

**--------------------------- Custom.XML --------------------------------**;
**	Create a SAS data set that contains the custom XML that will set the label to either External or Internal	**;
DATA	WORK.Custom_XML;
	INFILE	DATALINES4	LENGTH=SAS_Length;
	INPUT	Line	$VARYING32767.	SAS_Length;
DATALINES4;
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/custom-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="2" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Enabled"><vt:lpwstr>true</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="3" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_SetDate"><vt:lpwstr>2022-11-05T21:48:26Z</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="4" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Method"><vt:lpwstr>Privileged</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="5" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_Name"><vt:lpwstr>External Label</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="6" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_SiteId"><vt:lpwstr>db05faca-c82a-4b9d-b9c5-0f64b6755421</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="7" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_ActionId"><vt:lpwstr>6d6ffa67-7505-4457-b8c1-4592fb83e3c9</vt:lpwstr></property><property fmtid="{D5CDD505-2E9C-101B-9397-08002B2CF9AE}" pid="8" name="MSIP_Label_320f21ee-9bdc-4991-8abe-58f53448e302_ContentBits"><vt:lpwstr>0</vt:lpwstr></property></Properties>
;;;;
RUN;

**--------------------------- App.XML --------------------------------**;
**	Create a SAS data set that contains the App.XML that will correspond to the rest of the XML	**;
DATA	WORK.App_XML;
	INFILE	DATALINES4	LENGTH=SAS_Length;
	INPUT	Line	$VARYING32767.	SAS_Length;
DATALINES4;
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties" xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes"><Application>Microsoft Excel</Application><DocSecurity>0</DocSecurity><ScaleCrop>false</ScaleCrop><HeadingPairs><vt:vector size="4" baseType="variant"><vt:variant><vt:lpstr>Worksheets</vt:lpstr></vt:variant><vt:variant><vt:i4>1</vt:i4></vt:variant><vt:variant><vt:lpstr>Named Ranges</vt:lpstr></vt:variant><vt:variant><vt:i4>1</vt:i4></vt:variant></vt:vector></HeadingPairs><TitlesOfParts><vt:vector size="2" baseType="lpstr"><vt:lpstr>Class_25</vt:lpstr><vt:lpstr>Class_25</vt:lpstr></vt:vector></TitlesOfParts><Company>SAS Institute Inc</Company><LinksUpToDate>false</LinksUpToDate><SharedDoc>false</SharedDoc><HyperlinksChanged>false</HyperlinksChanged><AppVersion>16.0300</AppVersion></Properties>
;;;;
RUN;

**	Update/add the XML files to the xlsx file which is a Zip archive	**;
DATA	_NULL_;
	LENGTH	Mem_Name	$128;

	INFILE	DATALINES4;
	INPUT	Mem_Name	$;
	FILE	XL_File	MEMVAR=Mem_Name;
	PUTLOG	"&Nte1  1.  "	_N_=  Mem_Name=;

	IF	Mem_Name	=	'docProps/custom.xml'	THEN
		DO;
			**	Create the custom.xml file from the SAS data set that contains the custom XML that will set the label to either External or Internal	**;
			PUTLOG	"&Nte1  Processing "		Mem_Name;
			DO	UNTIL	(End_Custom);	
				SET	WORK.CUSTOM_XML				END=End_Custom;
				Leng	=	length(line);
				PUT	Line	$Varying32767.		Leng;
			END;
		END;

	IF	Mem_Name	=	'docProps/app.xml'	THEN
		DO;
			**	Create the app.xml file from the SAS data set that contains the App XML	**;
			PUTLOG	"&Nte1  Processing "		Mem_Name;
			DO	UNTIL	(End_App);	
				SET	WORK.App_XML				END=End_App;
				Leng	=	length(line);
				PUT	Line	$Varying32767.		Leng;
			END;
		END;

	IF	Mem_Name	=	'_rels/.rels'			THEN
		DO;
			**	Write the modified .rels xml to the .rels file	**;
			PUTLOG	"&Nte1  Processing "		Mem_Name;
			DO	UNTIL	(End_Rels);
				SET	WORK.Rels_XML				END=End_Rels;
				Leng	=	length(line);
				PUT	Line	$Varying32767.		Leng;
			END;
		END;

	IF	Mem_Name	=	'[Content_Types].xml'	THEN
		DO;
			**	Write the modified Content Types XML to the Content Types XML file	**;
			PUTLOG	"&Nte1  Processing "		Mem_Name;
			DO	UNTIL	(End_Content);
				SET	WORK.Content_Types_XML		END=End_Content;
				Leng	=	length(line);
				PUT	Line	$Varying32767.		Leng;
			END;
		END;

DATALINES4;
_rels/.rels
docProps/custom.xml
docProps/app.xml
[Content_Types].xml
;;;;
RUN;

%Time_Stamp(STOP);

 

Excel error message.

When I try to open the xlsx file with Excel, I receive the following message. Any recovery attempt(s) fail.

jimbarbour_1-1667841453889.png

 

11 REPLIES 11
Tom
Super User Tom
Super User

Can you recreate the issue with a simpler program?

jimbarbour
Meteorite | Level 14

@Tom wrote:

Can you recreate the issue with a simpler program?


Hmm.  Good question.  Maybe.  I need to have a coordinated set of changes to an Excel file.  Of the four updates/adds I'm making, all have to be in place in order for things to work.  Just adding, say, the custom.xml does nothing since the Content Types and Rels xml files don't have any pointers to the file.  In addition, when I've done an update to just one xml file, it works just fine in terms of the CRC32 check sums.  It's something about doing multiple updates to a single Zip archive that is throwing the check sums off (from what I can tell).  But this is a good idea.  Let me play with it a bit.

 

Jim

Tom
Super User Tom
Super User

What if you copy the new files into the archive instead?

If that doesn't fix it then what if you DELETE the old file first?

If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?

 

jimbarbour
Meteorite | Level 14

@Tom wrote:

What if you copy the new files into the archive instead?


That does work when I copy them in by hand.  I haven't tried that using a SAS program.  I'm trying to avoid any OS level commands since my program needs to work in both Windows and UNIX type environments.

 

@Tom wrote:

If that doesn't fix it then what if you DELETE the old file first?

When I've tried deleting the old file first, then SAS gives me a message that "file is not a zip file," and I can't proceed.  I could play with this some more though.

 

@Tom wrote:

If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?

Yeah, that's what I've kind of resigned myself to. 

  1. Extract everything from an xlsx file 
  2. Update all files in unzipped form
  3. Delete the original xlsx file
  4. Create a new xlsx file with the same name as the original xlsx file.

As I dig a little further, I see that subtle changes are made to every xml file inside an xlsx Zip archive when this Internal/External label is set.  When Proc Export with just default settings creates an xlsx file, 10 xml files are created inside.  I want to add a custom.xml file, for a total of 11 files.  Hard coding all of these xml changes/adds seems like it may not be practical, particularly if the Excel VB code ever changes.  I'm going to have to talk to my internal client and see if they can just set the label manually.  Given the number of xml files and SAS' seeming difficulty with CRC32 check sums, my approach may not really be workable.

 

I've learned one interesting thing here:  The use of a MemVar which does essentially for Zip files what a FileVar would do for unzipped files.  One should however beware of potential CRC32 check sum errors.

 

Jim

 

 

 

 

 

 

 

jimbarbour
Meteorite | Level 14

@Tom@Kurt_Bremser, and @ballardw,

 

I was able to create a work-around that a) created the custom Excel properties I originally sought and b) doesn't have the CRC-32 check sum errors that using the SAS Zip engine causes.  In the event this is of interest, read on.  I'll eventually post SAS code for those who might be interested -- once I clean things up a bit and package it into a macro.

 


@Tom wrote:

If that doesn't fix it then what if you make a whole new archive and copy all of the files there, picking the proper version of each file?

Yeah, that's what I've kind of resigned myself to:

  1. Extract everything from an xlsx file 
  2. Update all files in unzipped form
  3. Delete the original xlsx file
  4. Create a new xlsx file with the same name as the original xlsx file.

I was able work-around the Zip engine invalid CRC-32 problem by doing steps 1, 2, and 4 of the steps listed above.  Rather than delete the original xlsx file, I just create a new xlsx file using ODS Package.  I presume that ODS Package overlays the original xlsx file.  By using ODS Package, I can now open the resultant xlsx file with Excel, the contents display correctly, and the Internal/External label custom property is set. 

 

I'm going to leave this topic unsolved for a bit longer in case @ChrisHemedinger has a solution for the Zip engine problem so others don't have to resort to my somewhat kludgey work-around.  It's a bit of coding to get the work-around in place. 

 

Once I clean up my spaghetti and turn it into a macro, I'll post the SAS code in the original thread, https://communities.sas.com/t5/ODS-and-Base-Reporting/Excel-with-Custom-Property-Names/m-p/842420#M2... 

 

Thanks for all the help and ideas,

 

Jim

 

ChrisHemedinger
Community Manager

I don't have a better approach. Glad you got it solved.

 

If you can generalize the issue to "using the FILENAME ZIP to update/replace a member will corrupt the zip file", then I'd pursue with Tech Support. If it's specific to updating XLSX files only, then I suspect the answer will be "well, only Microsoft apps can be expected to understand all of the mysteries of editing XSLX files in place..."

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!
jimbarbour
Meteorite | Level 14

Hi, Chris,

 

Thanks.  I'll have to see if I can replicate the problem.  It does however seem to be a Zip engine issue inasmuch as I'm just making text based edits to a zip archive.  WinZip, which presumably knows nothing of Microsoft Excel's inner mysteries, handles things just fine.

 

Right now, I've got to get a Production version of this working.  Hopefully I can experiment more later this week.  It's a very odd problem that the files that I'm not changing are the ones that are getting corrupted.  I suspect it's the inner mysteries of multi-directory Zip archives more than it is the inner mysteries of Excel, but we'll see.

 

Thanks for taking the time to respond,

 

Jim

jimbarbour
Meteorite | Level 14

If it were of interest, I've added code (bundled up into a macro) for my work around for the CRC-32 check sum errors I encountered when using the Zip engine in the original thread:  https://communities.sas.com/t5/ODS-and-Base-Reporting/Excel-with-Custom-Property-Names/td-p/842420/p...

 

Jim

jimbarbour
Meteorite | Level 14

Further research:

  • Although all xml files in the xlsx file have some kind of update, really only three must be updated/added to set the label:
    1. custom.xml (add)
    2. [Content_Types].xml (update)
    3. _rels/.rels (update)
  • Adding custom.xml works just fine, but updating either [Content_Types].xml or _rels/.rels causes other xml files to become corrupted, typically in the xl folder.  Interestingly, when I update either [Content_Types].xml or _rels/.rels, the updates themselves work fine.  It is other xml files that get corrupted.
  • I tried deleting [Content_Types].xml (or _rels/.rels) before updating, but if I delete them before updating, the update Data step can't open the xlsx file.  SAS says "The filename I:\commercial\user\jbarbou3\ODS_Excel\Class_17.xlsx is not a zip file".

Pretty stuck here.  I wonder if @ChrisHemedinger might be able to help a poor soul out.

 

Synopsis:  I'm trying to update/add the three above listed xml files in an xlsx file.  The add works fine, but either or both of the two updates causes corruption in other xml files inside the xlsx, files that I haven't touched.  Any tips here?

 

Jim

Tom
Super User Tom
Super User

7Zip seems to give a clue as to what is happening when I attempt to extract the contents of the modified file. 

You seem to have two copies of the same file in the ZIP file.

Tom_0-1667848970227.png

 

jimbarbour
Meteorite | Level 14

@Tom wrote:

7Zip seems to give a clue as to what is happening when I attempt to extract the contents of the modified file. 

You seem to have two copies of the same file in the ZIP file.


Interesting.  WinZip doesn't show any such thing.  I tried deleting the xml components prior to updating them, but then SAS gave me a message that "file is not a zip archive."  Let me play with it a bit.

 

Jim

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2962 views
  • 5 likes
  • 3 in conversation