<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: filename zip behaving badly in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/792739#M254013</link>
    <description>&lt;P&gt;That is called the BOM (byte order mark).&lt;/P&gt;
&lt;P&gt;Yes, the ZIP engine does NOT process the BOM.&lt;/P&gt;
&lt;P&gt;You can either handle it yourself by skipping the first three bytes (perhaps checking first that they really are the BOM):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  infile 'c:\downloads\bom.zip' zip member='bom.txt'  ;
  if _n_=1 then do;
     input @;
     if 'EFBBBF'x=substrn(_infile_,1,3) then _infile_=substrn(_infile_,4);
  end;
  input;
  list;
  put _infile_ $hex12.;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;1487  data _null_;
1488    infile 'c:\downloads\bom.zip' zip member='bom.txt' encoding='any' ;
1489    if _n_=1 then do;
1490       input @;
1491       if 'EFBBBF'x=substr(_infile_,1,3) then _infile_=substr(_infile_,4);
1492    end;
1493    input;
1494    list;
1495    put _infile_ $hex12.;
1496    stop;
1497  run;

NOTE: The infile 'c:\downloads\bom.zip' is:
      Filename=c:\downloads\bom.zip,
      Member Name=bom.txt,Size=6,Compressed Size=6,
      CRC-32=970C5C25,Date/Time=01-26-2022 22:27:54

787878
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         xxx 3
NOTE: 1 record was read from the infile 'c:\downloads\bom.zip'.
      The minimum record length was 6.
      The maximum record length was 6.
&lt;/PRE&gt;
&lt;P&gt;Or copy the file to a physical file and read that file. Then SAS will detect the BOM and not treat it as part of the real content of the file.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename from zip 'c:\downloads\bom.zip' member='bom.txt' ;
filename to temp;
%let rc=%sysfunc(fcopy(from,to));

data _null_;
  infile to;
  input;
  put _infile_ $hex12.;
  list;
  stop;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;1510  filename from zip 'c:\downloads\bom.zip' member='bom.txt' ;
1511  filename to temp;
1512  %let rc=%sysfunc(fcopy(from,to));
1513
1514  data _null_;
1515    infile to;
1516    input;
1517    put _infile_ $hex12.;
1518    list;
1519    stop;
1520  run;

NOTE: A byte-order mark in the file "...\#LN00086"
      (for fileref "TO") indicates that the data is encoded in "utf-8".  This encoding will be used to process the file.
NOTE: The infile TO is:
      Filename=...\#LN00086,
      RECFM=V,LRECL=131068,File Size (bytes)=8,
      Last Modified=26Jan2022:22:42:48,
      Create Time=26Jan2022:22:42:48

787878
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         xxx 3
NOTE: 1 record was read from the infile TO.
      The minimum record length was 3.
      The maximum record length was 3.
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 27 Jan 2022 03:45:46 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2022-01-27T03:45:46Z</dc:date>
    <item>
      <title>filename zip behaving badly</title>
      <link>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/792736#M254011</link>
      <description>&lt;PRE&gt;
    Directory: G:\FIN Credit Risk\Management


Mode                LastWriteTime         Length Name                                                                                
----                -------------         ------ ----                                                                                
da----         7/8/2020  11:30 AM                Auditorias                                                                          
da----        2/22/2021  12:37 PM                Politicas y procedimientos                                                          


    Directory: G:\FIN Credit Risk\Management\Auditorias


Mode                LastWriteTime         Length Name                                                                                
----                -------------         ------ ----                                                                                
d-----        4/19/2017  11:42 AM                autorizaciones                                                                      

&lt;/PRE&gt;
&lt;P&gt;I have a file with the text above called&amp;nbsp;management.dir.test.txt. I have the exact same file zipped in&amp;nbsp;management.dir.test.txt.zip.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I run:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename tt zip "&amp;amp;franriv/management.dir.test.txt.zip" member="management.dir.test.txt";

data testing;
	infile tt;
	input;
	length L $500;
	L=_infile_;

	retain dir;

	aa=find(L, "D");
	if index(L, "Directory")=5
		then dir=substr(L, 17);
	
	bb=find(L, 'Mode');
	if index(L, 'Mode') ne 1 and substr(L, 50, 4) ne '----';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I get this:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="zip1_2601.png" style="width: 859px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/67884iFF66FB0CDA8B895A/image-size/large?v=v2&amp;amp;px=999" role="button" title="zip1_2601.png" alt="zip1_2601.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Notice that (1) the output has weird characters and (2) the FIND function (variable aa) incorrectly located the first "D" in the 10th position.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But when I run:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data testing;
	infile "&amp;amp;franriv/management.dir.test.txt";
	input;
	length L $500;
	L=_infile_;

	retain dir;

	aa=find(L, "D");
	if index(L, "Directory")=5
		then dir=substr(L, 17);
	
	bb=find(L, 'Mode');
	if index(L, 'Mode') ne 1 and substr(L, 50, 4) ne '----';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I get:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="zip2_2601.png" style="width: 915px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/67885iAF86F274F4D52237/image-size/large?v=v2&amp;amp;px=999" role="button" title="zip2_2601.png" alt="zip2_2601.png" /&gt;&lt;/span&gt;&amp;nbsp;Notice that this time it properly located the first 'D' in the 5th position.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I already tried combinations of TERMSTR=CRLF, RECFM=N, MISSOVER and TRUNCOVER, but I can't figure out why the first program does not work.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I constantly read in *.zip files in data steps through filename zip. I could read this specific file outside the zip, but I need to understand what's going on (maybe even fix it) if I am to trust filename zip in the future (or otherwise be prepared for the possibility that it could fail).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 03:16:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/792736#M254011</guid>
      <dc:creator>franriv</dc:creator>
      <dc:date>2022-01-27T03:16:43Z</dc:date>
    </item>
    <item>
      <title>Re: filename zip behaving badly</title>
      <link>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/792739#M254013</link>
      <description>&lt;P&gt;That is called the BOM (byte order mark).&lt;/P&gt;
&lt;P&gt;Yes, the ZIP engine does NOT process the BOM.&lt;/P&gt;
&lt;P&gt;You can either handle it yourself by skipping the first three bytes (perhaps checking first that they really are the BOM):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  infile 'c:\downloads\bom.zip' zip member='bom.txt'  ;
  if _n_=1 then do;
     input @;
     if 'EFBBBF'x=substrn(_infile_,1,3) then _infile_=substrn(_infile_,4);
  end;
  input;
  list;
  put _infile_ $hex12.;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;1487  data _null_;
1488    infile 'c:\downloads\bom.zip' zip member='bom.txt' encoding='any' ;
1489    if _n_=1 then do;
1490       input @;
1491       if 'EFBBBF'x=substr(_infile_,1,3) then _infile_=substr(_infile_,4);
1492    end;
1493    input;
1494    list;
1495    put _infile_ $hex12.;
1496    stop;
1497  run;

NOTE: The infile 'c:\downloads\bom.zip' is:
      Filename=c:\downloads\bom.zip,
      Member Name=bom.txt,Size=6,Compressed Size=6,
      CRC-32=970C5C25,Date/Time=01-26-2022 22:27:54

787878
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         xxx 3
NOTE: 1 record was read from the infile 'c:\downloads\bom.zip'.
      The minimum record length was 6.
      The maximum record length was 6.
&lt;/PRE&gt;
&lt;P&gt;Or copy the file to a physical file and read that file. Then SAS will detect the BOM and not treat it as part of the real content of the file.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename from zip 'c:\downloads\bom.zip' member='bom.txt' ;
filename to temp;
%let rc=%sysfunc(fcopy(from,to));

data _null_;
  infile to;
  input;
  put _infile_ $hex12.;
  list;
  stop;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;1510  filename from zip 'c:\downloads\bom.zip' member='bom.txt' ;
1511  filename to temp;
1512  %let rc=%sysfunc(fcopy(from,to));
1513
1514  data _null_;
1515    infile to;
1516    input;
1517    put _infile_ $hex12.;
1518    list;
1519    stop;
1520  run;

NOTE: A byte-order mark in the file "...\#LN00086"
      (for fileref "TO") indicates that the data is encoded in "utf-8".  This encoding will be used to process the file.
NOTE: The infile TO is:
      Filename=...\#LN00086,
      RECFM=V,LRECL=131068,File Size (bytes)=8,
      Last Modified=26Jan2022:22:42:48,
      Create Time=26Jan2022:22:42:48

787878
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1         xxx 3
NOTE: 1 record was read from the infile TO.
      The minimum record length was 3.
      The maximum record length was 3.
&lt;/PRE&gt;
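&lt;P&gt;To read a whole member rather than one test record, the first approach can be extended along these lines. This is only a sketch: the data set name WANT, the variable TEXT and its $500 length are assumptions for illustration, and the path and member are the same test file used above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  infile 'c:\downloads\bom.zip' zip member='bom.txt' ;
  length text $500;   /* assumed to be long enough for every record */
  if _n_=1 then do;
     /* hold the first record and drop a leading UTF-8 BOM if present */
     input @;
     if 'EFBBBF'x=substrn(_infile_,1,3) then _infile_=substrn(_infile_,4);
  end;
  input;              /* releases the held first record; reads a new record on later iterations */
  text=_infile_;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Without the STOP statement the step runs once per record, and only the first record is checked for the BOM.&lt;/P&gt;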
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 03:45:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/792739#M254013</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-01-27T03:45:46Z</dc:date>
    </item>
    <item>
      <title>Re: filename zip behaving badly</title>
      <link>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/793273#M254232</link>
      <description>Thanks! I'll look further into the BOM. &lt;BR /&gt;The BOM explains the weird character in the first line read, but I still don't get why subsequent records have problems too.</description>
      <pubDate>Sat, 29 Jan 2022 09:25:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/793273#M254232</guid>
      <dc:creator>franriv</dc:creator>
      <dc:date>2022-01-29T09:25:59Z</dc:date>
    </item>
    <item>
      <title>Re: filename zip behaving badly</title>
      <link>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/793296#M254241</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/224992"&gt;@franriv&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks! I'll look further into the BOM. &lt;BR /&gt;The BOM explains the weird character in the first line read, but I still don't get why subsequent records have problems too.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Probably because you are reading them with the wrong encoding.&amp;nbsp; If the file really is UTF-8 encoded, some of its "characters" might require more than one byte.&amp;nbsp; If your current SAS session is using a single-byte encoding, like WLATIN1 or LATIN1, then each of those will look like multiple characters instead of one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
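&lt;P&gt;A minimal sketch of such an ENCODING= test, reusing the fileref and variable names from the original question and assuming the member really is UTF-8 (use a different encoding value if it is not). It is untested, and it does not by itself say what happens to the BOM bytes:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename tt zip "&amp;amp;franriv/management.dir.test.txt.zip" member="management.dir.test.txt";

data testing;
  infile tt encoding='utf-8';   /* assumption: the zipped member is UTF-8 */
  input;
  length L $500;
  L=_infile_;
  aa=find(L, "D");              /* compare with the position found when reading the unzipped file */
run;&lt;/CODE&gt;&lt;/PRE&gt;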
&lt;P&gt;Try changing the ENCODING= option on the INFILE statement.&amp;nbsp; Most likely you want to set it to UTF-8.&amp;nbsp; I am not sure how that will affect the interpretation of the three-byte BOM.&amp;nbsp; Try it and find out.&lt;/P&gt;</description>
      <pubDate>Sat, 29 Jan 2022 17:41:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/filename-zip-behaving-badly/m-p/793296#M254241</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2022-01-29T17:41:50Z</dc:date>
    </item>
  </channel>
</rss>

