BenTN
Fluorite | Level 6

Hi,

 

I'm not quite sure how to approach this question, because the code below works... sometimes. I guess I'm trying to understand what the underlying issue (or bug?) is, and whether there is either a way to catch the error or a more elegant way to approach this problem.

 

Anyhoo, here is the code:

 

DATA d;
	LENGTH inputtype $1.
		  FN $500.
		  FN2 $250.;
	INFILE DATALINES DSD delimiter='|';
	INPUT inputtype $ FN $ FN2 $;

	SELECT (inputtype);
		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
		WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
		OTHERWISE;
		END;

	DO while (not done);
		INPUT &datain.;
		OUTPUT;
		END;

	DROP inputtype;

DATALINES;
N|N:\...\data|2011.txt
N|N:\...\data|2010.txt
Z|N:\...\dataarchive\2000_2009.zip|2009.txt
Z|N:\...\dataarchive\2000_2009.zip|2008.txt
...
Z|N:\...\dataarchive\2000_2009.zip|2000.txt
;

The problem the code is trying to solve is that my group has annual data stored in fixed-width text files, and at some point the files get archived into ZIP folders. Since the formatting has not changed, I want to pull all the data into a single data set. So I started looking for a way to switch between the normal read and the ZIP read engines.

Again, the code above does this. Sometimes. But there are a lot of subtle changes that cause it to fail and that have no business being an issue.


For example, if I switch the WHEN statements around so that they read:

SELECT (inputtype);
		WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
		OTHERWISE;
		END;

I get "ERROR: Invalid record format." on the first line of the first data set.

 

Or if I switch the data in DATALINES around so that the ZIP files come before the non-ZIP files, I get the same error at the start of the non-ZIP lines.

 

Really, if the ZIP device type on INFILE comes before the non-zipped version in any way, errors get thrown.

 

The issue also persists across runs: if I take the code that works and run it a second time, I get "ERROR: Invalid record format." on the first line of the data. (But if I run it yet again, it runs fine.)

 

 

Additionally, if I change the placeholder file reference on one of the INFILE statements so I have something like:

	WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
	WHEN ('N') INFILE bar filevar=fn memvar=fn2 LRECL=500 end=done TRUNCOVER;
		

I get the INVALID RECORD FORMAT error.

Thoughts as to what's going on? Is there a way of dynamically changing which read engine SAS is using that is a bit less fickle, or at least a way to get around the error that is triggered by a rerun?

Anyhoo, if it helps, I'm running SAS 9.04.01M3

On Enterprise Guide 7.15 HF9

ACCEPTED SOLUTION

Tom
Super User

Check the ZIP files and make sure they do not include BOM (Byte Order Mark) characters.

data _null_;
  infile "N:\...\dataarchive\2000_2009.zip" zip member="2009.txt" obs=1;
  input;
  list;
run;

The ZIP engine does not recognize the BOM characters and so they become part of the first record.

 

You could conditionally read the first characters of the ZIP member and check whether they are the BOM characters. If they are NOT the BOM characters, then move the pointer back to the start of the line. Also don't bother setting LRECL=500: you don't want to lose the last 3 bytes of the first record if the BOM characters are there, and the current default of 32,767 should be plenty long enough for 500-byte records.

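    WHEN ('Z') do ;
        INFILE ZIP ZIP FILEVAR=fn MEMVAR=fn2 END=done TRUNCOVER;
        if not done then do;
           /* read the first 3 raw bytes and hold the line ($CHAR3., since $HEXw. expects hex digits as text) */
           input BOM $char3. @ ;
           /* no BOM: move the pointer back to column 1 and keep holding the line */
           if bom ne 'EFBBBF'x then input @1 @;
        end;
    end;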


6 REPLIES
ballardw
Super User

It would help to show the complete log when running this and something fails.

Code and all notes, messages, warnings and/or errors. Paste into a text box.

Feel free to XXX out anything you deem sensitive though if it is part of a warning you may be doing yourself a disservice.

BenTN
Fluorite | Level 6

Sure:

Here is the log when I run it for a second time:

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         
27         DATA d_old;
28         	LENGTH inputtype $1.
29         		  FN $500.
30         		  FN2 $250.;
31         	INFILE DATALINES DSD delimiter='|';
32         	INPUT inputtype $ FN $ FN2 $;
33         
34         	SELECT (inputtype);
35         		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
36         		WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
37         		OTHERWISE;
38         		END;
39         
40         
41         	DO while (not done);
42         		INPUT &datain.;
43         		OUTPUT;
44         		END;
45         
46         	DROP inputtype;
47         
48         DATALINES;

ERROR: Invalid record format.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
49         N|XXXXX|2011.
      101  txt
inputtype=N FN=XXXXX FN2=2011.txt done=0
DeathPlaceStateCode=  CertNumber=  DecNameLast=  DecNameFirst=  DecNameMI=  DecNameGenID=  Soundex=  AliasCode=  Sex=  DeathDateMo= 
DeathDateDay=  DeathDateCC=  DeathDateYY=  DecSSN=  DecAgeUnit=  DecAgeNumber=  DecAgeMonths=  DecAgeDays=  DecAgeHours= 
DecAgeMin=  DecDOBMo=  DecDOBDay=  DecDOBCC=  DecDOBYY=  DecBirthPlaceStateCountryCode=  DecInfCertNumTNBirth= 
DecInfCertNumOOSBirth=  DeathPlaceCode=  DeathPlaceType=  DeathPlaceCityCode=  DeathPlaceCountyCode=  DecMaritalStatus= 
DecResStateCode=  DecRaceSexWNU=  DecRaceSexWBOU=  DecResCity=  DecResAddress=  DecResCityLimitsInd=  DecResZip=  ResStateInd= 
DecResCountyStateCode=  DecResCityCode=  ResCensusTract=  DecHispanicOriginCode=  DecRace=  DecEducationYrs=  DadNameLast= 
DadNameFirst=  MomNameMaidenLast=  MomNameFirst=  FuneralHomeCode=  FuneralHomeCountyAbbrev=  FuneralHome=  FuneralHomeAddInfo= 
RegistrarRecDateMo=  RegistrarRecDay=  RegistrarRecDateCC=  RegistrarRecDateYY=  CertifierType=  CertifierLicenseNum= 
MedExamLicenseNum=  CertifierSignDateMo=  CertifierSignDateDay=  CertifierSignDateCC=  CertifierSignDateYY=  Autopsy= 
AutopsyUsedForUCD=  DeathManner=  WorkInjury=  CODQryCode=  UCD=  UCDGroupCode=  CODEntAxis01=  CODEntAxis02=  CODEntAxis03= 
CODEntAxis04=  CODEntAxis05=  CODEntAxis06=  CODEntAxis07=  CODEntAxis08=  CODEntAxis09=  CODEntAxis10=  CODEntAxis11= 
CODEntAxis12=  CODEntAxis13=  CODEntAxis14=  CODEntAxis15=  CODEntAxis16=  CODEntAxis17=  CODEntAxis18=  CODEntAxis19= 
CODEntAxis20=  CODRecAxis01=  CODRecAxis02=  CODRecAxis03=  CODRecAxis04=  CODRecAxis05=  CODRecAxis06=  CODRecAxis07= 
CODRecAxis08=  CODRecAxis09=  CODRecAxis10=  CODRecAxis11=  CODRecAxis12=  CODRecAxis13=  CODRecAxis14=  CODRecAxis15= 
CODRecAxis16=  CODRecAxis17=  CODRecAxis18=  CODRecAxis19=  CODRecAxis20=  SystemDateMo=  SystemDateDay=  SystemDateYY= 
DecInArmedForces=  _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.D_OLD may be incomplete.  When this step was stopped there were 0 observations and 116 variables.
WARNING: Data set WORK.D_OLD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.45 seconds
      cpu time            0.04 seconds
      

Here is the log when I placed the ZIP infile ahead of the non-Zipped Infile:

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         
27         
28         DATA d_old;
29         	LENGTH inputtype $1.
30         		  FN $500.
31         		  FN2 $250.;
32         	INFILE DATALINES DSD delimiter='|';
33         	INPUT inputtype $ FN $ FN2 $;
34         
35         	SELECT (inputtype);
36         		WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
37         		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
38         		OTHERWISE;
39         		END;
40         
41         
42         	DO while (not done);
43         		INPUT &datain.;
44         		OUTPUT;
45         		END;
46         
47         	DROP inputtype;
48         
49         DATALINES;

ERROR: Invalid record format.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
50         N|XXXXX|2011.
      101  txt
inputtype=N FN=XXXXX FN2=2011.txt done=0
DeathPlaceStateCode=  CertNumber=  DecNameLast=  DecNameFirst=  DecNameMI=  DecNameGenID=  Soundex=  AliasCode=  Sex=  DeathDateMo= 
DeathDateDay=  DeathDateCC=  DeathDateYY=  DecSSN=  DecAgeUnit=  DecAgeNumber=  DecAgeMonths=  DecAgeDays=  DecAgeHours= 
2                                                          The SAS System                               14:45 Friday, March 11, 2022

DecAgeMin=  DecDOBMo=  DecDOBDay=  DecDOBCC=  DecDOBYY=  DecBirthPlaceStateCountryCode=  DecInfCertNumTNBirth= 
DecInfCertNumOOSBirth=  DeathPlaceCode=  DeathPlaceType=  DeathPlaceCityCode=  DeathPlaceCountyCode=  DecMaritalStatus= 
DecResStateCode=  DecRaceSexWNU=  DecRaceSexWBOU=  DecResCity=  DecResAddress=  DecResCityLimitsInd=  DecResZip=  ResStateInd= 
DecResCountyStateCode=  DecResCityCode=  ResCensusTract=  DecHispanicOriginCode=  DecRace=  DecEducationYrs=  DadNameLast= 
DadNameFirst=  MomNameMaidenLast=  MomNameFirst=  FuneralHomeCode=  FuneralHomeCountyAbbrev=  FuneralHome=  FuneralHomeAddInfo= 
RegistrarRecDateMo=  RegistrarRecDay=  RegistrarRecDateCC=  RegistrarRecDateYY=  CertifierType=  CertifierLicenseNum= 
MedExamLicenseNum=  CertifierSignDateMo=  CertifierSignDateDay=  CertifierSignDateCC=  CertifierSignDateYY=  Autopsy= 
AutopsyUsedForUCD=  DeathManner=  WorkInjury=  CODQryCode=  UCD=  UCDGroupCode=  CODEntAxis01=  CODEntAxis02=  CODEntAxis03= 
CODEntAxis04=  CODEntAxis05=  CODEntAxis06=  CODEntAxis07=  CODEntAxis08=  CODEntAxis09=  CODEntAxis10=  CODEntAxis11= 
CODEntAxis12=  CODEntAxis13=  CODEntAxis14=  CODEntAxis15=  CODEntAxis16=  CODEntAxis17=  CODEntAxis18=  CODEntAxis19= 
CODEntAxis20=  CODRecAxis01=  CODRecAxis02=  CODRecAxis03=  CODRecAxis04=  CODRecAxis05=  CODRecAxis06=  CODRecAxis07= 
CODRecAxis08=  CODRecAxis09=  CODRecAxis10=  CODRecAxis11=  CODRecAxis12=  CODRecAxis13=  CODRecAxis14=  CODRecAxis15= 
CODRecAxis16=  CODRecAxis17=  CODRecAxis18=  CODRecAxis19=  CODRecAxis20=  SystemDateMo=  SystemDateDay=  SystemDateYY= 
DecInArmedForces=  _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.D_OLD may be incomplete.  When this step was stopped there were 0 observations and 116 variables.
WARNING: Data set WORK.D_OLD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.28 seconds
      cpu time            0.03 seconds

Here it is when I place the ZIP files ahead of the non-zipped datalines.

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         
27         DATA d_old;
28         	LENGTH inputtype $1.
29         		  FN $500.
30         		  FN2 $250.;
31         	INFILE DATALINES DSD delimiter='|';
32         	INPUT inputtype $ FN $ FN2 $;
33         
34         	SELECT (inputtype);
35         		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
36         		WHEN ('Z') INFILE foo ZIP filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
37         		OTHERWISE;
38         		END;
39         
40         
41         	DO while (not done);
42         		INPUT &datain.;
43         		OUTPUT;
44         		END;
45         
46         	DROP inputtype;
47         
48         DATALINES;

NOTE: The infile library FOO is:
      Directory=N:\XXXXX\2000_2009.zip

NOTE: The infile FOO(2009.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2009.txt

NOTE: The infile FOO(2008.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2008.txt

NOTE: The infile FOO(2007.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2007.txt

NOTE: The infile FOO(2006.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2006.txt

NOTE: The infile FOO(2005.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2005.txt

NOTE: The infile FOO(2004.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2004.txt

NOTE: The infile FOO(2003.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2003.txt

NOTE: The infile FOO(2002.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2002.txt

NOTE: The infile FOO(2001.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2001.txt

NOTE: The infile FOO(2000.txt) is:
      Filename=N:\XXXXX\2000_2009.zip,
      Member Name=2000.txt

ERROR: Invalid record format.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
59         N|XXXX|2011.
      101  txt
inputtype=N FN=XXXX FN2=2011.txt done=1
DeathPlaceStateCode=  CertNumber=  DecNameLast=  DecNameFirst=  DecNameMI=  DecNameGenID=  Soundex=  AliasCode=  Sex=  DeathDateMo= 
DeathDateDay=  DeathDateCC=  DeathDateYY=  DecSSN=  DecAgeUnit=  DecAgeNumber=  DecAgeMonths=  DecAgeDays=  DecAgeHours= 
DecAgeMin=  DecDOBMo=  DecDOBDay=  DecDOBCC=  DecDOBYY=  DecBirthPlaceStateCountryCode=  DecInfCertNumTNBirth= 
DecInfCertNumOOSBirth=  DeathPlaceCode=  DeathPlaceType=  DeathPlaceCityCode=  DeathPlaceCountyCode=  DecMaritalStatus= 
DecResStateCode=  DecRaceSexWNU=  DecRaceSexWBOU=  DecResCity=  DecResAddress=  DecResCityLimitsInd=  DecResZip=  ResStateInd= 
DecResCountyStateCode=  DecResCityCode=  ResCensusTract=  DecHispanicOriginCode=  DecRace=  DecEducationYrs=  DadNameLast= 
DadNameFirst=  MomNameMaidenLast=  MomNameFirst=  FuneralHomeCode=  FuneralHomeCountyAbbrev=  FuneralHome=  FuneralHomeAddInfo= 
RegistrarRecDateMo=  RegistrarRecDay=  RegistrarRecDateCC=  RegistrarRecDateYY=  CertifierType=  CertifierLicenseNum= 
MedExamLicenseNum=  CertifierSignDateMo=  CertifierSignDateDay=  CertifierSignDateCC=  CertifierSignDateYY=  Autopsy= 
AutopsyUsedForUCD=  DeathManner=  WorkInjury=  CODQryCode=  UCD=  UCDGroupCode=  CODEntAxis01=  CODEntAxis02=  CODEntAxis03= 
CODEntAxis04=  CODEntAxis05=  CODEntAxis06=  CODEntAxis07=  CODEntAxis08=  CODEntAxis09=  CODEntAxis10=  CODEntAxis11= 
CODEntAxis12=  CODEntAxis13=  CODEntAxis14=  CODEntAxis15=  CODEntAxis16=  CODEntAxis17=  CODEntAxis18=  CODEntAxis19= 
CODEntAxis20=  CODRecAxis01=  CODRecAxis02=  CODRecAxis03=  CODRecAxis04=  CODRecAxis05=  CODRecAxis06=  CODRecAxis07= 
CODRecAxis08=  CODRecAxis09=  CODRecAxis10=  CODRecAxis11=  CODRecAxis12=  CODRecAxis13=  CODRecAxis14=  CODRecAxis15= 
CODRecAxis16=  CODRecAxis17=  CODRecAxis18=  CODRecAxis19=  CODRecAxis20=  SystemDateMo=  SystemDateDay=  SystemDateYY= 
DecInArmedForces=  _ERROR_=1 _N_=11
NOTE: A total of 609274 records were read from the infile library FOO.
      The minimum record length was 601.
      The maximum record length was 602.
NOTE: 62147 records were read from the infile FOO(2009.txt).
      The minimum record length was 602.
      The maximum record length was 602.
NOTE: 62729 records were read from the infile FOO(2008.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 61067 records were read from the infile FOO(2007.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 60820 records were read from the infile FOO(2006.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 61422 records were read from the infile FOO(2005.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 59796 records were read from the infile FOO(2004.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 61421 records were read from the infile FOO(2003.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 60851 records were read from the infile FOO(2002.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 59459 records were read from the infile FOO(2001.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: 59562 records were read from the infile FOO(2000.txt).
      The minimum record length was 601.
      The maximum record length was 601.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.D_OLD may be incomplete.  When this step was stopped there were 609274 observations and 116 variables.
WARNING: Data set WORK.D_OLD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           8.57 seconds
      cpu time            3.92 seconds


As a side note, &datain. is a macro variable detailing the 116 variables. It has been well tested to conform to the formatting of all years of the data listed here.

BenTN
Fluorite | Level 6

Oops missed one.
Here is the log when I switched the names of the INFILE placeholder file reference

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         
27         DATA d_old;
28         	LENGTH inputtype $1.
29         		  FN $500.
30         		  FN2 $250.;
31         	INFILE DATALINES DSD delimiter='|';
32         	INPUT inputtype $ FN $ FN2 $;
33         
34         	SELECT (inputtype);
35         		WHEN ('N') INFILE foo filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
36         		WHEN ('Z') INFILE bar ZIP filevar=fn memvar=fn2 LRECL=603 end=done TRUNCOVER;
37         		OTHERWISE;
38         		END;
39         
40         
41         	DO while (not done);
42         		INPUT &datain.;
43         		OUTPUT;
44         		END;
45         
46         	DROP inputtype;
47         
48         DATALINES;

ERROR: Invalid record format.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                     
49         N|XXXX|2011.
      101  txt
inputtype=N FN=XXXX FN2=2011.txt done=0
DeathPlaceStateCode=  CertNumber=  DecNameLast=  DecNameFirst=  DecNameMI=  DecNameGenID=  Soundex=  AliasCode=  Sex=  DeathDateMo= 
DeathDateDay=  DeathDateCC=  DeathDateYY=  DecSSN=  DecAgeUnit=  DecAgeNumber=  DecAgeMonths=  DecAgeDays=  DecAgeHours= 
DecAgeMin=  DecDOBMo=  DecDOBDay=  DecDOBCC=  DecDOBYY=  DecBirthPlaceStateCountryCode=  DecInfCertNumTNBirth= 
DecInfCertNumOOSBirth=  DeathPlaceCode=  DeathPlaceType=  DeathPlaceCityCode=  DeathPlaceCountyCode=  DecMaritalStatus= 
DecResStateCode=  DecRaceSexWNU=  DecRaceSexWBOU=  DecResCity=  DecResAddress=  DecResCityLimitsInd=  DecResZip=  ResStateInd= 
DecResCountyStateCode=  DecResCityCode=  ResCensusTract=  DecHispanicOriginCode=  DecRace=  DecEducationYrs=  DadNameLast= 
DadNameFirst=  MomNameMaidenLast=  MomNameFirst=  FuneralHomeCode=  FuneralHomeCountyAbbrev=  FuneralHome=  FuneralHomeAddInfo= 
RegistrarRecDateMo=  RegistrarRecDay=  RegistrarRecDateCC=  RegistrarRecDateYY=  CertifierType=  CertifierLicenseNum= 
MedExamLicenseNum=  CertifierSignDateMo=  CertifierSignDateDay=  CertifierSignDateCC=  CertifierSignDateYY=  Autopsy= 
AutopsyUsedForUCD=  DeathManner=  WorkInjury=  CODQryCode=  UCD=  UCDGroupCode=  CODEntAxis01=  CODEntAxis02=  CODEntAxis03= 
CODEntAxis04=  CODEntAxis05=  CODEntAxis06=  CODEntAxis07=  CODEntAxis08=  CODEntAxis09=  CODEntAxis10=  CODEntAxis11= 
CODEntAxis12=  CODEntAxis13=  CODEntAxis14=  CODEntAxis15=  CODEntAxis16=  CODEntAxis17=  CODEntAxis18=  CODEntAxis19= 
CODEntAxis20=  CODRecAxis01=  CODRecAxis02=  CODRecAxis03=  CODRecAxis04=  CODRecAxis05=  CODRecAxis06=  CODRecAxis07= 
CODRecAxis08=  CODRecAxis09=  CODRecAxis10=  CODRecAxis11=  CODRecAxis12=  CODRecAxis13=  CODRecAxis14=  CODRecAxis15= 
CODRecAxis16=  CODRecAxis17=  CODRecAxis18=  CODRecAxis19=  CODRecAxis20=  SystemDateMo=  SystemDateDay=  SystemDateYY= 
DecInArmedForces=  _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.D_OLD may be incomplete.  When this step was stopped there were 0 observations and 116 variables.
WARNING: Data set WORK.D_OLD was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.14 seconds
      cpu time            0.03 seconds
BenTN
Fluorite | Level 6

Hi Tom,

 

When I run LIST to check for a BOM against all of the relevant files, I'm not visually seeing the value (but given its nature I'm not super surprised).

 

The added code that checks for it seems to work, so presumably that was the issue. Do programs like WinZip typically add this character to files in a ZIP archive?

 

Anyhoo. Thanks.

 

Tom
Super User Tom
Super User

You wouldn't normally see the BOM as the software takes care of noticing it.  You would need some tool that lets you see that detail.  Perhaps Notepad++?

 

It is the software that created the original file that decides whether to prefix the BOM characters. The ZIP tools normally just copy the file from the disk.  So if it contains a BOM then that is copied since it is just part of the file.

 

SAS normally detects the BOM when reading a text file and reacts appropriately.  But for some reason when using the ZIP fileref engine that detection process is by-passed.
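
Since the BOM bytes may not stand out in LIST output, one way to look for them without an external tool is to read the first three bytes of a member as raw characters and print them in hex. This is only a sketch: it reuses the elided path and member name from earlier in the thread, and the variable name first3 is made up for illustration. A value of EFBBBF in the log would indicate a UTF-8 BOM.

```sas
/* Sketch: show the first three bytes of one ZIP member in hex.   */
/* The path is elided as in the thread; first3 is a made-up name. */
data _null_;
  infile "N:\...\dataarchive\2000_2009.zip" zip member="2009.txt" obs=1;
  input first3 $char3.;   /* $CHAR3. keeps the raw bytes, including blanks */
  put first3= $hex6.;     /* three bytes print as six hex digits           */
run;
```

The same step can be rerun with a different MEMBER= value for each file in the archive.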
