Hello,
This is my situation:
My code seems to be unzipping the files correctly and the variables are correctly formatted, but every single value is missing. Any ideas?
Here is my code:
filename ZIPFILE pipe '"C:\7z.exe" e
[file path]\20130226_BKV_imps.zip
20130226_BKV_imps.txt
-y -so';
Data work.temp;
format
UniqueID1 BEST9. UniqueID2 BEST10. Tentative BEST1. IPAddress $CHAR13. Time DATETime18.
PlacementID BEST9. AdID BEST9. Click BEST1. AdvertiserID BEST6. AudienceID BEST10.;
infile ZIPFILE dsd dlm=' ' lrecl=300 truncover missover;
input
UniqueID1 : ?? BEST9. UniqueID2 : ?? BEST10. Tentative : ?? BEST1. IPAddress : $CHAR13.
Time : ?? ANYDTDTM23. PlacementID : ?? BEST9. AdID : ?? BEST9. Click : ?? BEST1.
AdvertiserID : ?? BEST6. AudienceID : ?? BEST10. ;
run;
Here is the Log:
NOTE: The infile ZIPFILE is:
Unnamed Pipe Access Device,
PROCESS="C:\7z.exe" e
[file path]\20121225_BKV_imps.zip
20121225_BKV_imps.txt -y -so,
RECFM=V,LRECL=300
Stderr output:
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Processing archive:
[file path]\20121225_BKV_imps.zip
Extracting 20121225_BKV_imps.txt
Everything is Ok
Size: 245463226
Compressed: 30965060
NOTE: 1323286 records were read from the infile ZIPFILE.
The minimum record length was 2.
The maximum record length was 203.
One or more lines were truncated.
NOTE: The data set WORK.TEMP has 1323286 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 4.64 seconds
cpu time 3.24 seconds
Here is the output:
I am using Enterprise Guide 5.1 with SAS 9.3
Hey Tom,
Thanks very much for your insight on this. I ended up going with X-command to complete this task.
x "cd C:\Program Files\7-Zip\";
x '7z.exe e -o[file path]\TEMP\OUTPUT [file path]\TEMP\20130226_BKV_imps.zip';
Data work.temp;
length UniqueID1 UniqueID2 Tentative 8 IPAddress $13. Time
PlacementID AdID Click AdvertiserID AudienceID 8;
format IPAddress $CHAR13. Time DATETime18.;
informat IPAddress $CHAR13. Time ANYDTDTM23.;
infile '[file path]\TEMP\OUTPUT\20130226_BKV_imps.txt'
dsd dlm='09'x lrecl=300 truncover;
input UniqueID1 -- AudienceID;
run;
x "cd [file path]\TEMP\OUTPUT\";
x "del 20130226_BKV_imps.txt";
A couple of things to try.
1) Specify the delimiter using hexadecimal literal syntax to prevent tab character in your code from being converted to spaces.
DLM='09'x
2) Remove the MISSOVER option as it is overriding the TRUNCOVER option that you want to use because you have listed it last in the INFILE statement.
3) Convert your FORMAT statement to a length statement. Most of the formats listed are of no value and it will be clearer to you and SAS what type/length the variables should be it you state it explicitly using LENGTH (or ATTRIB) statement instead of implying if by the format you attach to the variable.
length UniqueID1 UniqueID2 Tentative 8 IPAddress $13
Time PlacementID AdID Click AdvertiserID AudienceID 8
;
format IPAddress $CHAR13. Time DATETime18. ;
4) Use an INFORMAT statement to tell SAS how to read the data that needs special processing. Simplify the INPUT statement that just list the variables. You could even use variable list since you will have explicitly defined the variable order with the LENGTH statement above.
informat IPAddress $CHAR13. Time DATETime18. ;
input UniqueID1 -- AudienceID ;
Also since you will have removed the ?? input operator you will see in the SAS LOG any values that cause conversion failures that are currently generating the missing values.
Thanks for responding Tom!
I made the edits you suggested; however, I am still getting a dataset full of missing values.
Code:
filename ZIPFILE pipe '"C:\7z.exe" e
[file path]\20121225_BKV_imps.zip
20121225_BKV_imps.txt
-y -so';
Data work.temp;
length UniqueID1 UniqueID2 Tentative 8 IPAddress $13. Time
PlacementID AdID Click AdvertiserID AudienceID 8;
format IPAddress $CHAR13. Time DATETime18.;
informat IPAddress $CHAR13. Time DATETime18.;
infile ZIPFILE dsd dlm='09'x lrecl=300 truncover;
input UniqueID1 -- AudienceID;
run;
Log:
NOTE: The infile ZIPFILE is:
Unnamed Pipe Access Device,
PROCESS="C:\7z.exe" e
[file path]\20121225_BKV_imps.zip
20121225_BKV_imps.txt -y -so,
RECFM=V,LRECL=300
NOTE: Invalid data for UniqueID1 in line 1 1-20.
NOTE: Invalid data for UniqueID2 in line 1 22-42.
NOTE: Invalid data for Tentative in line 1 44-46.
NOTE: Invalid data for Time in line 1 74-120.
NOTE: Invalid data for PlacementID in line 1 122-140.
NOTE: Invalid data for AdID in line 1 142-160.
NOTE: Invalid data for Click in line 1 162-164.
NOTE: Invalid data for AdvertiserID in line 1 166-178.
NOTE: Invalid data for AudienceID in line 1 180-184.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
1 CHAR ÿþ8.1.6.3.6.1.1.7.2...2.1.3.6.9.8.9.9.4.2...0...[.......IP.......]...2.0.1.2.-.1.2.
ZONE FF30303030303030303000303030303030303030300030003030203030302030303020300030303030203030
NUMR FE80106030601010702090201030609080909040209000901020E0200050E0204020E0009020001020D01020
89 -.2.5. .0.4.:.1.8.:.0.3...0.0.0...3.9.7.9.4.1.2.5.1...3.0.1.3.5.3.1.5.0...0...1.6.5.6.5.
ZONE 2030302030303030303030302030303000303030303030303030003030303030303030300030003030303030
NUMR D02050000040A01080A00030E000000090309070904010205010903000103050301050009000901060506050
177 6...0... 184
ZONE 30003000
NUMR 609000D0
UniqueID1=. UniqueID2=. Tentative=. IPAddress= 1 2 . 2 0 5 Time=. PlacementID=. AdID=. Click=.
AdvertiserID=. AudienceID=. _ERROR_=1 _N_=1
[...]
7-Zip [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
Processing archive:
[file path]\20121225_BKV_imps.zip
Extracting 20121225_BKV_imps.txt
Everything is Ok
Size: 245463226
Compressed: 30965060
NOTE: 1323286 records were read from the infile ZIPFILE.
The minimum record length was 2.
The maximum record length was 203.
One or more lines were truncated.
NOTE: The data set WORK.TEMP has 1323286 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 5.23 seconds
cpu time 3.91 seconds
Output:
Looks like your data is encoded as UTF-16 or some other 2 byte coding sequence. Probably if you had expanded into a file and then read from the file SAS or the operating system could have auto detected the file encoding and it would have adjusted that encoding as i read it into the input buffer.
The first two bytes 'FFFE'X is the BOM (Byte order mark - Wikipedia, the free encyclopedia) that indicates the encoding. This indicates that it is UTF-16 (each character is 2 bytes) with the first byte being the least significant.
Knowing this you can now see that the 9 digit UniqueID1 value is stored in 18 bytes with every other byte being binary zero ('00'x).
I do not know anything about 7z.exe, but perhaps the -scc or -scs option can change this and make it output normal characters?
You could also try adding ENCODING option to your INFILE or FILENAME statement.
I get an error message when I try to use the ENCODING option in either INFILE or FILENAME (ERROR 23-2: Invalid option name ENCODING.) and the dataset has 0 observations when I try out the -scc and -scs options.
I had been using the example found on the bottom of page 4 of this paper (http://www2.sas.com/proceedings/sugi31/155-31.pdf) to model my code; but it may be that 7zip is not a good option for this.
Tom, if you were going to import a zipped txt file using SAS, how would you do it?
I use WinZIP on PC and unzip on Unix.
Did you try ENCODING=u16l ?
I did try ENCODING="u16l", but I got errors in both the FILENAME and INFILE statements.
Try 1:
16 filename ZIPFILE pipe '"C:\7z.exe" e
17 [file path]\20121225_BKV_imps.zip
18 20121225_BKV_imps.txt
19 -y -so' encoding="u16l";
________
23
ERROR: Error in the FILENAME statement.
ERROR 23-2: Invalid option name encoding.
Try 2:
21 Data work.temp;
22 length UniqueID1 UniqueID2 Tentative 8 IPAddress $13. Time
23 PlacementID AdID Click AdvertiserID AudienceID 8;
24 format IPAddress $CHAR13. Time DATETime18.;
25 informat IPAddress $CHAR13. Time ANYDTDTM23.;
26 infile ZIPFILE dsd dlm='09'x lrecl=300 truncover encoding="u16l";
________
23
ERROR 23-2: Invalid option name ENCODING.
27 input UniqueID1 -- AudienceID;
28 run;
What version of SAS are you running and what operating system? Encoding option works for me using SAS 9.2 on Unix. Try running the little program below to generate a file with UTF-16LE encoding and try to read it from a pipe. (If you are using Windows then change the command used to dump the file to standard output from cat to appropriate Windows command.) When I run it without the ENCODING option on the INFILE statement I get similar content as you had. When I add ENCODING=U16L most of the data is now readable, but for some reason there is one extra character ('1A'x) in the beginning of the first line and a warning message in the log about a character not be transcoded.
WARNING: A character that could not be transcoded was encountered.
line="Alfred M 14 69 112.5"
RULE: ----+----1----+----2----+----3----+----4
1 CHAR .Alfred M 14 69 112.5 21
ZONE 146676624233233233323
NUMR A1C62540D0140690112E5
line="Alice F 13 56.5 84"
2 Alice F 13 56.5 84 18
%let tmpfile=%sysfunc(pathname(work))/temp.txt;
options generic;
filename xx "&tmpfile";
filename yy pipe "cat &tmpfile";
* Create UTF-16LE file ;
data _null_;
file xx encoding= U16L ;
set sashelp.class (obs=2);
put (_all_) (:) ;
run;
* Read from std input ;
data _null_;
infile yy truncover ;
input line $char80.;
list;
put line= $quote.;
run;
* Add ENCODING option ;
data _null_;
infile yy encoding=U16L truncover;
input line $char80.;
list;
put line= $quote.;
run;
* Skip extra strange character on first line ;
data _null_;
infile yy encoding=U16L truncover ;
if _n_=1 then input xxx $char1. @;
input line $char80.;
list;
put line= $quote.;
run;
filename xx clear;
filename yy clear;
options nogeneric;
So I played around with Windows SAS and apparently it only likes the ENCODING= option on an INFILE statement when referencing an actual physical file, not when using FILENAME or PIPE engine.
You might want to implement a two step process.
1) Extract the file.
filename zipfile temp;
data _null_;
infile %sysfunc(quote(
"C:\7z.exe" e "[file path]\20130226_BKV_imps.zip" 20130226_BKV_imps.txt -y -so > "%sysfunc(pathname(zipfile))"
)) pipe ;
input;
put _infile_;
run;
2) Read it.
....
infile zipfile dlm='09'x dsd truncover ;
....
Hey Tom,
Thanks very much for your insight on this. I ended up going with X-command to complete this task.
x "cd C:\Program Files\7-Zip\";
x '7z.exe e -o[file path]\TEMP\OUTPUT [file path]\TEMP\20130226_BKV_imps.zip';
Data work.temp;
length UniqueID1 UniqueID2 Tentative 8 IPAddress $13. Time
PlacementID AdID Click AdvertiserID AudienceID 8;
format IPAddress $CHAR13. Time DATETime18.;
informat IPAddress $CHAR13. Time ANYDTDTM23.;
infile '[file path]\TEMP\OUTPUT\20130226_BKV_imps.txt'
dsd dlm='09'x lrecl=300 truncover;
input UniqueID1 -- AudienceID;
run;
x "cd [file path]\TEMP\OUTPUT\";
x "del 20130226_BKV_imps.txt";
HI TOM,
I want to download a file from web which is in zip format then import to sas data set. I used your suggestion but my program not produce correct result.
Please help me to resolve the same. My program and log is below mention...
Program...........
FILENAME NIFDATA URL "%STR(http://nseindia.com/content/historical/DERIVATIVES/2014/JAN/fo21JAN2014bhav.csv.zip)" DEBUG lrecl=8192;
data _null_ ;
unzipcmd='"C:\Program Files\7-Zip\7z.exe" -min -e -o ' ;
zipfile= "NIFDATA";
whereto='C:\RT' ;
cmd=unzipcmd || ' ' || zipfile || ' ' || whereto ;
putlog "NOTE-Processing command " cmd ;
call system( cmd ) ;
run ;
PROC IMPORT DATAFILE='C:\RT\fo24JAN2014bhav.csv' OUT=NIFTY DBMS=CSV REPLACE ;
RUN;
LOG......
2083
2084 FILENAME NIFDATA URL "%STR(http://nseindia.com/content/historical/DERIVATIVES/2014/JAN/fo21JAN2014bhav.csv.zip)" DEBUG lrecl=8192;
2085
2086 data _null_ ;
2087 unzipcmd='"C:\Program Files\7-Zip\7z.exe" -min -e -o ' ;
2088 zipfile= "NIFDATA";
2089 whereto='C:\RT' ;
2090 cmd=unzipcmd || ' ' || zipfile || ' ' || whereto ;
2091 putlog "NOTE-Processing command " cmd ;
2092 call system( cmd ) ;
2093 run ;
Processing command "C:\Program Files\7-Zip\7z.exe" -min -e -o NIFDATA C:\RT
NOTE: DATA statement used (Total process time):
real time 0.32 seconds
cpu time 0.07 seconds
2094
2095
2096 PROC IMPORT DATAFILE='c:\rt\fo24JAN2014bhav.csv' OUT=NIFTY11 DBMS=csv REPLACE ;
2097
2098 RUN;
ERROR: Physical file does not exist, c:\rt\fo24JAN2014bhav.csv.
NOTE: Import Cancelled.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.15 seconds
cpu time 0.15 seconds
Thanks..
Ravinder Kumar
7z.exe is not going to be able to see a fileref that you have defined in SAS.
You will need to add a step to copy the file from the URL to a physical file on your computer that 7z.exe can then read.
Hi Tom,
I'm also trying to download a zip file from the web and then read it into EG. The download works but the file size is slightly smaller than what is listed on the website. There are 2 text files in the download.
This is the code I'm using to download the file.
options mlogic sslcalistloc="/ma/scott/fhlmc/fhlmc.cert"; /* location of certificate file */
filename fhlmc url "https://freddiemac.embs.com/FLoan/Data/historical_data1_Q11999.zip" debug;
data _null_;
infile fhlmc recfm=n;
input;
file "/ma/scott/fhlmc/data/historical_data1_Q11999.zip";
put _infile_;
run;
I get an error if I manually try to unzip the file and when I try to read it directly I get this error.
SYMBOLGEN: Macro variable LIB resolves to /ma/scott/fhlmc/data/
SYMBOLGEN: Macro variable AS_OF_QTR_YEAR resolves to Q11999
MPRINT(ORIG): FileName zip Saszipam "/ma/scott/fhlmc/data/historical_data1_Q11999.zip";
MPRINT(ORIG): data origfile;
SYMBOLGEN: Macro variable AS_OF_QTR_YEAR resolves to Q11999
MPRINT(ORIG): infile zip(historical_data1_Q11999.txt) dsd dlm='|' missover firstobs=2 lrecl=32767;
MPRINT(ORIG): input fico : 8. dt_first_pi : 8. flag_fthb : $1. dt_matr : 8. cd_msa : 8. mi_pct : 8. cnt_units : 8. occpy_sts :
$1. cltv : 8. dti : 8. orig_upb : 8. ltv : 8. int_rt : 8. channel : $1. ppmt_pnlty : $1. prod_type : $5. st : $2. prop_type : $2.
zipcode : 8. id_loan : $16. loan_purpose : $5. orig_loan_term : 8. cnt_borr : $2. seller_name : $30. servicer_name : $30. ;
MPRINT(ORIG): run;
2 The SAS System 09:07 Monday, June 16, 2014
ERROR: The central directory is invalid in /ma/scott/fhlmc/data/historical_data1_Q11999.zip.
WARNING: End of file.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.ORIGFILE may be incomplete. When this step was stopped there were 0 observations and 25 variables.
WARNING: Data set WORK.ORIGFILE was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.