DATA Step, Macro, Functions and more

While importing text it read only few rows

Reply
Contributor
Posts: 35

While importing text it read only few rows

There is one text file in zip of about 16GB of size. tEXT FILE HAVE ABOUT 65 Millions of recoreds but it only import 6 Millions with below code and with proc Import statement alone. can suggestions please?

 

filename inzip ZIP "/product/users/chtrrv01/UNIVERSE1.TXT.zip";

/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
length memname $200 isFolder 8;
fid=dopen("inzip");
if fid=0 then
stop;
memcount=dnum(fid);
do i=1 to memcount;
memname=dread(fid,i);
/* check for trailing / in folder name */
isFolder = (first(reverse(trim(memname)))='/');
output;
end;
rc=dclose(fid);
run;

/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

 

/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/KDKNW.SJ32CUIE.UNIVERSE1.TXT" ;

/* hat tip: "data _null_" on SAS-L */
data _null_;
/* using member syntax here */
infile inzip(KDKNW.SJ32CUIE.UNIVERSE1.TXT)
dlm='|' lrecl=32767 recfm=F length=length eof=eof unbuf;
file xl lrecl=32767 recfm=N;
input;
put _infile_ $varying32767. length;
return;
eof:
stop;
run;

proc import datafile=xl dbms=dlm out=account replace;
delimiter='|';

/*sheet=confirmed*/;
run;

Trusted Advisor
Posts: 1,584

Re: While importing text it read only few rows

[ Edited ]
Posted in reply to Yarlanil59

Please post the full log. There may be a hint.

Please add - what SAs platform do you use?

Super User
Super User
Posts: 7,076

Re: While importing text it read only few rows

Posted in reply to Yarlanil59

Do you really have a text file with fixed length records of 32,767 columns?

Contributor
Posts: 35

Re: While importing text it read only few rows

Thanks for your respond. Here is log

 

1 The SAS System 16:08 Tuesday, May 9, 2017

1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5 %LET _CLIENTPROJECTPATH='';
6 %LET _CLIENTPROJECTPATHHOST='';
7 %LET _CLIENTPROJECTNAME='';
8 %LET _SASPROGRAMFILE='';
9 %LET _SASPROGRAMFILEHOST='';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=ACTIVEX;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 FILENAME EGSR TEMP;
15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16 STYLE=HtmlBlue
17 STYLESHEET=(URL="file:///D:/SAS/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
18 NOGTITLE
19 NOGFOOTNOTE
20 GPATH=&sasworklocation
21 ENCODING=UTF8
22 options(rolap="on")
23 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26 filename inzip ZIP "/product/users/chtrrv01/UNIVERSE1.TXT.zip";
27
28 /* Read the "members" (files) from the ZIP file */
29 data contents(keep=memname isFolder);
30 length memname $200 isFolder 8;
31 fid=dopen("inzip");
32 if fid=0 then
33 stop;
34 memcount=dnum(fid);
35 do i=1 to memcount;
36 memname=dread(fid,i);
37 /* check for trailing / in folder name */
38 isFolder = (first(reverse(trim(memname)))='/');
39 output;
40 end;
41 rc=dclose(fid);
42 run;

NOTE: The data set WORK.CONTENTS has 1 observations and 2 variables.
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:08:24,637-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 1| _DISARM| 20086784| _DISARM| 12| _DISARM| 12| _DISARM| 752| _DISARM| 47848| _DISARM| 0.000000| _DISARM|
0.009089| _DISARM| 1809979704.628560| _DISARM| 1809979704.637649| _DISARM| 0.000000| _DISARM| | _ENDDISARM
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:08:24,638-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 20086784| _DISARM| 20086784| _DISARM| 12| _DISARM| 12| _DISARM| 1648| _DISARM| 47848| _DISARM| 0.000000| _DISARM|
0.037251| _DISARM| 1809979704.600998| _DISARM| 1809979704.638249| _DISARM| 0.000000| _DISARM| | _ENDDISARM
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

43
44 /* create a report of the ZIP contents */
2 The SAS System 16:08 Tuesday, May 9, 2017

45 title "Files in the ZIP file";
46 proc print data=contents noobs N;
47 run;

NOTE: There were 1 observations read from the data set WORK.CONTENTS.
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:08:24,714-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 1| _DISARM| 20086784| _DISARM| 12| _DISARM| 12| _DISARM| 1640| _DISARM| 50360| _DISARM| 0.010000|
_DISARM| 0.041559| _DISARM| 1809979704.672629| _DISARM| 1809979704.714188| _DISARM| 0.010000| _DISARM| | _ENDDISARM
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:08:24,715-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 20086784| _DISARM| 20086784| _DISARM| 12| _DISARM| 12| _DISARM| 2128| _DISARM| 50368| _DISARM| 0.010000| _DISARM|
0.061399| _DISARM| 1809979704.653799| _DISARM| 1809979704.715198| _DISARM| 0.010000| _DISARM| | _ENDDISARM
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.06 seconds
cpu time 0.01 seconds

48
49
50
51 /* identify a temp folder in the WORK directory */
52 filename xl "%sysfunc(getoption(work))/CHARTERC.SJ32CUIE.UNIVERSE1.TXT" ;
53
54 /* hat tip: "data _null_" on SAS-L */
55 data _null_;
56 /* using member syntax here */
57 infile inzip(CHARTERC.SJ32CUIE.UNIVERSE1.TXT)
58 dlm='|' lrecl=256 recfm=F length=length eof=eof unbuf;
59 file xl lrecl=256 recfm=N;
60 input;
61 put _infile_ $varying256. length;
62 return;
63 eof:
64 stop;
65 run;

NOTE: The infile library INZIP is:
Directory=/product/users/chtrrv01/UNIVERSE1.TXT.zip

NOTE: The infile INZIP(CHARTERC.SJ32CUIE.UNIVERSE1.TXT) is:
Filename=/product/users/chtrrv01/UNIVERSE1.TXT.zip,
Member Name=CHARTERC.SJ32CUIE.UNIVERSE1.TXT

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
Filename=/product/saswork/SAS_workF0A00000B57E_saspapp07/SAS_work616B0000B57E_saspapp07/CHARTERC.SJ32CUIE.UNIVERSE1.TXT,
Owner Name=v656505,Group Name=twcother,
Access Permission=-rw-r--r--,
Last Modified=09May2017:16:08:24

NOTE: A total of 65449514 records were read from the infile library INZIP.
NOTE: 65449514 records were read from the infile INZIP(CHARTERC.SJ32CUIE.UNIVERSE1.TXT).
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:11:31,998-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 20348928| _DISARM| 20348928| _DISARM| 12| _DISARM| 12| _DISARM| 40610576| _DISARM| 40660944| _DISARM| 166.120000|
_DISARM| 187.279549| _DISARM| 1809979704.718782| _DISARM| 1809979891.998331| _DISARM| 136.740000| _DISARM| | _ENDDISARM
NOTE: DATA statement used (Total process time):
real time 3:07.28
cpu time 2:46.12

3 The SAS System 16:08 Tuesday, May 9, 2017


66
67 proc import datafile=xl dbms=dlm out=account replace;
68 delimiter='|';
69
70 /*sheet=confirmed*/;
71 run;



NOTE: 5999999 records were read from the infile XL.
The minimum record length was 975.
The maximum record length was 3422.
NOTE: The data set WORK.ACCOUNT has 5999999 observations and 456 variables.
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,143-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 5999999| _DISARM| 43982848| _DISARM| 12| _DISARM| 12| _DISARM| 73146200| _DISARM| 113830816| _DISARM|
371.880000| _DISARM| 441.493721| _DISARM| 1809979898.649956| _DISARM| 1809980340.143677| _DISARM| 324.880000| _DISARM| |
_ENDDISARM
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,144-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 44249088| _DISARM| 43982848| _DISARM| 12| _DISARM| 12| _DISARM| 73146712| _DISARM| 113830824| _DISARM| 372.050000|
_DISARM| 441.679083| _DISARM| 1809979898.465673| _DISARM| 1809980340.144756| _DISARM| 325.000000| _DISARM| | _ENDDISARM
NOTE: DATA statement used (Total process time):
real time 7:21.67
cpu time 6:12.06

Errors detected in submitted DATA step. Examine log.
5999999 rows created in WORK.ACCOUNT from XL.



NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,249-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 426| _DISARM| 40841216| _DISARM| 12| _DISARM| 12| _DISARM| 73157576| _DISARM| 113837576| _DISARM|
68 The SAS System 16:08 Tuesday, May 9, 2017

377.060000| _DISARM| 447.926786| _DISARM| 1809979892.322683| _DISARM| 1809980340.249469| _DISARM| 329.910000| _DISARM| |
_ENDDISARM
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,251-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 1470| _DISARM| 40841216| _DISARM| 12| _DISARM| 12| _DISARM| 73157568| _DISARM| 113837576| _DISARM|
377.060000| _DISARM| 447.927057| _DISARM| 1809979892.324151| _DISARM| 1809980340.251208| _DISARM| 329.910000| _DISARM| |
_ENDDISARM
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,300-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 426| _DISARM| 40841216| _DISARM| 12| _DISARM| 12| _DISARM| 73168824| _DISARM| 113840536| _DISARM|
377.100000| _DISARM| 448.107873| _DISARM| 1809979892.192304| _DISARM| 1809980340.300177| _DISARM| 329.930000| _DISARM| |
_ENDDISARM
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,301-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 1470| _DISARM| 40841216| _DISARM| 12| _DISARM| 12| _DISARM| 73167848| _DISARM| 113840536| _DISARM|
377.100000| _DISARM| 448.096283| _DISARM| 1809979892.205269| _DISARM| 1809980340.301552| _DISARM| 329.930000| _DISARM| |
_ENDDISARM
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,302-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 794| _DISARM| 40841216| _DISARM| 12| _DISARM| 12| _DISARM| 73159856| _DISARM| 113840536| _DISARM|
377.070000| _DISARM| 447.960349| _DISARM| 1809979892.342470| _DISARM| 1809980340.302819| _DISARM| 329.920000| _DISARM| |
_ENDDISARM
NOTE: MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,314-04:00| _DISARM| WorkspaceServer| _DISARM| SAS|
_DISARM| | _DISARM| 5999999| _DISARM| 40308736| _DISARM| 12| _DISARM| 12| _DISARM| 5456| _DISARM| 113841568| _DISARM|
0.030000| _DISARM| 0.087946| _DISARM| 1809980340.226293| _DISARM| 1809980340.314239| _DISARM| 0.020000| _DISARM| | _ENDDISARM
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:19:00,332-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 44769280| _DISARM| 34508800| _DISARM| 12| _DISARM| 12| _DISARM| 73181760| _DISARM| 113842896| _DISARM| 377.150000|
_DISARM| 448.327131| _DISARM| 1809979892.005814| _DISARM| 1809980340.332945| _DISARM| 329.960000| _DISARM| | _ENDDISARM
NOTE: PROCEDURE IMPORT used (Total process time):
real time 7:28.32
cpu time 6:17.16

1488
1489 GOPTIONS NOACCESSIBLE;
1490 %LET _CLIENTTASKLABEL=;
1491 %LET _CLIENTPROCESSFLOWNAME=;
1492 %LET _CLIENTPROJECTPATH=;
1493 %LET _CLIENTPROJECTPATHHOST=;
1494 %LET _CLIENTPROJECTNAME=;
1495 %LET _SASPROGRAMFILE=;
1496 %LET _SASPROGRAMFILEHOST=;
1497
1498 ;*';*";*/;quit;run;
1499 ODS _ALL_ CLOSE;
1500
1501
1502 QUIT; RUN;
1503

Trusted Advisor
Posts: 1,584

Re: While importing text it read only few rows

Posted in reply to Yarlanil59

Taken from your log:

55 data _null_;
56 /* using member syntax here */
57 infile inzip(CHARTERC.SJ32CUIE.UNIVERSE1.TXT)
58 dlm='|' lrecl=256 recfm=F length=length eof=eof unbuf;
59 file xl lrecl=256 recfm=N;
60 input;
61 put _infile_ $varying256. length;
62 return;
63 eof:
64 stop;
65 run;
NOTE: The infile library INZIP is:
Directory=/product/users/chtrrv01/UNIVERSE1.TXT.zip
NOTE: The infile INZIP(CHARTERC.SJ32CUIE.UNIVERSE1.TXT) is:
Filename=/product/users/chtrrv01/UNIVERSE1.TXT.zip,
Member Name=CHARTERC.SJ32CUIE.UNIVERSE1.TXT
NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
Filename=/product/saswork/SAS_workF0A00000B57E_saspapp07/SAS_work616B0000B57E_saspapp07/CHARTERC.SJ32CUIE.UNIVERSE1.TXT,
Owner Name=v656505,Group Name=twcother,
Access Permission=-rw-r--r--,
Last Modified=09May2017:16:08:24
NOTE: A total of 65449514 records were read from the infile library INZIP.
NOTE: 65449514 records were read from the infile INZIP(CHARTERC.SJ32CUIE.UNIVERSE1.TXT).

You have read :  65,449,514 recodrs from the zip file to create the xl file.

 

Later you have:

67 proc import datafile=xl dbms=dlm out=account replace;
68 delimiter='|';
69
70 /*sheet=confirmed*/;
71 run;


NOTE: 5999999 records were read from the infile XL.
The minimum record length was 975.
The maximum record length was 3422.
NOTE: The data set WORK.ACCOUNT has 5999999 observations and 456 variables.

 

ending with:

Errors detected in submitted DATA step. Examine log.
5999999 rows created in WORK.ACCOUNT from XL.

...

ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.

 

From your log:

NOTE: PROCEDURE| _DISARM| STOP| _DISARM| 2017-05-09T16:11:31,998-04:00| _DISARM| WorkspaceServer| _DISARM| SAS| _DISARM| |
_DISARM| 20348928| _DISARM| 20348928| _DISARM| 12| _DISARM| 12| _DISARM| 40610576| _DISARM| 40660944| _DISARM| 166.120000|
_DISARM| 187.279549| _DISARM| 1809979704.718782| _DISARM| 1809979891.998331| _DISARM| 136.740000| _DISARM| | _ENDDISARM

 

Are there 2 consequtive delimiters ? Should you definne option dsd reading the file ?

Try change the line by adding this option:

 infile inzip(CHARTERC.SJ32CUIE.UNIVERSE1.TXT)
     dlm='|'  dsd lrecl=256 recfm=F length=length eof=eof unbuf;
     file xl lrecl=256 recfm=N;

 

Contributor
Posts: 35

Re: While importing text it read only few rows

Thanks for the reply.

 

I tried with 

infile inzip(CHARTERC.SJ32CUIE.UNIVERSE1.TXT) dlm='|' dsd lrecl=256 recfm=F length=lengtheof=eof unbuf; file xl lrecl=256 recfm=N;

 

but still same error.

 

Errors detected in submitted DATA step. Examine log.
5999999 rows created in WORK.ACCOUNT from XL.

 

ERROR: Import unsuccessful.  See SAS Log for details.

Super User
Posts: 10,044

Re: While importing text it read only few rows

Posted in reply to Yarlanil59

Try another option.

 

infile inzip(KDKNW.SJ32CUIE.UNIVERSE1.TXT) ignoredoseof;
Contributor
Posts: 35

Re: While importing text it read only few rows

Thank you @Ksharp for suggestion 

I tried with 

infile inzip(KDKNW.SJ32CUIE.UNIVERSE1.TXT) ignoredoseof;

But same error as before. 

Super User
Super User
Posts: 7,076

Re: While importing text it read only few rows

Posted in reply to Yarlanil59

Can you provide more information?

What operating system is SAS running on?  

Can you use PIPE to run an external ZIP processing program so that you can avoid having to the use ZIP engine from SAS? 

 

It looks like you are trying to do a binary copy from the file in the ZIP file to a text file and so you had RECFM=F, but then you also added DLM='|' which should not apply to a binary file.  Why not just read the file as a text file to begin with?

filename xl "%sysfunc(getoption(work))/KDKNW.SJ32CUIE.UNIVERSE1.TXT" ;

data _null_;
  infile inzip(KDKNW.SJ32CUIE.UNIVERSE1.TXT) lrecl=32767 ;
  file xl lrecl=32767;
  input;
  put _infile_;
run;

Why are you using PROC IMPORT to read a delimited file?  You can write a better program if you know what is in the file. PROC IMPORT can only guess at how to read the file.

 

If your file really is a ZIP file with just one text file in it then what I normally do when running on Unix is to let the unzip command just pull the file to the standard output and read it from there.

data want ;
  infile 'unzip -p myfile.zip' pipe firstobs=2 dsd dlm='|' truncover ;
  length var1-var20 $200 ;
  input var1-var20;
run;
Ask a Question
Discussion stats
  • 8 replies
  • 172 views
  • 0 likes
  • 4 in conversation