I'm using SAS EG and trying to import a pipe-separated file using the INFILE statement as follows:
INFILE "&FILE_DIRECTORY."
LRECL=32767
ENCODING="WLATIN1"
TERMSTR=LF
DLM='|'
MISSOVER
FIRSTOBS=2
DSD ;
However, the original file has 3M rows but only 1M are imported. I have tried adding the IGNOREDOSEOF option, but it isn't recognized by SAS and doesn't solve the problem. It is also worth mentioning that there is nothing special about the row where the table "breaks".
Any suggestions?
I include the LOG below as suggested by @ballardw
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROCESSFLOWNAME='Program';
5 %LET _CLIENTPROJECTPATH='XXXX';
6 %LET _CLIENTPROJECTPATHHOST='XXXXX';
7 %LET _CLIENTPROJECTNAME='SAS.egp';
8 %LET _SASPROGRAMFILE='';
9 %LET _SASPROGRAMFILEHOST='';
10
11 ODS _ALL_ CLOSE;
12 OPTIONS DEV=SVG;
13 GOPTIONS XPIXELS=0 YPIXELS=0;
14 %macro HTML5AccessibleGraphSupported;
15 %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16 %mend;
17 FILENAME EGHTML TEMP;
18 ODS HTML5(ID=EGHTML) FILE=EGHTML
19 OPTIONS(BITMAP_MODE='INLINE')
20 %HTML5AccessibleGraphSupported
21 ENCODING='utf-8'
22 STYLE=HTMLBlue
23 NOGTITLE
24 NOGFOOTNOTE
25 GPATH=&sasworklocation
26 ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28
29
30 %macro load_input_files_v2 ();
31
32 DATA WORK.table;
33 LENGTH
34 XXXX 8
35 XXXX $ 3
36 XXXX 8;
42 FORMAT
43 XXXX YYMMDD10.
44 XXXX $CHAR3.
45 XXXX BEST5.;
51 INFORMAT
52 XXXX YYMMDD10.
53 XXXX $CHAR3.
54 XXXX BEST5. ;
60 INFILE "&FILE_DIRECTORY."
61 LRECL=32767
62 ENCODING="WLATIN1"
63 TERMSTR=LF
64 DLM='|'
65 MISSOVER
66 FIRSTOBS=2
67 DSD ;
68 INPUT
69 XXXX : ?? YYMMDD8.
70 XXXX : $CHAR3.
71 XXXX : ?? BEST5.
72 XXXX : ?? BEST21.;
77 RUN;
83
84 %mend;
85
86 %load_input_files_v2();
NOTE: The infile "&FILE_DIRECTORY." is:
Filename=YYYY.TXT,
Owner Name=sastoken,Group Name=sasdata,
Access Permission=-rw-rw-r--,
Last Modified=16Aug2023:17:18:13,
File Size (bytes)=66746379
NOTE: 1276289 records were read from the infile "&FILE_DIRECTORY."
The minimum record length was 15.
The maximum record length was 60.
NOTE: The data set WORK.TABLE has 1276289 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 1.08 seconds
cpu time 1.08 seconds
87
88 %LET _CLIENTTASKLABEL=;
89 %LET _CLIENTPROCESSFLOWNAME=;
90 %LET _CLIENTPROJECTPATH=;
91 %LET _CLIENTPROJECTPATHHOST=;
92 %LET _CLIENTPROJECTNAME=;
93 %LET _SASPROGRAMFILE=;
94 %LET _SASPROGRAMFILEHOST=;
95
96 ;*';*";*/;quit;run;
97 ODS _ALL_ CLOSE;
98
99
100 QUIT; RUN;
What does the LOG show? Copy the text from the log, with the submitted code and all notes, messages, or warnings; on the forum, open a text box and paste the text.
A data step reading a text file will also include a note about the file read and some of its characteristics. Include that as well.
A single line from a data step program is insufficient to answer almost any question about why a number of records is/isn't in the data set. You may have IF or OUTPUT statements that would reduce what is written to the output data set. Your INPUT statement might contain syntax that reads multiple lines of the source file into a single observation or one line of file into multiple observations.
And those are very simple things that can affect the number of observations without even looking at the contents of the source file.
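For example, something as simple as a slash line pointer in the INPUT statement halves the observation count. A minimal illustration (my own sketch, not your code):

/* Illustrative only: the slash line pointer makes INPUT consume two
   raw lines per observation, so a 3M-line file would load as only
   1.5M observations. */
data two_per_obs;
   input name $ / value;
   datalines;
Alpha
1
Beta
2
;
run;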
How sure are you of the TERMSTR=LF? If the file actually contains CRLF, you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could mean half the lines are being skipped, for a line count of around 2,550,000, which might be thought of as "3M" lines when roughly rounded.
@ballardw wrote:
How sure are you of the TERMSTR=LF? If the file actually contains CRLF, you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could mean half the lines are being skipped, for a line count of around 2,550,000, which might be thought of as "3M" lines when roughly rounded.
That seems backwards. If the actual lines end with CRLF and you tell SAS to only look for LF, then the number of lines could not be less. It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.
The effect of using TERMSTR=LF for a file that uses CRLF to mark the end of its lines is that the CR becomes the last character on each line, which might cause trouble for the INPUT statement.
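If in doubt, a quick byte count can settle what the terminators actually are. An untested sketch, assuming &FILE_DIRECTORY. resolves to the file's full path:

/* Count CR ('0D'x), LF ('0A'x), and Ctrl-Z ('1A'x) bytes to see which
   line terminator the file really uses. */
data _null_;
   infile "&FILE_DIRECTORY." recfm=n eof=done;
   input byte $char1.;
   if      byte = '0D'x then cr    + 1;
   else if byte = '0A'x then lf    + 1;
   else if byte = '1A'x then ctrlz + 1;
   return;
done:
   put cr= lf= ctrlz=;
   stop;
run;

If CR is zero, the file really is LF-terminated; if CR equals LF, it is a CRLF file; and a nonzero Ctrl-Z count points at an embedded end-of-file marker.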
I have tried using TERMSTR=CRLF, and the output is empty. Removing the TERMSTR=LF option also gives the same result (same number of lines).
@Tom wrote:
@ballardw wrote:
How sure are you of the TERMSTR=LF? If the file actually contains CRLF, you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could mean half the lines are being skipped, for a line count of around 2,550,000, which might be thought of as "3M" lines when roughly rounded.
That seems backwards. If the actual lines end with CRLF and you tell SAS to only look for LF, then the number of lines could not be less. It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.
The effect of using TERMSTR=LF for a file that uses CRLF to mark the end of its lines is that the CR becomes the last character on each line, which might cause trouble for the INPUT statement.
It's been a long time, but I once had a "file" that was actually a pipe and skipped lines because of an end-of-line issue and how the system treated it going through the pipe. A long shot, but the OP has not actually defined the source well. The macro variable could hold all sorts of interesting stuff...
Have you tried using the DATA step debugger in EG to see what happens near the record where processing stops?
If you are sure the file should have more than 1,276,289 lines then perhaps it has a CONTROL-Z embedded in it that is causing SAS to stop reading at that point. Try adding the IGNOREDOSEOF option to the INFILE statement.
Example:
1399  filename xx temp;
1400  data _null_;
1401     file xx;
1402     put 'line one'
1403       / 'line two'
1404       / 'line three' '1A'x
1405       / 'line four'
1406     ;
1407  run;

NOTE: The file XX is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 4 records were written to the file (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

1408
1409  data _null_;
1410     infile xx;
1411     input;
1412     list;
1413  run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1          line one 8
2          line two 8
3          line three 10
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 10.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

1414
1415  data _null_;
1416     infile xx ignoredoseof;
1417     input;
1418     list;
1419  run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1          line one 8
2          line two 8
3   CHAR   line three. 11
    ZONE   66662767661
    NUMR   C9E5048255A
4          line four 9
NOTE: 4 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
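If IGNOREDOSEOF is not recognized in your release (the option is relatively recent, so check whether your SAS version supports it), an untested workaround is to binary-copy the file while dropping any Ctrl-Z bytes, then point your original INFILE code at the cleaned copy:

filename clean temp;
/* Binary copy: read and write byte streams (RECFM=N), skipping any
   '1A'x (Ctrl-Z) bytes. */
data _null_;
   infile "&FILE_DIRECTORY." recfm=n;
   file clean recfm=n;
   input byte $char1.;
   if byte ne '1A'x then put byte $char1.;
run;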
Does the imported data look OK (e.g. look at the first ten lines of the file and compare them to the first ten observations of the dataset)?
Does the premature end of the import always happen at the same spot?
If the answer to both questions is "yes", open the file with a suitable text editor (e.g. Notepad++) and scroll down to the trouble spot to get a clue.
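If you would rather inspect it from within SAS, here is an untested sketch: window the INFILE around the record where reading stops and use LIST to print the raw lines (LIST shows CHAR/ZONE/NUMR hex lines for records containing unprintable bytes):

data _null_;
   /* Print raw records just around the point where reading stops
      (record 1276289 in your log). */
   infile "&FILE_DIRECTORY." lrecl=32767 firstobs=1276285 obs=1276292;
   input;
   list;
run;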
The data does look OK, and I didn't find anything odd at the trouble spot.
Hi,
Interesting problem. How did you determine the file has 3M records? Is there another method you can use to confirm the number of records in the file? Are you sure you're reading the same file?
How many records are listed in the log as being input if you run the following:
data _null_;
infile "&FILE_DIRECTORY.";
input;
run;
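If your server is UNIX and XCMD is enabled (an assumption on my part), you could also let the operating system count the lines as an independent check:

/* Untested; quoting may need adjusting if the path contains spaces. */
filename cnt pipe "wc -l &FILE_DIRECTORY.";
data _null_;
   infile cnt;
   input;
   put 'OS line count: ' _infile_;
run;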
Thanks & kind regards,
Amir.
I'm sure the file has around 3M rows because I can open it in Notepad. Moreover, before I import it, the file is 200 MB, but after I import it, it only shows 60 MB.
After I run
data _null_;
infile "&FILE_DIRECTORY.";
input;
run;
the LOG shows
NOTE: 1276290 records were read from the infile "&FILE_DIRECTORY.".
The minimum record length was 15.
The maximum record length was 74.
So it appears it is not reading everything.
@Babado wrote:
... Moreover, before I import it, it has 200MB
Not true. From your own log, as posted in the initial post:
NOTE: The infile "&FILE_DIRECTORY." is: Filename=YYYY.TXT, Owner Name=sastoken,Group Name=sasdata, Access Permission=-rw-rw-r--, Last Modified=16Aug2023:17:18:13, File Size (bytes)=66746379
So you can see that the file available to SAS is only ~60M in size, which corresponds with the later dataset size.
I guess you had a problem uploading the file to the SAS server.
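As a quick check (untested sketch), you can ask SAS directly for the size of the file it sees; if it reports ~66 MB rather than ~200 MB, the upload was truncated:

filename chk "&FILE_DIRECTORY.";
data _null_;
   /* FINFO item names can vary by OS; "File Size (bytes)" matches the
      NOTE in your log. */
   fid = fopen('chk');
   if fid > 0 then do;
      size = finfo(fid, 'File Size (bytes)');
      put 'File size seen by SAS: ' size;
      rc  = fclose(fid);
   end;
   else put 'Could not open the file.';
run;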