I'm using SAS EG and trying to import a pipe separated file using the INFILE statement as follows:
INFILE "&FILE_DIRECTORY."
LRECL=32767
ENCODING="WLATIN1"
TERMSTR=LF
DLM='|'
MISSOVER
FIRSTOBS=2
DSD ;
However, the original file has about 3M rows, but only about 1M are imported. I have tried adding the IGNOREDOSEOF option, but SAS doesn't recognize it, so it doesn't solve the problem. It is also worth mentioning that there is nothing special about the row where the table "breaks".
Any suggestions?
I include the LOG below as suggested by @ballardw
1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='Program';
5          %LET _CLIENTPROJECTPATH='XXXX';
6          %LET _CLIENTPROJECTPATHHOST='XXXXX';
7          %LET _CLIENTPROJECTNAME='SAS.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=SVG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         %macro HTML5AccessibleGraphSupported;
15             %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16         %mend;
17         FILENAME EGHTML TEMP;
18         ODS HTML5(ID=EGHTML) FILE=EGHTML
19             OPTIONS(BITMAP_MODE='INLINE')
20             %HTML5AccessibleGraphSupported
21             ENCODING='utf-8'
22             STYLE=HTMLBlue
23             NOGTITLE
24             NOGFOOTNOTE
25             GPATH=&sasworklocation
26         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27         
28         
29         
30         %macro load_input_files_v2 ();
31         
32         	DATA WORK.table;
33         	    LENGTH
34         	        XXXX           8
35         	        XXXX          $ 3
36         	        XXXX          8;
42         	    FORMAT
43         	        XXXX         YYMMDD10.
44         	        XXXX          $CHAR3.
45         	        XXXX         BEST5.;
51         	    INFORMAT
52         	        XXXX         YYMMDD10.
53         	        XXXX          $CHAR3.
54         	        XXXX         BEST5. ;
60         		INFILE "&FILE_DIRECTORY."
61         	        LRECL=32767
62         	        ENCODING="WLATIN1"
63         	        TERMSTR=LF
64         	        DLM='|'
65         	        MISSOVER
66         		FIRSTOBS=2
67         	        DSD ;
68         	    INPUT
69         	        XXXX         : ?? YYMMDD8.
70         	        XXXX          : $CHAR3.
71         	        XXXX         : ?? BEST5.
72         	        XXXX         : ?? BEST21.;
77         	RUN;
       
83         
84         %mend;
85         
86         %load_input_files_v2();
NOTE: The infile "&FILE_DIRECTORY." is:
      Filename=YYYY.TXT,
      Owner Name=sastoken,Group Name=sasdata,
      Access Permission=-rw-rw-r--,
      Last Modified=16Aug2023:17:18:13,
      File Size (bytes)=66746379
NOTE: 1276289 records were read from the infile "&FILE_DIRECTORY."
      The minimum record length was 15.
      The maximum record length was 60.
NOTE: The data set WORK.TABLE has 1276289 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           1.08 seconds
      cpu time            1.08 seconds
      
87         
88         %LET _CLIENTTASKLABEL=;
89         %LET _CLIENTPROCESSFLOWNAME=;
90         %LET _CLIENTPROJECTPATH=;
91         %LET _CLIENTPROJECTPATHHOST=;
92         %LET _CLIENTPROJECTNAME=;
93         %LET _SASPROGRAMFILE=;
94         %LET _SASPROGRAMFILEHOST=;
95         
96         ;*';*";*/;quit;run;
97         ODS _ALL_ CLOSE;
3                                                          The SAS System                         12:25 Wednesday, September 6, 2023
98         
99         
100        QUIT; RUN;
@Babado wrote:
... Moreover, before I import it, it has 200MB
Not true. From your own log, as posted in the initial post:
NOTE: The infile "&FILE_DIRECTORY." is:
      Filename=YYYY.TXT,
      Owner Name=sastoken,Group Name=sasdata,
      Access Permission=-rw-rw-r--,
      Last Modified=16Aug2023:17:18:13,
      File Size (bytes)=66746379
So you can see that the file available to SAS is only 66,746,379 bytes (roughly 64 MB), which is consistent with the ~60 MB you see after the import.
I guess you had a problem uploading the file to the SAS server.
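To double-check the size of the file as the SAS server sees it, without reading any data, something like the following sketch may help. It assumes the macro variable &FILE_DIRECTORY. holds the full path to the file, as in the posted code. The fileref name chk is made up for illustration, and the FINFO item name 'File Size (bytes)' is taken from the log note above; item names are host-dependent, so it may differ on other systems.

```sas
/* Hypothetical fileref "chk"; &FILE_DIRECTORY. is assumed to hold the file path */
filename chk "&FILE_DIRECTORY.";

data _null_;
  length size $40;
  fid = fopen('chk');                          /* open the file for input */
  if fid > 0 then do;
    /* Item name copied from the log note; it is host-dependent */
    size = finfo(fid, 'File Size (bytes)');
    put 'Size seen by SAS (bytes): ' size;
    rc = fclose(fid);
  end;
  else put 'Could not open the file.';
run;

filename chk clear;
```

If the number printed is much smaller than the size of the original file on your PC, the copy on the server is incomplete.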
What does the LOG show? Copy the text from the log, including the submitted code and all notes, messages or warnings, then open a text box on the forum and paste the text there.
A data step reading a text file also will include a note about the file name read and some characteristics. Include that as well.
A single line from a data step program is insufficient to answer almost any question about why a number of records is/isn't in the data set. You may have IF or OUTPUT statements that would reduce what is written to the output data set. Your INPUT statement might contain syntax that reads multiple lines of the source file into a single observation or one line of file into multiple observations.
And those are very simple things that may have an impact on the number of observations without even looking at the contents of the source file.
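As a small illustration of how an INPUT statement can consume more than one line per observation, here is a sketch using a made-up four-line temporary file (the fileref demo and the data values are invented for the example). The INFILE statement posted in the question already specifies MISSOVER, so this particular effect is probably not the cause there.

```sas
/* Build a tiny pipe-delimited test file; the second record is short */
filename demo temp;
data _null_;
  file demo;
  put '1|10' / '2' / '3|30' / '4|40';
run;

/* FLOWOVER is the default: when the short record runs out of fields,
   INPUT continues on the next line, so records get merged */
data flow;
  infile demo dlm='|' dsd;
  input id value;
run;

/* MISSOVER: a short record gets missing values instead */
data miss;
  infile demo dlm='|' dsd missover;
  input id value;
run;
```

WORK.FLOW should end up with 3 observations and WORK.MISS with 4, and the log for the first step should contain a note that SAS went to a new line when the INPUT statement reached past the end of a line.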
How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000, which might be thought of as "3M" lines as a rough round number.
@ballardw wrote:
How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000, which might be thought of as "3M" lines as a rough round number.
That seems backwards. If the actual lines end with CRLF and you tell SAS to only look for LF then the number of lines could not be less. It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.
The effect of using TERMSTR=LF for a file that uses CRLF to mark the end of the lines is that the CR at the end of each line becomes the last character on the line. So it might cause trouble for the INPUT statement.
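One way to see whether that is happening is to look at the last byte of the first few records. The following is only a sketch: it assumes &FILE_DIRECTORY. holds the path to the file, and the variable names reclen and ends_with_cr are made up for the example.

```sas
data _null_;
  /* LENGTH= names a variable that receives the length of the current record */
  infile "&FILE_DIRECTORY." termstr=lf lrecl=32767 length=reclen obs=5;
  input;
  /* reclen is set once INPUT has executed; guard against empty records */
  if reclen > 0 then ends_with_cr = (char(_infile_, reclen) = '0D'x);
  else ends_with_cr = 0;
  put _n_= reclen= ends_with_cr=;
  list;   /* LIST prints hex codes for records containing unprintable bytes */
run;
```

If ends_with_cr is 1 for the records checked, the file uses CRLF line endings and TERMSTR=CRLF would be the better choice.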
I have tried to use TERMSTR=CRLF and the output is null. Also removing the option TERMSTR=LF outputs the same result (same number of lines).
@Tom wrote:
@ballardw wrote:
How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000, which might be thought of as "3M" lines as a rough round number.
That seems backwards. If the actual lines end with CRLF and you tell SAS to only look for LF then the number of lines could not be less. It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.
The effect of using TERMSTR=LF for a file that uses CRLF to mark the end of the lines is that the CR at the end of each line becomes the last character on the line. So it might cause trouble for the INPUT statement.
It's been a long time but I had a "file" that was actually a pipe that skipped lines because of an end of line issue and how the system treated it going through the pipe. Long shot but the OP has not actually defined the source well. The macro variable could hold all sorts of interesting stuff...
Have you tried using the DATA step debugger in EG to see what happens near the record where processing stops?
If you are sure the file should have more than 1,276,289 lines then perhaps it has a CONTROL-Z embedded in it that is causing SAS to stop reading at that point. Try adding the IGNOREDOSEOF option to the INFILE statement.
Example:
1399  filename xx temp;
1400  data _null_;
1401    file xx;
1402    put 'line one'
1403      / 'line two'
1404      / 'line three' '1A'x
1405      / 'line four'
1406    ;
1407  run;
NOTE: The file XX is:
      (system-specific pathname),
      (system-specific file attributes)
NOTE: 4 records were written to the file (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
1408
1409  data _null_;
1410   infile xx;
1411   input;
1412   list;
1413  run;
NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1         line one 8
2         line two 8
3         line three 10
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 10.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
1414
1415  data _null_;
1416    infile xx ignoredoseof;
1417    input;
1418    list;
1419  run;
NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1         line one 8
2         line two 8
3   CHAR  line three. 11
    ZONE  66662767661
    NUMR  C9E5048255A
4         line four 9
NOTE: 4 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
Does the imported data look OK (e.g. look at the first ten lines of the file and compare them to the first ten observations of the dataset)?
Does the premature end of the import always happen at the same spot?
If the answer to both questions is "yes", open the file with a suitable text editor (e.g. Notepad++) and scroll down to the trouble spot to get a clue.
The data does look OK, and I didn't find anything odd at the trouble spot.
Hi,
Interesting problem. How did you determine the file has 3M records? Is there another method you can use to confirm the number of records in the file? Are you sure you're reading the same file?
How many records are listed in the log as being input if you run the following:
data _null_;
  infile "&FILE_DIRECTORY.";
  input;
run;
Thanks & kind regards,
Amir.
I'm sure the file has around 3M rows because I can open it in Notepad. Moreover, before I import it, the file is 200 MB, but after I import it, it only shows 60 MB.
After I run
data _null_;
  infile "&FILE_DIRECTORY.";
  input;
run;
the LOG shows
NOTE: 1276290 records were read from the infile "&FILE_DIRECTORY.".
The minimum record length was 15.
The maximum record length was 74.
So it appears it is not reading everything.
