Babado
Fluorite | Level 6

I'm using SAS EG and trying to import a pipe separated file using the INFILE statement as follows:

 

INFILE "&FILE_DIRECTORY."
LRECL=32767
ENCODING="WLATIN1"
TERMSTR=LF
DLM='|'
MISSOVER
FIRSTOBS=2
DSD ;

 

However, the original file has 3M rows but only about 1M are imported. I tried adding the IGNOREDOSEOF option, but SAS doesn't recognize it, so it doesn't solve the problem. It is also worth mentioning that there is nothing special about the row where the import stops.

 

Any suggestions?

 

I include the LOG below as suggested by @ballardw

 

 

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program';
4          %LET _CLIENTPROCESSFLOWNAME='Program';
5          %LET _CLIENTPROJECTPATH='XXXX';
6          %LET _CLIENTPROJECTPATHHOST='XXXXX';
7          %LET _CLIENTPROJECTNAME='SAS.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=SVG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         %macro HTML5AccessibleGraphSupported;
15             %if %_SAS_VERCOMP_FV(9,4,4, 0,0,0) >= 0 %then ACCESSIBLE_GRAPH;
16         %mend;
17         FILENAME EGHTML TEMP;
18         ODS HTML5(ID=EGHTML) FILE=EGHTML
19             OPTIONS(BITMAP_MODE='INLINE')
20             %HTML5AccessibleGraphSupported
21             ENCODING='utf-8'
22             STYLE=HTMLBlue
23             NOGTITLE
24             NOGFOOTNOTE
25             GPATH=&sasworklocation
26         ;
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27         
28         
29         
30         %macro load_input_files_v2 ();
31         
32         	DATA WORK.table;
33         	    LENGTH
34         	        XXXX           8
35         	        XXXX          $ 3
36         	        XXXX          8;
42         	    FORMAT
43         	        XXXX         YYMMDD10.
44         	        XXXX          $CHAR3.
45         	        XXXX         BEST5.;
51         	    INFORMAT
52         	        XXXX         YYMMDD10.
53         	        XXXX          $CHAR3.
54         	        XXXX         BEST5. ;
60         		INFILE "&FILE_DIRECTORY."
61         	        LRECL=32767
62         	        ENCODING="WLATIN1"
63         	        TERMSTR=LF
64         	        DLM='|'
65         	        MISSOVER
66         		FIRSTOBS=2
67         	        DSD ;
68         	    INPUT
69         	        XXXX         : ?? YYMMDD8.
70         	        XXXX          : $CHAR3.
71         	        XXXX         : ?? BEST5.
72         	        XXXX         : ?? BEST21.;
77         	RUN;
       
83         
84         %mend;
85         
86         %load_input_files_v2();

NOTE: The infile "&FILE_DIRECTORY." is:
      Filename=YYYY.TXT,
      Owner Name=sastoken,Group Name=sasdata,
      Access Permission=-rw-rw-r--,
      Last Modified=16Aug2023:17:18:13,
      File Size (bytes)=66746379

NOTE: 1276289 records were read from the infile "&FILE_DIRECTORY."
      The minimum record length was 15.
      The maximum record length was 60.
NOTE: The data set WORK.TABLE has 1276289 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           1.08 seconds
      cpu time            1.08 seconds
      

87         
88         %LET _CLIENTTASKLABEL=;
89         %LET _CLIENTPROCESSFLOWNAME=;
90         %LET _CLIENTPROJECTPATH=;
91         %LET _CLIENTPROJECTPATHHOST=;
92         %LET _CLIENTPROJECTNAME=;
93         %LET _SASPROGRAMFILE=;
94         %LET _SASPROGRAMFILEHOST=;
95         
96         ;*';*";*/;quit;run;
97         ODS _ALL_ CLOSE;
3                                                          The SAS System                         12:25 Wednesday, September 6, 2023

98         
99         
100        QUIT; RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@Babado wrote:

... Moreover, before I import it, it has 200MB

 


Not true. From your own log, as posted in the initial post:

NOTE: The infile "&FILE_DIRECTORY." is:
      Filename=YYYY.TXT,
      Owner Name=sastoken,Group Name=sasdata,
      Access Permission=-rw-rw-r--,
      Last Modified=16Aug2023:17:18:13,
      File Size (bytes)=66746379

So you can see that the file available to SAS is only ~60M in size, which corresponds with the later dataset size.

I guess you had a problem uploading the file to the SAS server.
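
One way to confirm this from inside the SAS session is to ask the server for the size of the file it actually sees. The following is only a minimal sketch and was not part of the original reply. The fileref `chk` is a made-up name, and the `'File Size (bytes)'` information item is the one this host prints in the log above; item names are host-specific, so it may be called something else on another system.

```sas
/* Hypothetical check: report the file size as seen by the SAS server.  */
/* &FILE_DIRECTORY. is the macro variable used in the original post.    */
filename chk "&FILE_DIRECTORY.";

data _null_;
  length size $ 32;
  fid = fopen('chk');                      /* open the file via its fileref */
  if fid = 0 then do;
    msg = sysmsg();                        /* reason the open failed */
    put 'Could not open file: ' msg;
    stop;
  end;
  size = finfo(fid, 'File Size (bytes)');  /* host-specific item name */
  put 'Size on the SAS server (bytes): ' size;
  rc = fclose(fid);
run;

filename chk clear;
```

If the number printed here is far below the size shown on the local machine, the copy on the server is probably incomplete and should be uploaded again.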


15 REPLIES
ballardw
Super User

What does the LOG show? Copy the text from the log, including the submitted code and all notes, messages, and warnings, then open a text box on the forum and paste it there.

A data step reading a text file also will include a note about the file name read and some characteristics. Include that as well.

 

A single line from a data step program is insufficient to answer almost any question about why a number of records is/isn't in the data set. You may have IF or OUTPUT statements that would reduce what is written to the output data set. Your INPUT statement might contain syntax that reads multiple lines of the source file into a single observation or one line of file into multiple observations.

And those are very simple things that may have an impact on the number of observations without even looking at the contents of the source file.
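
To illustrate the point about INPUT syntax, here is a small sketch using a temporary file. The fileref `demo`, the dataset names, and the variable names are all invented for the example. The same four-line file produces a different number of observations depending only on how the INPUT statement is written.

```sas
filename demo temp;

/* Write four short records to a temporary file. */
data _null_;
  file demo;
  put '1 2' / '3 4' / '5 6' / '7 8';
run;

/* "/" moves to the next record: 4 lines become 2 observations. */
data two_lines_per_obs;
  infile demo;
  input a b / c d;
run;

/* Trailing "@@" holds the record across iterations: */
/* 4 lines become 8 observations.                     */
data many_obs_per_line;
  infile demo;
  input x @@;
run;

filename demo clear;
```

The log note for each step reports 4 records read from the infile, while the dataset notes report 2 and 8 observations respectively.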

Babado
Fluorite | Level 6
Post updated.
ballardw
Super User

How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000. Which might be thought of as "3m" lines in a rough round.
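
A quick way to see which terminator the file really uses is to list a few raw records. This is just a sketch that assumes the same `&FILE_DIRECTORY.` macro variable as the original post. The LIST statement prints a record in hex (CHAR/ZONE/NUMR lines) when it contains unprintable characters, so a stray carriage return would show up as `0D` at the end of each record.

```sas
/* Read only the first five records, splitting on LF alone, */
/* and write each raw record to the log.                    */
data _null_;
  infile "&FILE_DIRECTORY." termstr=lf lrecl=32767 obs=5;
  input;
  list;
run;
```

If the records list as plain text with no hex lines, the file is most likely LF-terminated and TERMSTR=LF is the right setting.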

Tom
Super User

@ballardw wrote:

How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000. Which might be thought of as "3m" lines in a rough round.


That seems backwards.  If the actual lines end with CRLF and you tell SAS to only look for LF then the number of lines could not be less.  It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.

 

The effect of using TERMSTR=LF for a file that is using CRLF to mark the end of the lines is that the CR at the end of the line will become the last character on the line. So it might cause trouble for the INPUT statement.
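
A small sketch of that effect, using a temporary file so it can be run anywhere. The fileref `crlf` and the variable names are made up for the illustration.

```sas
filename crlf temp;

/* Write two pipe-delimited records terminated by CRLF. */
data _null_;
  file crlf termstr=crlf;
  put 'A|one';
  put 'B|two';
run;

/* Read them back while looking only for LF: */
/* the CR stays in the data as part of the last field. */
data _null_;
  infile crlf termstr=lf dlm='|' dsd truncover;
  input id :$1. word :$10.;
  len = lengthn(word);          /* counts the trailing CR as a character */
  put id= len= word= $hex12.;   /* show the raw bytes of the last field */
run;

filename crlf clear;
```

Still two records are read, but the last field should come back one character longer than expected, with `0D` visible in the hex output. For a numeric field read with `??`, that trailing CR would typically turn the value into a missing value without any message in the log.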

Babado
Fluorite | Level 6

I have tried to use TERMSTR=CRLF and the output is null. Also removing the option TERMSTR=LF outputs the same result (same number of lines).

ballardw
Super User

@Tom wrote:

@ballardw wrote:

How sure are you of the TERMSTR=LF? If the file actually contains CRLF you might get skipped lines when the OS reads the CR. The 1,276,289 in that case could be skipping half the lines for a line count of around 2,550,000. Which might be thought of as "3m" lines in a rough round.


That seems backwards.  If the actual lines end with CRLF and you tell SAS to only look for LF then the number of lines could not be less.  It could only be the same or more. It could detect more lines if some of the lines contained LF characters in the value of one of the character fields.

 

The effect of using TERMSTR=LF for a file that is using CRLF to mark the end of the lines is that the CR at the end of the line will become the last character on the line. So it might cause trouble for the INPUT statement.


It's been a long time but I had a "file" that was actually a pipe that skipped lines because of an end of line issue and how the system treated it going through the pipe. Long shot but the OP has not actually defined the source well. The macro variable could hold all sorts of interesting stuff...

ChrisHemedinger
Community Manager

Have you tried using the DATA step debugger in EG to see what happens near the record where processing stops? 

Tom
Super User

If you are sure the file should have more than 1,276,289 lines then perhaps it has a CONTROL-Z embedded in it that is causing SAS to stop reading at that point.  Try adding the IGNOREDOSEOF option to the INFILE statement.

 

Example:

1399  filename xx temp;
1400  data _null_;
1401    file xx;
1402    put 'line one'
1403      / 'line two'
1404      / 'line three' '1A'x
1405      / 'line four'
1406    ;
1407  run;

NOTE: The file XX is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 4 records were written to the file (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


1408
1409  data _null_;
1410   infile xx;
1411   input;
1412   list;
1413  run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1         line one 8
2         line two 8
3         line three 10
NOTE: 3 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 10.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


1414
1415  data _null_;
1416    infile xx ignoredoseof;
1417    input;
1418    list;
1419  run;

NOTE: The infile XX is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7
1         line one 8
2         line two 8

3   CHAR  line three. 11
    ZONE  66662767661
    NUMR  C9E5048255A
4         line four 9
NOTE: 4 records were read from the infile (system-specific pathname).
      The minimum record length was 8.
      The maximum record length was 11.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


Kurt_Bremser
Super User

Does the imported data look OK (e.g. look at the first ten lines of the file and compare them to the first ten observations of the dataset)?

Does the premature end of the import always happen at the same spot?

If the answer to both questions is "yes", open the file with a suitable text editor (e.g. Notepad++) and scroll down to the trouble spot to get a clue.

Babado
Fluorite | Level 6

The data does look ok and I didn't find anything odd with the troubled spot.

Reeza
Super User
What makes you think the original file has 3 million rows? Is that from a DOS/Unix line command or from documentation?
Amir
PROC Star

Hi,

 

Interesting problem. How did you determine the file has 3M records? Is there another method you can use to confirm the number of records in the file? Are you sure you're reading the same file?

 

How many records are listed in the log as being input if you run the following:

 

data _null_;
  infile "&FILE_DIRECTORY.";
  input;
run;

 

 

Thanks & kind regards,

Amir.

Babado
Fluorite | Level 6

I'm sure the file has around 3M rows because I can open it in Notepad. Moreover, before I import it, it has 200MB, but after I import it, it only shows 60MB.

 

After I run

data _null_;
  infile "&FILE_DIRECTORY.";
  input;
run;

the LOG shows

NOTE: 1276290 records were read from the infile "&FILE_DIRECTORY.".
The minimum record length was 15.
The maximum record length was 74.

So it appears it is not reading everything.

