GN0001
Barite | Level 11

Hello team,

 

I have a program that reads data from a CSV file into SAS. When I check the log, it shows this line:

----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

and then under it prints one record. 

What does this mean?

 

I have 32,000 records, but only a handful of them are printed to the log. My guess is that the records printed to the log are either the ones that did not get read into SAS, or that SAS splits the data into pages in memory and the log shows the last page.

 

Any tips are greatly appreciated.

 

Thanks

Blue in the sky

Blue Blue
PaigeMiller
Diamond | Level 26

Show us the whole log for this DATA step or PROC; never show us partial logs. Copy the log as text and paste it into the window that appears when you click on the </> icon. DO NOT SKIP THIS STEP. This is so important, I am going to say it again. DO NOT SKIP THIS STEP. This is so important, I am going to say it again. DO NOT SKIP THIS STEP.

 


 

The line you showed us lets you determine what character is in position 78 (for example) in the input line.

--
Paige Miller
Tom
Super User

When there is an error reading one (or more) of the variables on a line, SAS will dump the values of all of the variables (like what you get with a put _all_; statement) and also the offending line from the source file (like what you get from a list; statement).

 

Example:

 

filename csv temp;
proc export data=sashelp.class(obs=3) file=csv dbms=csv;
run;

data class ;
  infile csv dsd firstobs=2 truncover ;
  input name :$8. age sex $ height weight ;
run;

Log

3093  data class ;
3094    infile csv dsd firstobs=2 truncover ;
3095    input name :$8. age sex $ height weight ;
3096  run;

NOTE: The infile CSV is:
      Filename=...\#LN00106,
      RECFM=V,LRECL=32767,File Size (bytes)=92,
      Last Modified=31Mar2023:14:32:03,
      Create Time=31Mar2023:14:32:03

NOTE: Invalid data for age in line 2 8-8.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2         Alfred,M,14,69,112.5 20
name=Alfred age=. sex=14 height=69 weight=112.5 _ERROR_=1 _N_=1
NOTE: Invalid data for age in line 3 7-7.
3         Alice,F,13,56.5,84 18
name=Alice age=. sex=13 height=56.5 weight=84 _ERROR_=1 _N_=2
NOTE: Invalid data for age in line 4 9-9.
4         Barbara,F,13,65.3,98 20
name=Barbara age=. sex=13 height=65.3 weight=98 _ERROR_=1 _N_=3
NOTE: 3 records were read from the infile CSV.
      The minimum record length was 18.
      The maximum record length was 20.
NOTE: The data set WORK.CLASS has 3 observations and 5 variables.
NOTE: Compressing data set WORK.CLASS increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

So you can see that the log is saying that the value for AGE on the second line of the CSV file is not valid. SAS was trying to use the value in the 8th character of the line, which you can see from the printout is the letter M. That is not a valid way to represent a number (unless you included M in the list of single letters to treat as missing values in a missing; statement).

 

If you have a lot of errors, after a while the data step will write a note saying there have been too many and stop printing the extra information for every invalid line.
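The cutoff described above is controlled by the ERRORS= system option, which sets the maximum number of observations for which SAS prints the complete error messages (the default is 20). A minimal sketch, assuming the same SASHELP.CLASS export as in the example above and an arbitrary limit of 2 picked only for illustration:

```sas
/* Sketch: print the full error dump for at most 2 bad records. */
/* The value 2 is an arbitrary choice for this illustration.    */
options errors=2;

filename csv temp;
proc export data=sashelp.class(obs=5) file=csv dbms=csv replace;
run;

data class;
  infile csv dsd firstobs=2 truncover;
  /* AGE and SEX are deliberately in the wrong order, as in the   */
  /* example above, so every record has invalid data for AGE.     */
  input name :$8. age sex $ height weight;
run;

/* Restore the default so later steps report errors as usual. */
options errors=20;
```

With 32,000 records and a systematic problem, this is why only a handful of records show up in the log: the rest still have the same problem, SAS has just stopped listing them.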

 

If the lines are long (or contain non-printing characters), the LIST output will look more like this:

Example:

3111  options generic;
3112  filename text temp;
3113  data _null_;
3114    file text ;
3115    put 'This line has some non standarad characters'
3116     @ 5 '09'x
3117     @ 140 'And it is very long'
3118    ;
3119  run;

NOTE: The file TEXT is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 1 record was written to the file (system-specific pathname).
      The minimum record length was 158.
      The maximum record length was 158.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


3120
3121  data _null_;
3122    infile text;
3123    input;
3124    list;
3125  run;

NOTE: The infile TEXT is:
      (system-specific pathname),
      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

1   CHAR  This.line has some non standarad characters
    ZONE  5667066662667276662666277666676626667667677222222222222222222222222222222222222222222222222222222222
    NUMR  48939C9E5081303FD50EFE0341E4121403812134523000000000000000000000000000000000000000000000000000000000
     101                                         And it is very long 158
NOTE: 1 record was read from the infile (system-specific pathname).
      The minimum record length was 158.
      The maximum record length was 158.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

In this example you can see that the fifth character has the hex code 09 instead of the hex code 20 that represents a space (read the ZONE row over the NUMR row: 0 over 9 in column 5). The line is also too long to fit on one line of the log, so SAS prints the first 100 characters and then starts printing the other 58 characters on a new line. That part of the line does not have any non-standard characters, so the hex codes are not displayed under it. Notice also that the line length is listed after the end of each line. So this whole line is 158 characters long.
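If you want to look at raw input lines yourself, without waiting for an error to trigger the dump, you can call the list; statement conditionally. A minimal sketch, assuming a small temporary file created just for the demonstration (the file contents and the limit of 2 records are made up for illustration):

```sas
/* Create a small demo file; the second line contains a tab ('09'x). */
filename text temp;
data _null_;
  file text;
  put 'first line';
  put 'second' '09'x 'line with a tab';
  put 'third line';
run;

/* Read the file back and write only the first 2 input lines */
/* to the log with the LIST statement.                       */
data _null_;
  infile text;
  input;                   /* load the record into the input buffer */
  if _n_ <= 2 then list;   /* list only the first two records       */
run;
```

The log should show the RULE line once, followed by the listed records, and any line containing a non-printing character should get the CHAR/ZONE/NUMR display described above.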

 

PaigeMiller
Diamond | Level 26

First of all, you have to fix this. Once you fix that, there may (or may not) be additional errors that need to be fixed.

 

3093  data class ;
3094    infile csv dsd firstobs=2 truncover ;
3095    input name :$8. age sex $ height weight ;
3096  run;

NOTE: The infile CSV is:
      Filename=...\#LN00106,
      RECFM=V,LRECL=32767,File Size (bytes)=92,
      Last Modified=31Mar2023:14:32:03,
      Create Time=31Mar2023:14:32:03

NOTE: Invalid data for age in line 2 8-8.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2         Alfred,M,14,69,112.5 20
name=Alfred age=. sex=14 height=69 weight=112.5 _ERROR_=1 _N_=1

 

 

Your INPUT statement has AGE as the second variable to read and SEX as the third. In row 2, the first value, the name Alfred, is read properly. But the input line for Alfred has sex as the second value and age as the third, which is the reverse of what the INPUT statement expects.
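For the example log shown above, the fix is to list the variables in the INPUT statement in the same order as the columns in the file. A minimal sketch, assuming the same SASHELP.CLASS export used in the example (your own file will need its own variable list):

```sas
/* Recreate the example CSV file: columns are Name, Sex, Age, Height, Weight. */
filename csv temp;
proc export data=sashelp.class(obs=3) file=csv dbms=csv replace;
run;

data class;
  infile csv dsd firstobs=2 truncover;
  /* SEX now comes before AGE, matching the column order in the file. */
  input name :$8. sex $ age height weight;
run;
```

With the order corrected, the "Invalid data for age" notes and the RULE line should no longer appear in the log for this step.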

--
Paige Miller
