Hello team,
I have a program that reads data from a CSV file into SAS. When I look at the log, it says:
----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
and then one record is printed under it.
What does this mean?
I have 32,000 records, but only a few of them are printed to the log. My guess is that either the records printed to the log are the ones that did not get read into SAS, or SAS splits the data into pages in memory and the log shows the last page.
Any tips are greatly appreciated.
Thanks
Blue in the sky
Show us the whole log for this DATA step or PROC, never show us partial logs. And copy the log as text and paste it into the window that appears when you click on the </> icon. DO NOT SKIP THIS STEP. This is so important, I am going to say it again. DO NOT SKIP THIS STEP. This is so important, I am going to say it again. DO NOT SKIP THIS STEP.
The line you showed us lets you determine what character is in position 78 (for example) in the input line.
When there is an error reading one (or more) of the variables on a line, SAS will dump the values of all of the variables (like what you get with a put _all_; statement) and also the offending line from the source file (like what you get from a list; statement).
Example:
filename csv temp;
proc export data=sashelp.class(obs=3) file=csv dbms=csv;
run;
data class ;
infile csv dsd firstobs=2 truncover ;
input name :$8. age sex $ height weight ;
run;
Log
3093  data class ;
3094    infile csv dsd firstobs=2 truncover ;
3095    input name :$8. age sex $ height weight ;
3096  run;
NOTE: The infile CSV is:
      Filename=...\#LN00106,
      RECFM=V,LRECL=32767,File Size (bytes)=92,
      Last Modified=31Mar2023:14:32:03,
      Create Time=31Mar2023:14:32:03
NOTE: Invalid data for age in line 2 8-8.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2         Alfred,M,14,69,112.5 20
name=Alfred age=. sex=14 height=69 weight=112.5 _ERROR_=1 _N_=1
NOTE: Invalid data for age in line 3 7-7.
3         Alice,F,13,56.5,84 18
name=Alice age=. sex=13 height=56.5 weight=84 _ERROR_=1 _N_=2
NOTE: Invalid data for age in line 4 9-9.
4         Barbara,F,13,65.3,98 20
name=Barbara age=. sex=13 height=65.3 weight=98 _ERROR_=1 _N_=3
NOTE: 3 records were read from the infile CSV.
      The minimum record length was 18.
      The maximum record length was 20.
NOTE: The data set WORK.CLASS has 3 observations and 5 variables.
NOTE: Compressing data set WORK.CLASS increased size by 100.00 percent.
      Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
So you can see that the log is saying that the value for AGE on the second line of the CSV file is not valid. SAS was trying to use the value in the 8th character of the line (that is what "8-8" means), which you can see from the printout is the letter M. That is not a valid way to represent a number (unless you included M in the list of single letters to treat as missing values in a missing; statement).
If you have a lot of errors, then after a while the data step will write a note saying there have been too many and stop printing the extra information for every invalid line.
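The cutoff is controlled by the ERRORS= system option, which I believe defaults to 20 observations. Here is a minimal sketch, reusing the three-row export from the example above, that lowers the limit so you can see the behavior on a tiny file. The value 2 is just an arbitrary choice for illustration.

```sas
/* Recreate the small CSV file from the earlier example */
filename csv temp;
proc export data=sashelp.class(obs=3) file=csv dbms=csv;
run;

/* Print the full detail (variable dump plus input line) for at most 2 observations */
options errors=2;

data class;
  infile csv dsd firstobs=2 truncover;
  /* Same deliberately wrong variable order as before, so every line is invalid */
  input name :$8. age sex $ height weight;
run;

/* Put the option back to its usual value when done */
options errors=20;
```

With the limit at 2, the third invalid line should no longer get the full dump in the log. If you need to see every bad line in a large file, you can raise the value instead.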
If the lines are long (or any of the characters are non-printing characters) then the LIST output will look more like:
Example:
3111  options generic;
3112  filename text temp;
3113  data _null_;
3114    file text ;
3115    put 'This line has some non standarad characters'
3116      @ 5 '09'x
3117      @ 140 'And it is very long'
3118    ;
3119  run;
NOTE: The file TEXT is:
      (system-specific pathname),
      (system-specific file attributes)
NOTE: 1 record was written to the file (system-specific pathname).
      The minimum record length was 158.
      The maximum record length was 158.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
3120
3121  data _null_;
3122    infile text;
3123    input;
3124    list;
3125  run;
NOTE: The infile TEXT is:
      (system-specific pathname),
      (system-specific file attributes)
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1   CHAR  This.line has some non standarad characters
    ZONE  5667066662667276662666277666676626667667677222222222222222222222222222222222222222222222222222222222
    NUMR  48939C9E5081303FD50EFE0341E4121403812134523000000000000000000000000000000000000000000000000000000000
101                                              And it is very long 158
NOTE: 1 record was read from the infile (system-specific pathname).
      The minimum record length was 158.
      The maximum record length was 158.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
In this example you can see that the fifth character has the hex code 09 (a tab) instead of the hex code 20 that represents a space. To read the hex code for a position, take the digit on the ZONE line and the digit below it on the NUMR line. The line is also too long to fit on one line of the log, so SAS prints the first 100 characters and then starts printing the other 58 characters on a new line. That part of the line does not have any non-standard characters, so the hex codes are not displayed under it. Notice also that the line length is listed after the end of each line, so this whole line is 158 characters long.
First of all, you have to fix this. Once you fix that, there may (or may not) be additional errors that need to be fixed.
3093 data class ;
3094 infile csv dsd firstobs=2 truncover ;
3095 input name :$8. age sex $ height weight ;
3096 run;
NOTE: The infile CSV is:
Filename=...\#LN00106,
RECFM=V,LRECL=32767,File Size (bytes)=92,
Last Modified=31Mar2023:14:32:03,
Create Time=31Mar2023:14:32:03
NOTE: Invalid data for age in line 2 8-8.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
2 Alfred,M,14,69,112.5 20
name=Alfred age=. sex=14 height=69 weight=112.5 _ERROR_=1 _N_=1
Your input statement has AGE as the second variable that it is supposed to read, and sex as the third variable that it is supposed to read. But in row 2, the first variable is the name Alfred which is read properly, but then instead of age as the second variable and sex as the third variable, the input line for Alfred has sex as the second variable and age as the third variable.
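A minimal sketch of the fix, assuming the columns in the CSV file really are in the order name, sex, age, height, weight, as the Alfred line suggests. It rebuilds the same three-row file from the earlier example so it can be run on its own.

```sas
/* Recreate the small CSV file from the earlier example */
filename csv temp;
proc export data=sashelp.class(obs=3) file=csv dbms=csv;
run;

data class;
  infile csv dsd firstobs=2 truncover;
  /* List the variables in the same order as the columns in the file: */
  /* sex (character) comes before age (numeric)                       */
  input name :$8. sex $ age height weight;
run;
```

With the variables in this order, the "Invalid data for age" notes should go away for these rows. For your own file, compare the header row of the CSV with your INPUT statement in the same way.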