Reeza
Super User
Did you notice the errors were occurring for a specific value in your data? Did you compare those rows across R & SAS?
Reeza
Super User
And one other thing... after working officially in an R environment for the last several months, I can say R doesn't have error handling and notifications as good as SAS's, especially for imports. It often just reads problem fields as character and 'solves' the issue that way, which just leads to more cleaning later, unless you specify types/names. That rule is the same across both applications.
Tom
Super User

@xinyao2019 wrote:

I contacted the person who manages the data and was told he did not find anything that looks strange in the CSV file, and that it can be read into R without any problem.

Thanks

Just because R reads it without errors doesn't mean that it is in a valid format for use with SAS (or R, for that matter).

If you know the person that creates the file, then have them fix it so that it can be properly parsed by the SAS INFILE statement.

Tell them to pick one of two methods.

1) Remove any CR ('0D'x) or LF ('0A'x) characters from the values of the variables.

2) Make sure that no variable value contains the CRLF combination ('0D0A'x), and write the lines using CRLF as the end-of-line marker.
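
For the first method, a minimal sketch, assuming the file is rebuilt from a SAS data set before it is exported (the data set names here are hypothetical):

data clean;
  set source;                            /* hypothetical input data set */
  array chr _character_;                 /* every character variable */
  do i = 1 to dim(chr);
    chr{i} = compress(chr{i}, '0D0A'x);  /* strip CR and LF from the values */
  end;
  drop i;
run;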

 

Which reminds me of another nasty thing that you will see in some delimited files: the use of backslash ( \ ) as an "escape" character, as if you were typing the text at a Unix console or in a C program. In SAS, when the value of a field contains the delimiter or a double quote character ("), the whole value is enclosed in quotes. If the value is enclosed in quotes and it actually contains the quote character, then that character is doubled up so the reader knows it does NOT mark the end of the quoted value. Backslash is just another character.
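
A minimal sketch of that doubled-quote convention: in the line below, the middle field contains both a comma and quote characters, so the whole value is enclosed in quotes and each embedded quote is doubled. The DSD option on the INFILE statement handles it; no backslash escaping is involved.

data demo;
  infile datalines dsd truncover;
  input a :$10. b :$30. c :$10.;
datalines;
one,"said ""hi"", then left",three
;
run;

The variable b comes out as: said "hi", then left.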

Tom
Super User Tom
Super User

I wouldn't be surprised if the cause was text values with embedded line breaks that are causing the columns to get messed up.

Especially since you have a minimum line length of 61 and a maximum line length of over 2000.

 

You might be able to fix it easily by using the TERMSTR=CRLF option. If the line breaks inside the values are just single CR or single LF characters, and the real ends of lines are marked with the two-character CR+LF combination, that will let SAS parse the lines properly. You can try creating a fileref using a FILENAME statement with that option.

filename csv "Y:\Datasets\Restricted\Death\Preliminary\deathfull2019pre.csv"
  termstr=crlf
;

proc import
  out=y.deathfull2019pre06242019csv replace
  datafile=csv
  dbms=csv
;
  guessingrows=max;
  getnames=yes;
run;

Otherwise, if you have embedded line breaks and they cannot be told apart from the real line breaks, you will need to pre-process the file to fix that. For just a few of them in a not-too-big file, just open the file in a text editor and fix them. Otherwise, there are ways that you can count the number of fields per line, or check for CR or LF inside of quotes, to fix the file.
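
A rough sketch of the field-counting idea, assuming a comma delimiter and reusing the fileref defined above: count the delimited fields on each line, ignoring commas inside quotes, and flag any line whose count differs from the header row.

data _null_;
  infile csv lrecl=32767 truncover;
  input line $char32767.;
  nfields = countw(line, ',', 'mq');   /* 'm' counts empty fields, 'q' ignores commas inside quotes */
  retain expected;
  if _n_ = 1 then expected = nfields;  /* the header row sets the expected count */
  else if nfields ne expected then put 'Suspect line ' _n_= nfields= expected=;
run;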

 

Update

Yup. Your data has at least one embedded line break on line 1386 that is causing trouble when reading line 1387. It does look like the value with the line break has quotes around it. See the closing quote in column 11 of line 1387.

NOTE: Invalid data for Age_Type in line 1387 28-47.
NOTE: Invalid data for Date_of_Birth___Month in line 1387 55-55.
RULE:  ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----
1387   DA19-00512",,,1,20190304,1,2019-01-16T02:06:20Z,0,1,N,N,N,N,Y 61
koyelghosh
Lapis Lazuli | Level 10
Great observation. I think you are right. It is expecting a closing double quote, and that is making it fail from 1387 onwards.
Tom
Super User

@koyelghosh wrote:
Great observation. I think you are right. It is expecting a closing double quote, and that is making it fail from 1387 onwards.

No, it is that SAS does not check whether end-of-line characters are inside quotes or not, so values with embedded end-of-line characters are treated as two lines. You need to either use single CR or single LF inside the values and CRLF to mark the ends of the real lines, or fix the file to remove those extra line breaks.

koyelghosh
Lapis Lazuli | Level 10

I think it is a typing mistake. Comma and double quote share the same key (on a US-style keyboard), so there is a possibility that somebody pressed Shift + comma, making it a double quote instead of a comma... Just a guess! Replace the double quote on line 1387 with a comma and try again.

Reeza
Super User
Double quotes are standard in CSVs to enclose text fields in case they contain a comma.
xinyao2019
Calcite | Level 5

I got errors like this after I ran:

filename csv "Y:\Datasets\Restricted\Death\Preliminary\deathfull2019pre.csv"
  termstr=crlf
;

proc import
  out=y.deathfull2019pre06242019csv replace
  datafile=csv
  dbms=csv
;
  guessingrows=max; * the maximum number of rows in the file;
  getnames=yes;
run;

 

NOTE: Invalid data for Res_Geo_Match_Score in line 3 678-679.
NOTE: Invalid data for Record_Axis_Code_14 in line 3 1152-1154.
NOTE: Invalid data for VAR666 in line 3 2760-2763.
NOTE: Invalid data for VAR709 in line 3 2911-2911.
NOTE: Invalid data for VAR903 in line 3 3668-3669.
NOTE: Invalid data for VAR966 in line 3 4025-4025.
NOTE: Invalid data for VAR1277 in line 3 5656-5657.
NOTE: Invalid data for _2108_W__ENTIAT_AVENUE in line 3 5810-5818.
NOTE: Invalid data for VAR1414 in line 3 6169-6171.
NOTE: Invalid data for VAR1434 in line 3 6209-6211.
NOTE: Invalid data for VAR1651 in line 3 7282-7283.
NOTE: Invalid data for _9245_RAINIER_AVE_S in line 3 7402-7410.
NOTE: Invalid data for VAR1788 in line 3 7744-7747.
NOTE: Invalid data for _2_YEARS in line 3 7782-7785.
NOTE: Invalid data for VAR1831 in line 3 7971-7971.
NOTE: Invalid data for VAR2025 in line 3 8785-8786.
NOTE: Invalid data for _5400_MACARTHUR_BLVD in line 3 8916-8924.
NOTE: Invalid data for VAR2162 in line 3 9316-9318.
NOTE: Invalid data for VAR2182 in line 3 9367-9369.
NOTE: Invalid data for _208_N_EUCLID_AVE in line 3 10597-10605.
NOTE: Invalid data for VAR2579 in line 3 11209-11209.
NOTE: Invalid data for VAR2773 in line 3 11997-11998.
NOTE: Invalid data for VAR2910 in line 3 12490-12493.
NOTE: Invalid data for VAR3147 in line 3 13620-13621.
NOTE: Invalid data for VAR3284 in line 3 14095-14098.
NOTE: Invalid data for VAR3658 in line 3 15514-15517.
NOTE: Invalid data for _2701_NW_VAUGHN_ST in line 3 16596-16604.
NOTE: Invalid data for VAR4032 in line 3 16920-16923.
NOTE: Invalid data for VAR4406 in line 3 18423-18426.
NOTE: Invalid data for VAR4823 in line 3 20307-20307.
NOTE: Invalid data for VAR5154 in line 3 21619-21622.
NOTE: Invalid data for VAR5419 in line 3 22793-22801.
NOTE: Invalid data for VAR5765 in line 3 24329-24330.
NOTE: Invalid data for VAR5902 in line 3 24827-24830.
NOTE: Invalid data for VAR6139 in line 3 25882-25883.
NOTE: Invalid data for VAR6276 in line 3 26398-26400.
NOTE: Invalid data for VAR6319 in line 3 26552-26552.
NOTE: Invalid data for VAR6481 in line 3 27336-27338.
NOTE: Invalid data for VAR6513 in line 3 27468-27469.
NOTE: Invalid data for VAR6541 in line 3 27588-27593.
NOTE: Invalid data for VAR6650 in line 3 27961-27964.
NOTE: Invalid data for VAR6887 in line 3 29114-29115.
NOTE: Invalid data for VAR7252 in line 3 30720-30723.
NOTE: Invalid data for VAR7261 in line 3 30771-30772.
NOTE: Invalid data for VAR7309 in line 3 31087-31092.
NOTE: Invalid data for VAR7398 in line 3 31263-31265.
NOTE: Invalid data for VAR7525 in line 3 31957-31969.
NOTE: Invalid data for VAR7617 in line 3 32297-32297.
NOTE: Invalid data for VAR7625 in line 3 32343-32343.
NOTE: Invalid data for VAR7626 in line 3 32345-32353.
NOTE: Invalid data for VAR7654 in line 3 32435-32441.
NOTE: Invalid data for VAR7658 in line 3 32472-32495.
NOTE: Invalid data for VAR7661 in line 3 32517-32521.

....

NOTE: 62 records were read from the infile CSV.
The minimum record length was 2947.
The maximum record length was 32767.
One or more lines were truncated.
NOTE: The data set Y.DEATHFULL2019PRE06242019CSV has 62 observations and 7867 variables.
NOTE: DATA statement used (Total process time):
real time 52.79 seconds
cpu time 49.37 seconds


Errors detected in submitted DATA step. Examine log.
62 rows created in Y.DEATHFULL2019PRE06242019CSV from CSV.

 

ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 3:02.90
cpu time 2:48.87

 

Tom
Super User
Too bad. It looks like your lines do not end with CR+LF, since that caused the file to have way too few lines.
It also looks like there are SOME instances where CR+LF does occur, since the file did not end up as just one line.
Tom
Super User

@Tom wrote:
Too bad. It looks like your lines do not end with CR+LF, since that caused the file to have way too few lines.
It also looks like there are SOME instances where CR+LF does occur, since the file did not end up as just one line.

Remove the embedded linefeeds in your original file so that it can be read as a CSV file by SAS.

See this discussion and solution. 

https://communities.sas.com/t5/SAS-Data-Management/Special-Carriages/m-p/214770/highlight/true#M5143

 

data _null_;
  infile 'sample_issues.csv' recfm=n;    /* read the file one byte at a time */
  file 'sample_issues_fixed.csv' recfm=n;
  input x $char1.;
  quotes + (x='"');                      /* running count of quote characters */
  /* an odd count means we are inside a quoted value */
  if mod(quotes,2) and (x='0D'x or x='0A'x) then put '|';
  else put x $char1.;
run;

Now you can try using PROC IMPORT to read the new file.
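
For example, a sketch that reuses the earlier PROC IMPORT options against the fixed file:

proc import
  out=y.deathfull2019pre06242019csv replace
  datafile='sample_issues_fixed.csv'
  dbms=csv
;
  guessingrows=max;
  getnames=yes;
run;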

Or write your own data step to read it. For your TIME variable, you could create your own informat, or if you know that the 99:99 values are the only invalid values, you can use the ?? modifier in the INPUT statement to suppress the error messages they would otherwise cause. SAS will set those values to missing.
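
A minimal sketch of the ?? modifier with the INPUT function, assuming the times arrive as text and '99:99' marks unknown values (the variable names are hypothetical):

data times;
  input time_str $;
  time = input(time_str, ?? time5.);  /* ?? suppresses the invalid-data note; 99:99 becomes missing */
  format time time5.;
datalines;
12:34
99:99
07:05
;
run;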

xinyao2019
Calcite | Level 5

Thank you so much!!!

Also, I just read the CSV file into SAS using SAS Enterprise Guide. No issues at all, and I changed the time format at the same time.

Tom
Super User

@xinyao2019 wrote:

Thank you so much!!!

Also, I just read the CSV file into SAS using SAS Enterprise Guide. No issues at all, and I changed the time format at the same time.


The EG import tool analyzes the file itself and then generates a new text file and a program to send to SAS to run. It must do a better job on your file than PROC IMPORT does.

xinyao2019
Calcite | Level 5

When I saved the CSV file as XLSX, it could be read into SAS.

Should I just do that?

koyelghosh
Lapis Lazuli | Level 10
Make sure everything is good at line 1387 and onwards by opening the Excel sheet. You can then match the output of SAS against Excel. You have a lot of columns (375 or so, I guess), so it could take some time.
