Solved: Re: Proc Import Generates "Error" from CSV Import

A_SAS_Man · Posted 07-12-2022 11:56 AM

I am importing a csv file into SAS as a part of a larger process. I am using the code below.

proc import datafile="/data/data.csv" 
out=test
dbms=csv replace;
GUESSINGROWS=MAX;
 run;

When I run this, it generates the following Note, but it seems to act as an error would because it makes my process stop and it is underlined in red. But again, I have no "Errors" when I look at my log summary just a few warnings and notes.

In addition, even though it stops my process it actually successfully generates the output data set. Can anyone help me understand what is going on with this and how I could potentially fix it?

Edit: The entire log is very long but I believe based on @ballardw's comment I have found the more relevant part. It seems as though multiple column names are being truncated and coming in with the same name, so after the first instance of that column it generates an error. See below extended log information.

172               informat PresenceOfDiabetesType2InsulinDe best32. ;
173               informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174               informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175               informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
                                                               _______
                                                               485
NOTE 485-185: Informat $BEST was not found or could not be loaded.

176               informat PresenceOfDiabetesType1 best32. ;
NOTE: SCL source line.
177               informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
                                                               _______
                                                               485
NOTE 485-185: Informat $BEST was not found or could not be loaded.

Farther up in the log you can see this as well:

Name CurrentlyBeingTreated-DiabetesType2Insulin truncated to CurrentlyBeingTreated-DiabetesTy.
Name PresenceOfDiabetesType2NonInsulinDependent truncated to PresenceOfDiabetesType2NonInsuli.
Name CurrentlyBeingTreated-DiabetesType2NonInsulin truncated to CurrentlyBeingTreated-DiabetesTy.
Name CurrentlyBeingTreated-DiabetesType1 truncated to CurrentlyBeingTreated-DiabetesTy.

So thank you for potentially finding that issue, is there a way to modify proc import to allow longer variable names or something along those lines? These files are provided from a vendor so I have no control over the naming conventions and am trying to keep this as automated as possible.

ballardw · Posted 07-12-2022 01:27 PM

The problem is stemming from extremely long column headings. Note that line 173 and 175 are attempting to assign different informats and formats to multiple columns with the same name after truncation to the allowed 32 characters. I have had similar data with long column headings but never used Proc Import with Validvarname=any, which allows use of the name literals. The Import generator often fixes some of this with similar names suffixed with 1,2,3 but very long names seem to be the exceptions You have the data data step; copy to the editor and change the names of one of the variables (in each pair, I think you have a fair number). You can drop most of the FORMAT statements and likely the Informats using BEST32. Just make sure that the Input statement has different variable names. If you remove informat statements the order of columns in the data set will change. If you are extremely careful with editor search and replace you can find the second of each of the Informat and Input occurrences of the name and change just that one. Save the code and rerun.

173               informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174               informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175               informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
                                                               _______
                                                               485
NOTE 485-185: Informat $BEST was not found or could not be loaded.

View solution in original post

ballardw · Posted 07-12-2022 12:21 PM

How about showing the entire log of the data step generated and executed by Proc Import.

That specific message means that the variable has already been declared as character and as such a numeric informat, or format, cannot be used for that variable. That is what the $best indicates, that a character variable is used.

Reeza · Posted 07-12-2022 12:25 PM

That would be a really weird error for a standard proc import code to generate. Did you modify the code generated and then get those errors?

A_SAS_Man · Posted 07-12-2022 12:48 PM

I'm not really sure what you mean, I ran the code above in my original post and it generated that output.

Reeza · Posted 07-12-2022 12:49 PM

Definitely need to show the full log as @ballardw requested.

A_SAS_Man · Posted 07-12-2022 01:01 PM

Thank you, I posted more information from the log which I think gets at what you are saying.

ballardw · Posted 07-12-2022 01:08 PM

Start in the log at the Proc Import line. Copy everything through the the notes at the end of the run about observations read.

Paste all of that into a text box.

The note about SCL code often means that something with a dot is used incorrectly but Proc Import generally doesn't do that.

The addition of truncated variable names is not related.

I would also suggest setting: Options validvarname=v7; to stop creating those name literal variables with hyphens, spaces and other non-standard characters in the variable names. It is going to be bad enough dealing with those extremely long variable names without adding the requirements to repeatedly address the "long var name"N literal on top.

A_SAS_Man · Posted 07-12-2022 01:11 PM

It should now be attached to my original post.

A_SAS_Man · Posted 07-12-2022 01:23 PM

The addition of that option actually solves the issue, repeat variable names are just imported formatted as "VAR131" or similar.

Reeza · Posted 07-12-2022 01:05 PM

I think for a file with these type of issues I'd copy the code from the log, manually modify it to be correct including renaming variables as needed and use that to import your data instead of PROC IMPORT. This is a better approach for an automated process anyways as it will also ensure that your variables are always read correctly and the same type/format which is not guaranteed with PROC IMPORT.

A_SAS_Man · Posted 07-12-2022 01:12 PM

I didn't realize proc import was just using a data step. That is really helpful, going to do some testing on that now.

ballardw · Posted 07-12-2022 01:27 PM

The problem is stemming from extremely long column headings. Note that line 173 and 175 are attempting to assign different informats and formats to multiple columns with the same name after truncation to the allowed 32 characters. I have had similar data with long column headings but never used Proc Import with Validvarname=any, which allows use of the name literals. The Import generator often fixes some of this with similar names suffixed with 1,2,3 but very long names seem to be the exceptions You have the data data step; copy to the editor and change the names of one of the variables (in each pair, I think you have a fair number). You can drop most of the FORMAT statements and likely the Informats using BEST32. Just make sure that the Input statement has different variable names. If you remove informat statements the order of columns in the data set will change. If you are extremely careful with editor search and replace you can find the second of each of the Informat and Input occurrences of the name and change just that one. Save the code and rerun.

173               informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174               informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175               informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
                                                               _______
                                                               485
NOTE 485-185: Informat $BEST was not found or could not be loaded.

Reeza · Posted 07-12-2022 01:28 PM

Yup! It just generates the data step code. I often use it as my starting point to write code for large files (lazy/efficient) method IMO.

Then modify as necessary 🙂

I would use @ballardw suggestion to re-run with option validvarname=v7 to clean up some of those names as well.

FYI - if you're using SAS Foundation (not Studio) you can copy the code without the line numbers by holding down ALT + CNTRL while selecting the code. I use Studio so use NotePad++ to remove the line numbers these days.

A_SAS_Man · Posted 07-12-2022 01:31 PM

I am also using studio, can you expand on how you use notepadd++ to remove line numbers?

Reeza · Posted 07-12-2022 01:35 PM

https://community.notepad-plus-plus.org/topic/15203/slick-way-to-remove-line-numbers-from-text

Register Today!

SAS Training: Just a Click Away