I am importing a csv file into SAS as a part of a larger process. I am using the code below.
proc import datafile="/data/data.csv"
out=test
dbms=csv replace;
GUESSINGROWS=MAX;
run;
When I run this, it generates the following Note, but it seems to act as an error would because it makes my process stop and it is underlined in red. But again, I have no "Errors" when I look at my log summary just a few warnings and notes.
In addition, even though it stops my process it actually successfully generates the output data set. Can anyone help me understand what is going on with this and how I could potentially fix it?
Edit: The entire log is very long but I believe based on @ballardw's comment I have found the more relevant part. It seems as though multiple column names are being truncated and coming in with the same name, so after the first instance of that column it generates an error. See below extended log information.
172 informat PresenceOfDiabetesType2InsulinDe best32. ;
173 informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174 informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175 informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
_______
485
NOTE 485-185: Informat $BEST was not found or could not be loaded.
176 informat PresenceOfDiabetesType1 best32. ;
NOTE: SCL source line.
177 informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
_______
485
NOTE 485-185: Informat $BEST was not found or could not be loaded.
Farther up in the log you can see this as well:
Name CurrentlyBeingTreated-DiabetesType2Insulin truncated to CurrentlyBeingTreated-DiabetesTy.
Name PresenceOfDiabetesType2NonInsulinDependent truncated to PresenceOfDiabetesType2NonInsuli.
Name CurrentlyBeingTreated-DiabetesType2NonInsulin truncated to CurrentlyBeingTreated-DiabetesTy.
Name CurrentlyBeingTreated-DiabetesType1 truncated to CurrentlyBeingTreated-DiabetesTy.
So thank you for potentially finding that issue, is there a way to modify proc import to allow longer variable names or something along those lines? These files are provided from a vendor so I have no control over the naming conventions and am trying to keep this as automated as possible.
The problem is stemming from extremely long column headings. Note that line 173 and 175 are attempting to assign different informats and formats to multiple columns with the same name after truncation to the allowed 32 characters. I have had similar data with long column headings but never used Proc Import with Validvarname=any, which allows use of the name literals. The Import generator often fixes some of this with similar names suffixed with 1,2,3 but very long names seem to be the exceptions You have the data data step; copy to the editor and change the names of one of the variables (in each pair, I think you have a fair number). You can drop most of the FORMAT statements and likely the Informats using BEST32. Just make sure that the Input statement has different variable names. If you remove informat statements the order of columns in the data set will change. If you are extremely careful with editor search and replace you can find the second of each of the Informat and Input occurrences of the name and change just that one. Save the code and rerun.
173 informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174 informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175 informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
_______
485
NOTE 485-185: Informat $BEST was not found or could not be loaded.
How about showing the entire log of the data step generated and executed by Proc Import.
That specific message means that the variable has already been declared as character and as such a numeric informat, or format, cannot be used for that variable. That is what the $best indicates, that a character variable is used.
Definitely need to show the full log as @ballardw requested.
Start in the log at the Proc Import line. Copy everything through the the notes at the end of the run about observations read.
Paste all of that into a text box.
The note about SCL code often means that something with a dot is used incorrectly but Proc Import generally doesn't do that.
The addition of truncated variable names is not related.
I would also suggest setting: Options validvarname=v7; to stop creating those name literal variables with hyphens, spaces and other non-standard characters in the variable names. It is going to be bad enough dealing with those extremely long variable names without adding the requirements to repeatedly address the "long var name"N literal on top.
I think for a file with these type of issues I'd copy the code from the log, manually modify it to be correct including renaming variables as needed and use that to import your data instead of PROC IMPORT. This is a better approach for an automated process anyways as it will also ensure that your variables are always read correctly and the same type/format which is not guaranteed with PROC IMPORT.
The problem is stemming from extremely long column headings. Note that line 173 and 175 are attempting to assign different informats and formats to multiple columns with the same name after truncation to the allowed 32 characters. I have had similar data with long column headings but never used Proc Import with Validvarname=any, which allows use of the name literals. The Import generator often fixes some of this with similar names suffixed with 1,2,3 but very long names seem to be the exceptions You have the data data step; copy to the editor and change the names of one of the variables (in each pair, I think you have a fair number). You can drop most of the FORMAT statements and likely the Informats using BEST32. Just make sure that the Input statement has different variable names. If you remove informat statements the order of columns in the data set will change. If you are extremely careful with editor search and replace you can find the second of each of the Informat and Input occurrences of the name and change just that one. Save the code and rerun.
173 informat "CurrentlyBeingTreated-DiabetesTy"N $1. ;
174 informat PresenceOfDiabetesType2NonInsuli best32. ;
NOTE: SCL source line.
175 informat "CurrentlyBeingTreated-DiabetesTy"N best32. ;
_______
485
NOTE 485-185: Informat $BEST was not found or could not be loaded.
Yup! It just generates the data step code. I often use it as my starting point to write code for large files (lazy/efficient) method IMO.
Then modify as necessary 🙂
I would use @ballardw suggestion to re-run with option validvarname=v7 to clean up some of those names as well.
FYI - if you're using SAS Foundation (not Studio) you can copy the code without the line numbers by holding down ALT + CNTRL while selecting the code. I use Studio so use NotePad++ to remove the line numbers these days.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.