@Negarev wrote:
Hi All,
OK quick update: my trick has worked. It seems that SAS has attributed some custom format to my character variables during the import. This custom format is then detected as incompatible when PLM RESTORE tries to score the data.
Thanks everyone for your help!
@ballardw : Could you please give me a detailed example on how to import a CSV onto SAS via a DATA step?
One way is to use proc import on the csv with the GUESSIONGROWS=MAX; option set. The Log will contain data step code to read the data. You can copy that code from the log and paste into the editor. Clean up by removing line numbers if present in the log. Look at the INFORMAT statements at the top of the code. Those are setting the length and type of the variables. If the informat does not make sense: reading a column that should be numeric as character or vice versa set the type by changing the informat. For character values set the informat long enough for the longest expected value. If a value should be date, time or datetime value make sure the informat matches your expected layout. Remove most of the formats except for date, time or datetime variables.
Change variable names to a "nicer" or standard version.
If you have data that is mostly numeric but has stuff like "NA" "NULL" or some other text that basically means missing you can create a custom informats to read that data assigning a missing value for those text strings and read the rest as 12. or similar informat.
If you have values that should be dates but are missing delimiters to tell SAS that 01212021 should be 21JAN2021 look up the various date, time or datetime INFORMATS. There will usually be one that will work. Caution 6 digit strings with 2-digit years need to be watched closely as 010203 could represent as many as 6 different dates depending on which century you expect the 2-digits to represent.
Add labels and maybe if you have common data issues you can include preliminary data checking.
Other ideas: you can write custom informats for variables that should have a limited range of values that write error messages to the log for you.
proc format ;
invalue codevalue
'1','2','3','4'=_same_
'NA' = .
other=_error_
;
run;
data example;
input x :codevalue.;
datalines;
1
NA
2
3
4
9
88
;
I incorporated an example of how to read text NA as missing without being an error.
The _same_ instruction says to just treat the values in the list as the "same" value. Since this is a numeric informat that means the values are the numbers 1 to 4. Anything other than NA or those digits is invalid data and appears as such in the log. So you could go to the source to find out why these are appearing or take other action.
I use a lot of these formats as I have some data sources that are manually entered and are constantly getting the wrong code values (which can be LOTS longer than shown) in fields.
The default INFORMAT statements will be in column order as will the Input statement. The Input statement must match the column order of the data. The Informat or other statments like ATTRIB that could be used to set the property do not.
Once you have the data step working for one file then you only need to change the input file name and output data set(if needed) and rerun it. There are multiple posts on this forum on ways to read multiple identical structure text files if that is useful.
If you have a source that randomly changes column order but not variables you can adjust the INPUT statement to match the changed order.
... View more