04-03-2014 01:40 PM
I'm using SAS on Unix. Whenever I import a CSV file with PROC IMPORT, the label of the last column always comes in as VAR. Even though I use GETNAMES=YES, the last column still does not get its label.
04-03-2014 01:59 PM
You likely have a column header that duplicates another column's header or, if it is longer than 32 characters, one that matches another long header in its first 32 characters.
After running PROC IMPORT, look in the log for the generated DATA step code and check the variables. If you have long names, you might find some that are truncated. Those are likely candidates.
You can bring the code into the editor with the F4 key (recall last code), or copy and paste it from the log, and then edit the program to assign a name to the offending variables.
Actually, I do this 95% or more of the time with CSV files, because at least one variable will come in with an incorrect default data type, or I want a different format than the one assigned by PROC IMPORT, or I want to add meaningful LABEL statements because of similar variable names.
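Before rerunning PROC IMPORT, a quick look at the header row from the Unix shell can point to such collisions. This is only a minimal sketch: `dup_headers` is a hypothetical helper name, it assumes the header has no quoted commas, and it ignores any other adjustments SAS may make to names (such as replacing invalid characters).

```shell
# Hypothetical helper: print header names that collide once they are cut
# to 32 characters and compared without regard to case.
# Steps: take the first line, drop any carriage return, put one name per
# line, keep the first 32 characters, uppercase, then list repeated names.
# Assumes a plain comma-separated header with no quoted commas.
dup_headers() {
  head -n 1 "$1" |
    tr -d '\r' |
    tr ',' '\n' |
    cut -c1-32 |
    tr '[:lower:]' '[:upper:]' |
    sort | uniq -d
}

# Usage (the file name is an assumption):
# dup_headers myfile.csv
```

An empty result means no two headers collide under these assumptions; any name printed is a candidate for the variable that PROC IMPORT renamed.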
04-03-2014 02:28 PM
This program demonstrates the duplicate name scenario you suggest.
04-08-2014 01:00 AM
I am actually struggling with a similar problem - extra variables. The CSV file is generated by MS Access (I believe). I have checked for duplicate headers and long names. The TERMSTR= option does not help. My suspicion was extra trailing commas, but that did not seem to be the case.
The data seem to go where they should, and the extra columns (VAR31 and VAR32 in my case) contain only missing values. Converting to xlsx and then back to CSV (hoping to remove odd characters) does not help. Before I just DROP VAR:; I would like to discover why, probably just to be sure. The system generates quite a few of these CSV files; not all of them have this issue.
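One way to rule trailing commas in or out from the Unix shell is to count the lines that end in a comma. This is a minimal sketch with a hypothetical helper name, `trailing_comma_lines`; it treats a carriage return at the end of a line as part of the line ending and does not look inside quoted fields.

```shell
# Hypothetical helper: count lines that end in a comma, ignoring a trailing
# carriage return. A nonzero count means some rows carry an empty final field.
trailing_comma_lines() {
  awk '{ sub(/\r$/, "") } /,$/ { n++ } END { print n + 0 }' "$1"
}

# Usage (the file name is an assumption):
# trailing_comma_lines export.csv
```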
*** update ***
As I continued to work on this, I found a parsing problem in the part of the IMPORT process that generates the DATA step that actually reads the CSV file. I have not solved this yet, but what I think I know so far is that a long DSD string is parsed incorrectly. The field in question contains a quoted string and some special characters, such as: "blah blah ""special blah"" other blah + more blah"
The resulting DATA step has incorrect lengths assigned to the INFORMATs and an extra VARxx. Ultimately the data go into the correct fields, but with the incorrect lengths there is truncation.
I will try to get permission to post the CSV or a portion of it.
Message was edited by: Art Carpenter
04-07-2014 05:40 AM
When I pressed the F4 key, I was able to assign a name, but this will become tedious if I have more columns. When I recalled the code, I saw that the last column in the CSV file is a numeric column, but SAS is reading it as a character column. This always happens only with the last column. I checked for spaces in the label, and there were none. The name is just four letters.
04-07-2014 11:46 AM
Even posting just the header row, without data, will likely help.
If SAS is assigning character to what you think is numeric, then one of the rows scanned under the GUESSINGROWS setting has something in that column, possibly even a non-printable character, that isn't numeric.
That is another reason why I look at the code generated by PROC IMPORT.
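A check along these lines can be run from the Unix shell before importing. It is a minimal sketch with a hypothetical helper name, `nonnumeric_last_field`, and it assumes a comma-delimited file with a header on line 1 and no quoted commas. An invisible byte such as a carriage return stuck to the last value makes the row show up.

```shell
# Hypothetical helper: print the line numbers of data rows whose last field
# is neither empty nor a plain number (optional sign, digits, optional decimals).
# Assumes a comma-delimited file with a header on line 1 and no quoted commas.
nonnumeric_last_field() {
  awk -F, 'NR > 1 && $NF !~ /^(-?[0-9]+(\.[0-9]+)?)?$/ { print NR }' "$1"
}

# Usage (the file name is an assumption):
# nonnumeric_last_field myfile.csv
```

To see the raw bytes of a suspect row, `head -n 2 myfile.csv | od -c` displays non-printable characters such as `\r` explicitly.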
04-08-2014 03:16 AM
There are existing posts that discuss a similar issue. I'm using SAS installed on a Unix machine, and the files are created on a Windows desktop and then transferred by FTP to my Unix server. From what I have read so far, it looks like there is a carriage return or line feed at the end of the last column in my CSV file, and that is causing the issue.
The suggested solutions are adding '0D'x to the delimiters (DLM=), running the dos2unix command, or using TERMSTR=. Sorry, I cannot post the data file.
04-08-2014 04:53 AM
I tried a lot of options. After googling and searching through various blogs, I finally found this:
cat file.txt | tr -d '\r' > file_new.txt
It works. Run the command above on Unix, then import the new CSV file into SAS, and everything comes in fine.
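To confirm that carriage returns are really present before the conversion and gone afterwards, the lines ending in one can be counted. This is a minimal sketch; `cr_lines` is a hypothetical helper name, and the file names follow the command above.

```shell
# Hypothetical helper: count lines that still end in a carriage return (0D).
cr_lines() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$1"
}

# Usage (file names follow the tr command above):
# cr_lines file.txt       # nonzero for a file with Windows line endings
# cr_lines file_new.txt   # 0 once the carriage returns are removed
```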
04-08-2014 04:55 AM
If you are convinced that the CR LF (0D 0A) is causing your issue, then that is normal: the line-ending conventions of Windows and Unix are different.
Be happy that with SAS you can use TERMSTR= to adjust for that. It is also a good explanation for the character column: a leftover 0D (CR) byte is not numeric, so the value is read as character, even though your eyes will not notice those bytes. There are other invisible bytes like it (high values, low values, and so on). Computers do what they are instructed to do, which is not necessarily what humans expect.
I hope you will not get caught by encoding issues such as UTF-8 versus Latin-1. Windows commonly uses UTF-8 (multibyte), while SAS is often limited to Latin-1 (single byte).