11-15-2017 04:32 AM
I'm brand new to SAS and have limited experience, and the problem that I'm having seems way over my head. I'm trying to import data from an excel file into SAS, but during a my data step I get an error about encoding. Here's the script that I'm running:
libname surveyx pcfiles path='C:\Users\au551418\Desktop\Learning SAS\AUWork\Survey\Survey.xlsx';
The surveyx library successfully is assigned each time, but when it comes to the set function, I always get this error:
ERROR: Unable to transcode data to/from UCS-2 encoding
Now, I don't know anything about encoding, and have tried to look up on previous forum posts what in the world to do. I have tried a lot of stuff, but none of it has worked, and many of the posts are not xlsx files (which I have no idea if that matters or not). Can anyone help me in a simplistic way? Thanks!
11-15-2017 05:11 AM
To display characters to the screen a computer uses an encoding table, so for instance most tables contain the western alphabet, lower and upper, and they have their codes associated. If you replace the encoding table, the same codes refer to different symbols. This is how a computer a) stores information - a combination of what encoding table, and the code number, and b) represents it on screen, by drawing the associated character from the table.
What the error is telling you is that there is an encoding table found that it cannot switch to (this is part of National Language Support in SAS, plenty of docs on it). Does the Excel file open fine in Excel? If so then I would suggest saving it to CSV and reading that in, however do be aware that you may be missing some information - you will see odd characters on screen in Excel.
I personally have never heard of a UCS-2 encoding format, nor can I find much information about it from a google search, so I would really go back to the source of that file and find out what is going on there, get them to supply it in a standard encoding like UTF.
11-15-2017 05:20 AM
Yea, got encoding, just unfamiliar with SAS more As far as Excel, it opens easily and without issue. I don't see any unusual characters, but there could be something hidden in there.
I tried saving a copy of the excel file as a csv and then importing with this code:
libname surveyx pcfiles path='C:\Users\au551418\Desktop\Learning SAS\AUWork\Survey\Copy of Survey.csv';
But now I get this Error code:
ERROR: Unable to determine file type from PATH= option.
ERROR: Error in the LIBNAME statement.
Do I need to change some format to import csv's?
11-15-2017 05:32 AM
If you save to CSV - which is a text file with data delimited by commas - you do not read it in by using Excel specific libname statements. You read it in as a text file. To get you started you can get most of the code by doing:
proc import datafile="yourcsv.csv" out=want dbms=csv; run;
Then in the log it will show you a large datastep import. This will consist of a file statement, an input statement, plus others like format. This is how you read in plain text files, and you can take this code and alter it to read in the data anyway you like, say changing lengths, or reading in dates in a specific format. Its the most powerful and controlable way to read data.
This talks about tab delimited, but same thing for CSV:
11-15-2017 05:40 AM
Perfect, thank you! Got this to run and data successfully imported. Don't understand the whole UCS-2 formatting thing, but guess we bypassed that Thanks for your help!
11-15-2017 07:39 AM
Regarding the UCS-2 encoding, this SAS Note explains that you can get that error when running a DBCS version of SAS. If you need support for national characters including multibyte characters as you might find in Japan or Chinese languages, then SAS advises using the UTF-8 mode instead. The SAS Note describes this approach.
Just a bit of history: SAS has supported a DBCS version for many years -- with the "DB" meaning SAS requires 2 bytes per char. However, in the past several years Unicode -- and it's lighter cousin UTF-8 -- has emerged as the preferred standard for encoding such data -- characters occupy 1-4 bytes depending on the char in question. Personally, I use SAS in UTF-8 configuration for almost all of my jobs now.
11-15-2017 07:55 AM
Yea, I saw those links before, but they really didn't make any sense to me. If I wanted to convert to UTF-8, what code do I write to do? Or do I have to change the excel file?
I didn't find anything similar to what was written there:
"C:\Program Files\SAS\SASFoundation\9.2\sas.exe" –CONFIG "C:\Program Files\SAS\SASFoundation\9.2\nls\u8\SASV9.CFG"
Don't know what I'm supposed to do about that??
11-15-2017 08:18 AM
If you are running with SAS on your own PC, the easiest thing to do is to change your SASHOME/SASFoundation/9.4/sasv9.cfg file to reference the /u8/sasv9.cfg file.
If using a SAS session that is maintained by an admin, you'll have to ask your admin for access to a session that's running in UTF-8.