I have a CSV file with some non-English characters in it. When I read it with a DATA step and INFILE in regular SAS, the system stops reading as soon as it encounters a foreign character and ignores the remainder of the file, so a large chunk of the file is not imported. I then tried the same code under SAS 9.4 with Unicode support, which seems to read in the correct number of rows. The problem comes when I try to save it as a data set in a pre-defined library:
"ERROR: Some character data was lost during transcoding in the dataset lib1.dat1. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding."
and
"WARNING: The data set lib1.data1 may be incomplete. When this step was stopped there were x observations and y variables."
Here x is far smaller than the number of records in the data set in the WORK library.
Is this because the data set is currently in Unicode and needs to be converted to a regular encoding somehow before it can be saved and used? Thanks.
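A minimal sketch of reading the file with its encoding declared, assuming the CSV is UTF-8 encoded (the path, variable names and lengths below are hypothetical). The ENCODING= option on the FILENAME statement tells SAS to transcode the input into the session encoding as it reads, and PROC CONTENTS then shows which encoding the resulting data set carries. In a non-Unicode session, characters with no equivalent in the session encoding can still be lost, which fits with the Unicode session reading the file correctly.

```sas
/* Hypothetical path: replace with the location of your CSV file.       */
/* ENCODING= declares that the file's bytes are UTF-8, so SAS transcodes */
/* them into the session encoding as it reads.                          */
filename rawcsv '/path/to/myfile.csv' encoding='utf-8';

data work.dat1;
  infile rawcsv dlm=',' dsd firstobs=2 truncover;
  /* Hypothetical variables; generous length because one UTF-8 */
  /* character can take up to 4 bytes.                         */
  length id 8 name $200;
  input id name $;
run;

/* The "Encoding" line in this output shows how WORK.DAT1 is stored. */
proc contents data=work.dat1;
run;
```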
I had a similar encoding problem recently and solved it this way using the OUTENCODING option:
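libname inlib 'MyInputLibrary';
/* ASCIIANY: data sets written here are flagged so that SAS does not transcode them */
libname outlib 'MyOutputLibrary' outencoding=asciiany;
/* NOCLONE: do not copy the input data set's encoding; use OUTLIB's OUTENCODING instead */
proc copy noclone in=inlib out=outlib;
  select MyDataset;
run;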
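If the pre-defined library writes data sets in a narrower encoding than the UTF-8 session, another option is to ask for UTF-8 output explicitly, so the data set stays flagged as UTF-8 rather than ASCIIANY. A minimal sketch, assuming the session runs with ENCODING=UTF-8, that `lib1` can be re-assigned in your session (the path is hypothetical), and that WORK.DAT1 holds the data read in the Unicode session:

```sas
/* Hypothetical path: the folder behind your pre-defined library LIB1. */
/* OUTENCODING= sets the encoding of data sets created in this library. */
libname lib1 '/path/to/lib1' outencoding='UTF-8';

/* Session and output library are both UTF-8, so no narrowing */
/* transcode is needed when writing.                          */
data lib1.dat1;
  set work.dat1;
run;
```

If the LIBNAME cannot be changed, the ENCODING= data set option on the output, as in `data lib1.dat1(encoding='utf-8');`, requests the same thing for a single data set. Either way, a later session running in latin1 that reads this data set may run into the same transcoding error for characters latin1 cannot represent.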
This is an encoding issue. The encoding setting in the SAS config file might default to latin1. We had a problem like this before and tried to fix it by changing a session option, but that did not work. What was suggested to us at the time was to change the encoding setting to UTF8 in the SAS config file.
To check, run:
proc options option=encoding; run;
I think you will see latin1 as your encoding.
There might be another solution, but I am just sharing my experience.
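For context on the config-file suggestion: ENCODING is a system option that SAS accepts only at startup (in the configuration file or on the command line), not in an OPTIONS statement during a session, which is consistent with the session-option attempt not working. A sketch of the relevant line follows; which sasv9.cfg your installation reads is an assumption to check (on Windows, SAS 9.4 with Unicode support typically starts from a separate configuration file under the nls\u8 folder):

```sas
/* Line in the SAS configuration file (sasv9.cfg) read at startup */
-ENCODING UTF-8
```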
Thanks. Actually I see "ENCODING=UTF-8", presumably because I'm running SAS with Unicode support. My guess is that the incompatibility arises when I try to save a data set that's in Unicode into a data set with another encoding, although I don't know why SAS would automatically save it in another encoding without asking; maybe that's the default.
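One way to test that guess is to compare the session encoding with the encoding recorded on the data set in the pre-defined library. A minimal sketch, using the lib1.dat1 name from the error message:

```sas
/* Write the session encoding to the log. */
%put &=sysencoding;

/* Look for the "Encoding" line in the output: if it shows something   */
/* other than UTF-8 (or ASCIIANY), SAS transcoded the UTF-8 data while */
/* writing this data set.                                               */
proc contents data=lib1.dat1;
run;
```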