Mscarboncopy
Pyrite | Level 9

I have read a previous post about this but couldn't find the solution for me.

The error says "Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding."

 

The data file is originally SPSS, which I then convert to sas to work with it in sas (since I have to do some recoding and sas is my preferred program to use). 

 

I have long strings, but none of them is anywhere near 30000 characters, which is the width SPSS shows for them. So I am not sure.

I used this code

libname xx cvp 'C:mylocation';
proc copy in=xx out=work noclone;
run;

but the error still remained.

System Options for BUFSIZE and REUSE were used at user's request.
NOTE: Libname and/or system options for compress, pointobs, data representation and encoding attributes were used at user's request

Some character data was lost during transcoding in the dataset PREMR.PREMRDATA1092020. Either the data contains characters that are not representable in the new encoding or truncation
occurred during transcoding.
ERROR: File WORK.X has not been saved because copy could not be completed.

 

I have one or two variables for which the strings can be anywhere from 10 words to 300 words (not 30000 though) and I am wondering if the real issues are the symbols we use in these strings. Please see example below. This is the only thing I can think of. If this is the issue how do I handle this?

 

"White Blood Cell; 15.8 k/uL; 4.0-11.0 k/uL; Red Blood Cell; 4.41 m/uL; 3.60-5.20 m/uL; Red Cell Distribution; 14.7%; 11.5-14.0%; Mean Platelet Volume; 8.1 fl; 6.0-10.0 fl; Segmented Neutrophils; 86%; 34-64%; Lymphocyte; 10%; 28-48%; Monocyte; 4%; 1-13%; EOS; 1%; 0-5%"

 

Thank you so much,

 

18 REPLIES
yabwon
Onyx | Level 15

What is your SAS session encoding? What does the following statement write to the log?

%put &sysencoding.;

 

Bart




Mscarboncopy
Pyrite | Level 9

I will try the suggestions next week, as I am not working this week, but wanted to thank you in advance. I will let you know what I find.

Mscarboncopy
Pyrite | Level 9

Hi. I finally had time to go back to this issue I was having.

I tried your suggestion: 

Data test;
Set pr.mydatafilename;
%put &sysencoding.;
Run;

 

and the error remains:

After expansion the size of at least one character variable exceeds the maximum length of 32767.
The length will be set to the maximum

 

and then:

Some character data was lost during transcoding in the dataset  pr.mydatafilename . Either
the data contains characters that are not representable in the new encoding or truncation
occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4023 observations read from the data set X (my data file name)
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 4023
observations and 3053 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.

 

I am not sure what to do. It is the first time I am trying to work with such a long data file and with such long strings.

 

 

Tom
Super User

The purpose of checking the value of the macro variable SYSENCODING was so you could know (and show us) what encoding your SAS session was using. Putting the %PUT statement into the middle of a data step makes no sense. Run the %PUT statement on its own, then copy the result from the log and paste it into the pop-up window that appears when you click the Insert Code button (it looks like < / >). Like this:

198   %put &=sysencoding;
SYSENCODING=wlatin1

The purpose of the ENCODING=ANY dataset option is to prevent SAS from trying to transcode (change characters from one encoding to another) in the process of copying the data.  As such it would make more sense to place it on the INPUT dataset reference instead of the target dataset.

Mscarboncopy
Pyrite | Level 9

My apologies. I am not good with macros and it is something I need to learn. And I am not sure I understand what you told me to do. However,

I ran this right after the libname 

%put &sysencoding.;

Run;

 

Then I called in my file (using encoding=any) and, interestingly enough, it is still saying there was some truncation, but the data file was generated this time (all obs were there, as opposed to 0 obs before).

 

However, the error remains, and I am sorry but I don't see a pop-up window when I run the %put. I am using SAS 9.4.

My log looks like this

After I run the %put:

 

libname pr 'XXXXXXXXXXXXXXXXXXXXXXXX';
NOTE: Libref PR was successfully assigned as follows:
Engine: V9
Physical Name: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
43 %put &sysencoding.;
wlatin1
44 Run;

 

And when I call my data (sorry, I don't know where else to place encoding=any):

data test3(encoding=any);
46 set pr.X;
NOTE: Data file pr.X is in a format that is native to another host, or the file
encoding does not match the session encoding. Cross Environment Data Access will be used, which
might require additional CPU resources and might reduce performance.
47 Run;

ERROR: Some character data was lost during transcoding in the dataset PREMR.PREMRDATA1092020. Either
the data contains characters that are not representable in the new encoding or truncation
occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4023 observations read from the data set X.
WARNING: The data set WORK.TEST3 may be incomplete. When this step was stopped there were 4023
observations and 3053 variables.
WARNING: Data set WORK.TEST3 was not replaced because this step was stopped.

I don't know if I said this before (I think I did) and if it matters,  but the data file is coming from REDCap as SPSS and I am saving it as a sas  file to work with it.

Thank you so much for your patience.

Tom
Super User

You are running SAS using a single byte encoding. So there is a maximum of 256 characters that can be represented.  If the data you are reading is using a different encoding  then it is definitely possible you have some characters that cannot be represented in WLATIN1.

 

Steps to take.

Run PROC CONTENTS on the source dataset to see what encoding it is using.

Start SAS using UTF-8 encoding, which is also sometimes called "unicode support" and then try to read the dataset.
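
A minimal sketch of those two checks, assuming the libref PR and the dataset name X that appear in the log earlier in the thread. The Encoding field in the PROC CONTENTS output shows the encoding of the file; the last two statements show the encoding of the session.

```sas
/* Assumption: PR.X is the source dataset named in the log above */
proc contents data=pr.X;
run;

/* Write the session encoding to the log */
%put &=sysencoding;

/* Same information, reported by the ENCODING system option */
proc options option=encoding;
run;
```

If the two encodings differ, SAS has to transcode the character data when it reads the file, which is the step that is failing here.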

 

If you cannot get SAS running using UTF-8 encoding then try telling SAS to ignore the encoding of the source dataset.  

data test3;
 set pr.X(encoding=any);
run;
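
If the dataset can be read this way, one way to find out which variables hold characters outside plain ASCII is to scan every character variable. This is only a sketch: PR.X is the dataset name from the log, while NONASCII, VARNAME, and POS are hypothetical names. It also flags control characters such as tabs and line breaks, so treat the result as a list of candidates rather than a diagnosis.

```sas
/* Sketch only: NONASCII, VARNAME and POS are made-up names */
data nonascii(keep=varname pos);
  set pr.X(encoding=any);          /* read the bytes as-is, no transcoding */
  array c {*} _character_;         /* every character variable in PR.X */
  length varname $32;
  do i = 1 to dim(c);
    /* position of the first byte outside printable ASCII, 0 if none */
    pos = prxmatch('/[^\x20-\x7E]/', c{i});
    if pos > 0 then do;
      varname = vname(c{i});
      output;
    end;
  end;
run;

/* Count how many values were flagged for each variable */
proc freq data=nonascii;
  tables varname;
run;
```

Variables that show up here are the ones most likely to need more room, or a UTF-8 session, to survive transcoding.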

 

Ksharp
Super User
Try the ENCODING= dataset option:


data copy(encoding=any);
set sashelp.class;
run;
Mscarboncopy
Pyrite | Level 9

I will try the suggestions next week, as I am not working this week, but wanted to thank you in advance. I will let you know what I find.

 

Mscarboncopy
Pyrite | Level 9

Hi. I finally had time to go back to this issue I was having.

I tried your suggestion: 

data test(encoding=any);
set pr.nameofmydatafile;
Run;

 

and the error remains:

After expansion the size of at least one character variable exceeds the maximum length of 32767.
The length will be set to the maximum

 

and then:

Some character data was lost during transcoding in the dataset  pr.nameofmydatafile . Either
the data contains characters that are not representable in the new encoding or truncation
occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 4023 observations read from the data set X (my data file name)
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped there were 4023
observations and 3053 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.

 

I am not sure what to do. It is the first time I am trying to work with such a long data file and with such long strings.

 

 

SASKiwi
PROC Star

I suggest you open a Tech Support track. They are in the best position to help you with this problem.

Ksharp
Super User
As Tom said, the problem is that your SAS session uses a different encoding from your SAS dataset.

A possible workaround:
1) Export your original dataset to a CSV file.

filename x 'c:\temp\temp.csv' encoding='utf8' ;
proc export data=have outfile=x ........

2) Read it back into SAS.

filename x 'c:\temp\temp.csv' encoding='utf8' ;
proc import datafile=x out=want ........
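
For reference, a complete version of the two steps might look like the sketch below. HAVE and WANT are the placeholder dataset names used above; DBMS=CSV, REPLACE, and GUESSINGROWS=MAX are commonly used options added here as assumptions, not necessarily the options elided in the original code.

```sas
/* HAVE and WANT are placeholder dataset names, as in the steps above */
filename x 'c:\temp\temp.csv' encoding='utf-8';

/* 1) Write the dataset out as a UTF-8 CSV file */
proc export data=have outfile=x dbms=csv replace;
run;

/* 2) Read the CSV file back into SAS */
proc import datafile=x out=want dbms=csv replace;
  guessingrows=max;   /* scan all rows so long strings are not cut short */
run;
```

Note that every PROC EXPORT option is a keyword=value pair (or a single keyword such as REPLACE) placed before the one semicolon that ends the statement.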
Mscarboncopy
Pyrite | Level 9

Thank you. Could you please walk me through this? I searched to try to figure it out by myself but I can't seem to be able to.

I am trying to convert my SPSS file into CSV, using what you suggested (test is the name of my SPSS file). When I got the errors below I tried adding DBMS=csv, but the same error shows up. What am I doing wrong? Thank you again.

 

libname pr ' my computer location';


filename edited 'c:\temp\temp.csv' encoding='utf8' ;
proc export data= 'test' outfile= edited 'Editedtest';
Run;

 

filename edited 'c:\temp\temp.csv' encoding='utf8' ;
NOTE: PROCEDURE EXPORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

NOTE: The SAS System stopped processing this step because of errors.
68 proc export data= test outfile= edited Editedtest;
----------
202
ERROR 22-322: Syntax error, expecting one of the following: ;, DATA, DBLABEL, DBMS, DEBUG, FILE, LABEL,
OUTFILE, OUTTABLE, REPLACE, TABLE, _DEBUG_.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

69 DBMS=csv;
70 Run;

 

 

 

Tom
Super User

What are you actually trying to do here?

What is the original source file? Is it a SAS dataset? An SPSS file? If it is an SPSS file, what type: a SAV file or a portable file? Is it a CSV file? Something else?

What are you trying to do with the data? Use it in some analysis in SAS? Transfer it to some other format? Something else?

If the data you are trying to read is using UTF-8 characters (for example special symbols) then you need to first figure out how to start SAS using UTF-8 encoding. How are you accessing SAS? If you are running SAS itself on a Windows machine then you should have a separate command you can use that launches SAS with Unicode support. If you are using SAS Enterprise Guide or SAS Studio to connect to a SAS application server then connect to a server that is configured with Unicode support.
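
As a rough sketch of what that could look like once a Unicode session is running: first confirm the session encoding, then read the SPSS file directly. The path below is hypothetical, and DBMS=SAV assumes that SAS/ACCESS Interface to PC Files is licensed at your site.

```sas
/* Confirm the session encoding before reading the data */
%put &=sysencoding;

/* Hypothetical path; DBMS=SAV needs SAS/ACCESS Interface to PC Files */
proc import datafile='C:\temp\test.sav'
            out=work.test
            dbms=sav
            replace;
run;
```

If the %PUT statement still reports wlatin1, as in the log earlier in the thread, the session was not started with Unicode support.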

Mscarboncopy
Pyrite | Level 9

I am trying to use an SPSS file (sav), saving it to SAS first, so that I can write a long program to change this file (the original SPSS) from long format to wide format. The SPSS file has 6,000+ entries and the SAS file has only 4,000 entries, due to the errors I mentioned in my original question. I am using SAS 9.4 (Unicode support), so I think it already has the Unicode support you are mentioning, since it is in the name? I really do not know, as I never had this issue before. Thank you.
