Good afternoon,
I was trying to merge two datasets into one (one with measurements, the other with characteristics for the same observations). However, the DATA step was terminated after observation number 27 (there are about 13,000). The error shown is the following:
ERROR: Some character data was lost during transcoding in the dataset _EXP1_.BASCOCOG. Either the data contains characters that are not representable in the new encoding or truncation occurred during transcoding.
After using:
%let dsn=libref.data;
%let dsid=%sysfunc(open(&dsn,i));
%put &dsn ENCODING is: %sysfunc(attrc(&dsid,encoding));
I got the following warning: WARNING: Argument 1 to function ATTRC referenced by the %SYSFUNC or %QSYSFUNC macro function is out of range.
Does anyone know how to fix this? I would really appreciate it.
Thank you in advance, looking forward to hearing from you.
After this line
%let dsid=%sysfunc(open(&dsn,i));
insert and run this line
%put &=dsid;
What does it write to the log?
No issues with the code. You may have to replace the 'libref' with the actual library name in your code.
17 %let dsn=sashelp.class;
18 %let dsid=%sysfunc(open(&dsn,i));
19 %put &dsid.;
4
20 %put &dsn ENCODING is: %sysfunc(attrc(&dsid,encoding));
sashelp.class ENCODING is: us-ascii ASCII (ANSI)
21 %let cl=%sysfunc(close(&dsid));
@GLO1 wrote:
138 %let dsid=%sysfunc(open(&dsn,i));
139 %put &=dsid;
DSID=0
140 %put &dsn ENCODING is: %sysfunc(attrc(&dsid,encoding));
WARNING: Argument 1 to function ATTRC referenced by the %SYSFUNC or %QSYSFUNC macro function
is out of range.
According to the documentation for the OPEN function
OPEN returns 0 if the data set could not be opened.
So that's why the next line fails. LIBREF.DATA cannot be opened. Perhaps because it does not exist.
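A minimal sketch of how the check could be guarded so that ATTRC is only called when OPEN succeeded. The macro name show_encoding is made up for illustration, and _EXP1_.bascocog is assumed to be the dataset you actually want to inspect:

```sas
/* Hypothetical helper: report the encoding of a dataset,       */
/* or the reason it could not be opened.                        */
%macro show_encoding(dsn);
  %local dsid rc;
  %let dsid = %sysfunc(open(&dsn, i));
  %if &dsid = 0 %then %do;
    /* OPEN returned 0: the dataset could not be opened */
    %put WARNING: Could not open &dsn..;
    %put %sysfunc(sysmsg());
  %end;
  %else %do;
    %put &dsn ENCODING is: %sysfunc(attrc(&dsid, encoding));
    /* Always close the dataset again to release the handle */
    %let rc = %sysfunc(close(&dsid));
  %end;
%mend show_encoding;

/* Assumed dataset name, taken from the error message */
%show_encoding(_EXP1_.bascocog)
```

If the dataset cannot be opened, SYSMSG should return the reason, which is usually more informative than the out-of-range warning from ATTRC.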
Paige is right. The code runs without problems for me.
69 %let dsn=sashelp.class;
70 %let dsid=%sysfunc(open(&dsn));
71 %let encoding=%sysfunc(attrc(&dsid,encoding));
72 %let dsid=%sysfunc(close(&dsid));
73
74 %put Table &dsn encoding is &encoding ;
Table sashelp.class encoding is us-ascii ASCII (ANSI)
Ok, maybe I should start from the beginning. This is the actual problem:
165 data expcog_carolina; /*copy CAROLINA EXP1_coga1d and EXP1_bascocog to workfile*/
166 set _EXP1_.coga1d;
NOTE: Data file _EXP1_.COGA1D.DATA is in a format that is native to another host, or the file
encoding does not match the session encoding. Cross Environment Data Access will be
used, which might require additional CPU resources and might reduce performance.
167 run;
NOTE: There were 13078 observations read from the data set _EXP1_.COGA1D.
NOTE: The data set WORK.EXPCOG_CAROLINA has 13078 observations and 186 variables.
NOTE: DATA statement used (Total process time):
real time 0.22 seconds
cpu time 0.18 seconds
169 data expbascog_carolina;
170 set _EXP1_.bascocog;
NOTE: Data file _EXP1_.BASCOCOG.DATA is in a format that is native to another host, or the
file encoding does not match the session encoding. Cross Environment Data Access will be
used, which might require additional CPU resources and might reduce performance.
171 run;
ERROR: Some character data was lost during transcoding in the dataset _EXP1_.BASCOCOG. Either
the data contains characters that are not representable in the new encoding or
truncation occurred during transcoding.
NOTE: The DATA step has been abnormally terminated.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 27 observations read from the data set _EXP1_.BASCOCOG.
WARNING: The data set WORK.EXPBASCOG_CAROLINA may be incomplete. When this step was stopped
there were 27 observations and 156 variables.
WARNING: Data set WORK.EXPBASCOG_CAROLINA was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
I have seen this exact issue when migrating data from a LATIN platform to UTF-8. The same data may need more bytes in UTF-8 than in LATIN: for example, é (e with an acute accent) takes 1 byte in LATIN but 2 bytes in UTF-8. If the target dataset copies the length attributes of the character columns from the source, it does not take this into account, and the result is truncation. The log shows that 27 observations were read before the step stopped, so if you look at the observations around number 27 you will likely find the culprit.
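To see the byte expansion for yourself, here is a small sketch. It assumes the session runs in UTF-8 and that the program file is saved in the session encoding; in a LATIN session both numbers would be the same.

```sas
/* Compare the storage length in bytes with the number of characters */
data _null_;
  s = 'é';
  bytes = length(s);   /* LENGTH counts bytes            */
  chars = klength(s);  /* KLENGTH counts characters      */
  put bytes= chars=;
run;
```

In a UTF-8 session I would expect bytes to be larger than chars here, which is exactly the extra room a column defined with the original LATIN length does not have.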
SAS has a solution for this: the CVP engine. When the source library is read through this engine, the lengths of the character columns are multiplied by a set amount, so the target columns are long enough to hold the transcoded values. You would use it like this:
libname source cvp 'Source-data-library';
libname target 'Target-data-library';
proc copy noclone in=source out=target;
run;
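Applied to the step from your log, it could look roughly like the sketch below. The libref exp1cvp and the path are placeholders (I do not know where your _EXP1_ library points), and the multiplier of 2 is only an assumption; CVPMULTIPLIER= is optional and can be left out to use the default.

```sas
/* Placeholder path: point this at the same folder as _EXP1_ */
libname exp1cvp cvp 'path-to-EXP1-library' cvpmultiplier=2;

/* Same DATA step as before, but reading through the CVP libref, */
/* so character lengths are expanded before transcoding          */
data expbascog_carolina;
  set exp1cvp.bascocog;
run;
```

Note that a CVP libref is meant for reading only, so keep writing your output to WORK or to a regular libref.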
Hope this helps,
- Jan.
Your solution works, thank you very much for your help!