Hi all,
Here is this weird behaviour which I can't explain :
When running the second SQL statement, the CLEAN_DATA variable's value (not length) is truncated to $200. I don't see why this should take place, and how to circumvent this trimming of the value.
Any explanation/idea?
Thank you.
data TEST;
length RAW_DATA CLEAN_DATA $255;
RAW_DATA = collate(33,255) ;
CLEAN_DATA= prxchange('s/[[:^graph:]]/ /o',-1, RAW_DATA ) ;
put RAW_DATA= / CLEAN_DATA=;
run;
proc sql;
create table CLEAN_GOOD as
select RAW_DATA
,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as CLEAN_DATA length=255
from TEST;
quit;
proc sql;
create table CLEAN_BAD as
select RAW_DATA
,notprint(RAW_DATA) as NP
,case when notprint(RAW_DATA)
then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA)
else RAW_DATA
end as CLEAN_DATA length=255
from TEST;
quit;
data _null_;
set TEST;
A=substr(RAW_DATA ,160); putlog 'Raw ' A;
A=substr(CLEAN_DATA,160); putlog 'Test ' A;
set CLEAN_GOOD;
A=substr(CLEAN_DATA,160); putlog 'Good ' A;
set CLEAN_BAD;
A=substr(CLEAN_DATA,160); putlog 'Bad ' A;
run;
Raw ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Bad ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçè
[Edit: fixed a typo]
A similar behaviour occurs when using other functions, such as catt(), but prxparse() gives no warning and doesn't choke, so I am wondering if anybody else looked at this.
Even with catt(), why would a case statement not take the length= option into account? Is this expected behaviour?
data TEST;
length RAW_DATA CLEAN_DATA $255;
RAW_DATA = collate(33,255) ;
CLEAN_DATA= catt(RAW_DATA, 'd' ) ;
put RAW_DATA= / CLEAN_DATA=;
run;
proc sql;
create table CLEAN_GOOD as
select RAW_DATA
,catt(RAW_DATA, 'd' ) as CLEAN_DATA length=255
from TEST;
quit;
proc sql;
create table CLEAN_BAD as
select RAW_DATA
,notprint(RAW_DATA) as NP
,case when notprint(RAW_DATA)
then catt(RAW_DATA, 'd' )
else RAW_DATA
end as CLEAN_DATA length=255
from TEST;
quit;
data _null_;
set TEST;
A=substr(RAW_DATA ,160); ;putlog 'Raw ' A;
A=substr(CLEAN_DATA,160); ;putlog 'Test ' A;
set CLEAN_GOOD;
A=substr(CLEAN_DATA,160); ;putlog 'Good ' A;
set CLEAN_BAD;
A=substr(CLEAN_DATA,160); ;putlog 'Bad ' A;
run;
This warning is written only when creatimg table CLEAN_BAD:
WARNING: In a call to the CATT function, the buffer allocated for the result was not long enough to contain the concatenation of all the arguments. The correct result would contain 224 characters, but the actual result might either be truncated to 200
character(s) or be completely blank, depending on the calling environment. The following note indicates the left-most argument that caused truncation.
Result:
Raw ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Bad
Maybe it due to PROC SQL.
Better use the most updated SAS version 9.4m4 .
In old sas, the default length of characher is 200.
A think that most string functions will limit their results to $200 when called within PROC SQL.
One exception is the SUBSTR() function.
Try this little test .
data x ;
length x $400 ;
x=repeat('0123456789',39);
run;
proc sql ;
select length(x)
, length(repeat('0123456789',39))
, length(substr(x,1,400))
from x
;
quit;
Try using some other function besides REPEAT() and see if you can get result that is longer than 200 characters.
Thanks for taking interest.
As you can see from the first SQL clause, the function itself does not necessarily trim to $200.
It returns the full result when used with proc SQL when a length is assigned.
What does the trimming is calling the function within a case statement.
The length defined after the case test is not passed on to the function. This is most annoying.
Either it should be or a syntax such as
proc sql;
create table CLEAN_BAD as
select RAW_DATA
,case when notprint(RAW_DATA)
then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) length=255
else RAW_DATA
end as CLEAN_DATA length=255
from TEST;
quit;
should be supported.
I reckon the length defined after case should known by the inner case results. Otherwise what's the point?
What do you think?
Well that shows a way you can then work around the issue. Make a new column and use the CALCULATED keyword to reference it. You could either use a subquery or use a dataset option to remove the extra column.
proc sql;
create table CLEAN_GOOD2 (drop=regex) as
select RAW_DATA
,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as regex length=255
,case when notprint(RAW_DATA)
then calculated regex
else RAW_DATA
end as CLEAN_DATA length=255
from TEST
;
quit;
The point of the CASE test is to avoid parsing a RegEx (a very costly operation) when not needed.
I lose this feature when using a calculated column, and might as well just use my first SQL example.
It really would make sense that the WHEN entries use the length defined for the CASE statement.
I feel like making a ballot suggestion but I doubt it would make ant difference at all.
More generally, we need a way to use functions in SQL and force them to extend beyond 200.
PS Note that most other functions like upcase and coaslesce (but not coalescec) don't trim at $200.
Recent functions cat and quote misbehave badly.
data T;
length X $300;
X=repeat('X',300);
proc sql;
select lengthn(coalesce(' ', X)) as COALESCE %* 300;
, lengthn(coalescec(' ', X)) as COALESCEC %* 200;
, lengthn(upcase(X)) as UPCASE %* 300;
, lengthn(substr(X,10,250)) as SUBSTR %* 250;
, lengthn(compress(X)) as COMPRESS %* 300;
, lengthn(compbl(X)) as COMPBL %* 300;
, lengthn(reverse(X)) as REVERSE %* 300;
, lengthn(trim(X)) as TRIM %* 300;
, lengthn(translate(X,'a','b')) as TRANSLATE %* 300;
, lengthn(scan(X,1)) as SCAN %* 300;
, lengthn(left(X)) as LEFT %* 300;
, lengthn(put(X,$300.)) as PUT %* 300;
, lengthn(putc(X,'$300.')) as PUTC %* 300;
, lengthn(repeat('X',300)) as REPEAT %* 200;
, lengthn(prxchange('s/a/b/',1,X)) as PRXCHANGE %* 200;
, lengthn(catt('X',X)) as CAT %* 0;
, lengthn(quote(X)) as QUOTE %* 0;
from T;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.