BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChrisNZ
Tourmaline | Level 20

Hi all,

 

Here is this weird behaviour which I can't explain :

 

When running the second SQL statement, the CLEAN_DATA variable's value (not length) is truncated to $200. I don't see why this should take place, and how to circumvent this trimming of the value.

 

Any explanation/idea?

 

Thank you.

 


data TEST; 
  length RAW_DATA CLEAN_DATA $255;
  RAW_DATA  = collate(33,255) ; 
  CLEAN_DATA= prxchange('s/[[:^graph:]]/ /o',-1, RAW_DATA ) ;
  put RAW_DATA= / CLEAN_DATA=;
run;

proc sql;
  create table CLEAN_GOOD as 
  select RAW_DATA
        ,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as CLEAN_DATA length=255
  from TEST;
quit;

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,notprint(RAW_DATA) as NP
        ,case when notprint(RAW_DATA) 
           then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

data _null_;  
  set TEST;  
  A=substr(RAW_DATA  ,160); putlog 'Raw  ' A;  
  A=substr(CLEAN_DATA,160); putlog 'Test ' A;  
  set CLEAN_GOOD;  
  A=substr(CLEAN_DATA,160); putlog 'Good ' A;   
  set CLEAN_BAD;  
  A=substr(CLEAN_DATA,160); putlog 'Bad  ' A; 
run;

Raw  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Bad  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçè

 

[Edit: fixed a typo]

1 ACCEPTED SOLUTION
7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

A similar behaviour occurs when using other functions, such as catt(), but prxparse() gives no warning and doesn't choke, so I am wondering if anybody else looked at this.

 

Even with catt(), why would a case statement not take the length= option into account? Is this expected behaviour?

 

 


data TEST; 
  length RAW_DATA CLEAN_DATA $255;
  RAW_DATA  = collate(33,255) ; 
  CLEAN_DATA= catt(RAW_DATA, 'd' ) ;
  put RAW_DATA= / CLEAN_DATA=;
run;

proc sql;
  create table CLEAN_GOOD as 
  select RAW_DATA
        ,catt(RAW_DATA, 'd' ) as CLEAN_DATA length=255
  from TEST;
quit;

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,notprint(RAW_DATA) as NP
        ,case when notprint(RAW_DATA) 
           then catt(RAW_DATA, 'd' ) 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

data _null_;
  set TEST;
  A=substr(RAW_DATA  ,160); ;putlog 'Raw  ' A; 
  A=substr(CLEAN_DATA,160); ;putlog 'Test ' A; 
  set CLEAN_GOOD;
  A=substr(CLEAN_DATA,160); ;putlog 'Good ' A; 
  set CLEAN_BAD;
  A=substr(CLEAN_DATA,160); ;putlog 'Bad  ' A; 
run;

 

This warning is written only when creatimg table CLEAN_BAD:

WARNING: In a call to the CATT function, the buffer allocated for the result was not long enough to contain the concatenation of all the arguments. The correct result would contain 224 characters, but the actual result might either be truncated to 200
         character(s) or be completely blank, depending on the calling environment. The following note indicates the left-most argument that caused truncation.

Result:

Raw  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Bad   

 

 

 

Ksharp
Super User

Maybe it due to PROC SQL.

Better use the most updated SAS version 9.4m4 .

In old sas, the default length of characher  is 200.

Tom
Super User Tom
Super User

A think that most string functions will limit their results to $200 when called within PROC SQL.

One exception is the SUBSTR() function.

Try this little test .

 

data x ;
   length x $400 ;
   x=repeat('0123456789',39);
run;
proc sql ;
  select length(x)
       , length(repeat('0123456789',39))
       , length(substr(x,1,400))
  from x
  ;
quit;

Try using some other function besides REPEAT() and see if you can get result that is longer than 200 characters.

ChrisNZ
Tourmaline | Level 20

@Tom @Ksharp

Thanks for taking interest.

 

As you can see from the first SQL clause, the function itself does not necessarily trim to $200.

It returns the full result when used with proc SQL when a length is assigned.

 

What does the trimming is calling the function within a case statement.

The length defined after the case test is not passed on to the function. This is most annoying.

 

Either it should be or a syntax such as

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,case when notprint(RAW_DATA) 
           then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) length=255 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

should be supported.

 

I reckon the length defined after case should known by the inner case results. Otherwise what's the point?

What do you think?

 

Tom
Super User Tom
Super User

Well that shows a way you can then work around the issue.  Make a new column and use the CALCULATED keyword to reference it.  You could either use a subquery or use a dataset option to remove the extra column.

proc sql;
  create table CLEAN_GOOD2 (drop=regex) as 
  select RAW_DATA
        ,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as regex length=255 
        ,case when notprint(RAW_DATA) 
           then calculated regex 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST
  ;
quit;
ChrisNZ
Tourmaline | Level 20

@Tom

The point of the CASE test is to avoid parsing a RegEx (a very costly operation) when not needed.

I lose this feature when using a calculated column, and might as well just use my first SQL example.

 

It really would make sense that the WHEN entries use the length defined for the CASE statement.

I feel like making a ballot suggestion but I doubt it would make ant difference at all.

More generally, we need a way to use functions in SQL and force them to extend beyond 200.

 

PS Note that most other functions like upcase and coaslesce (but not coalescec)  don't trim at $200.

Recent functions cat and quote misbehave badly.

 


data T;
  length X $300;
  X=repeat('X',300);
proc sql;
  select lengthn(coalesce(' ', X))        as COALESCE   %* 300; 
       , lengthn(coalescec(' ', X))       as COALESCEC  %* 200;
       , lengthn(upcase(X))               as UPCASE     %* 300;
       , lengthn(substr(X,10,250))        as SUBSTR     %* 250;
       , lengthn(compress(X))             as COMPRESS   %* 300;
       , lengthn(compbl(X))               as COMPBL     %* 300;
       , lengthn(reverse(X))              as REVERSE    %* 300;
       , lengthn(trim(X))                 as TRIM       %* 300;
       , lengthn(translate(X,'a','b'))    as TRANSLATE  %* 300;
       , lengthn(scan(X,1))               as SCAN       %* 300;
       , lengthn(left(X))                 as LEFT       %* 300;
       , lengthn(put(X,$300.))            as PUT        %* 300;
       , lengthn(putc(X,'$300.'))         as PUTC       %* 300;
       , lengthn(repeat('X',300))         as REPEAT     %* 200;
       , lengthn(prxchange('s/a/b/',1,X)) as PRXCHANGE  %* 200;
       , lengthn(catt('X',X))             as CAT        %*   0;
       , lengthn(quote(X))                as QUOTE      %*   0;
  from T;
quit;

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4613 views
  • 0 likes
  • 3 in conversation