DATA Step, Macro, Functions and more

Case statement trims length to 200?

Accepted Solution Solved
Reply
PROC Star
Posts: 1,564
Accepted Solution

Case statement trims length to 200?

[ Edited ]

Hi all,

 

Here is this weird behaviour which I can't explain :

 

When running the second SQL statement, the CLEAN_DATA variable's value (not length) is truncated to $200. I don't see why this should take place, and how to circumvent this trimming of the value.

 

Any explanation/idea?

 

Thank you.

 


data TEST; 
  length RAW_DATA CLEAN_DATA $255;
  RAW_DATA  = collate(33,255) ; 
  CLEAN_DATA= prxchange('s/[[:^graph:]]/ /o',-1, RAW_DATA ) ;
  put RAW_DATA= / CLEAN_DATA=;
run;

proc sql;
  create table CLEAN_GOOD as 
  select RAW_DATA
        ,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as CLEAN_DATA length=255
  from TEST;
quit;

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,notprint(RAW_DATA) as NP
        ,case when notprint(RAW_DATA) 
           then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

data _null_;  
  set TEST;  
  A=substr(RAW_DATA  ,160); putlog 'Raw  ' A;  
  A=substr(CLEAN_DATA,160); putlog 'Test ' A;  
  set CLEAN_GOOD;  
  A=substr(CLEAN_DATA,160); putlog 'Good ' A;   
  set CLEAN_BAD;  
  A=substr(CLEAN_DATA,160); putlog 'Bad  ' A; 
run;

Raw  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Bad  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçè

 

[Edit: fixed a typo]


Accepted Solutions
Solution
‎03-04-2017 03:47 AM

All Replies
PROC Star
Posts: 1,564

Re: Case statement trims length to 200?

[ Edited ]

A similar behaviour occurs when using other functions, such as catt(), but prxparse() gives no warning and doesn't choke, so I am wondering if anybody else looked at this.

 

Even with catt(), why would a case statement not take the length= option into account? Is this expected behaviour?

 

 


data TEST; 
  length RAW_DATA CLEAN_DATA $255;
  RAW_DATA  = collate(33,255) ; 
  CLEAN_DATA= catt(RAW_DATA, 'd' ) ;
  put RAW_DATA= / CLEAN_DATA=;
run;

proc sql;
  create table CLEAN_GOOD as 
  select RAW_DATA
        ,catt(RAW_DATA, 'd' ) as CLEAN_DATA length=255
  from TEST;
quit;

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,notprint(RAW_DATA) as NP
        ,case when notprint(RAW_DATA) 
           then catt(RAW_DATA, 'd' ) 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

data _null_;
  set TEST;
  A=substr(RAW_DATA  ,160); ;putlog 'Raw  ' A; 
  A=substr(CLEAN_DATA,160); ;putlog 'Test ' A; 
  set CLEAN_GOOD;
  A=substr(CLEAN_DATA,160); ;putlog 'Good ' A; 
  set CLEAN_BAD;
  A=substr(CLEAN_DATA,160); ;putlog 'Bad  ' A; 
run;

 

This warning is written only when creatimg table CLEAN_BAD:

WARNING: In a call to the CATT function, the buffer allocated for the result was not long enough to contain the concatenation of all the arguments. The correct result would contain 224 characters, but the actual result might either be truncated to 200
         character(s) or be completely blank, depending on the calling environment. The following note indicates the left-most argument that caused truncation.

Result:

Raw  ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
Test ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Good ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿd
Bad   

 

 

 

Super User
Posts: 9,687

Re: Case statement trims length to 200?

[ Edited ]

Maybe it due to PROC SQL.

Better use the most updated SAS version 9.4m4 .

In old sas, the default length of characher  is 200.

Super User
Super User
Posts: 6,502

Re: Case statement trims length to 200?

A think that most string functions will limit their results to $200 when called within PROC SQL.

One exception is the SUBSTR() function.

Try this little test .

 

data x ;
   length x $400 ;
   x=repeat('0123456789',39);
run;
proc sql ;
  select length(x)
       , length(repeat('0123456789',39))
       , length(substr(x,1,400))
  from x
  ;
quit;

Try using some other function besides REPEAT() and see if you can get result that is longer than 200 characters.

PROC Star
Posts: 1,564

Re: Case statement trims length to 200?

@Tom @Ksharp

Thanks for taking interest.

 

As you can see from the first SQL clause, the function itself does not necessarily trim to $200.

It returns the full result when used with proc SQL when a length is assigned.

 

What does the trimming is calling the function within a case statement.

The length defined after the case test is not passed on to the function. This is most annoying.

 

Either it should be or a syntax such as

proc sql;
  create table CLEAN_BAD as 
  select RAW_DATA
        ,case when notprint(RAW_DATA) 
           then prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) length=255 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST;
quit;

should be supported.

 

I reckon the length defined after case should known by the inner case results. Otherwise what's the point?

What do you think?

 

Super User
Super User
Posts: 6,502

Re: Case statement trims length to 200?

Well that shows a way you can then work around the issue.  Make a new column and use the CALCULATED keyword to reference it.  You could either use a subquery or use a dataset option to remove the extra column.

proc sql;
  create table CLEAN_GOOD2 (drop=regex) as 
  select RAW_DATA
        ,prxchange('s/[[:^graph:]]/ /o',-1,RAW_DATA) as regex length=255 
        ,case when notprint(RAW_DATA) 
           then calculated regex 
           else RAW_DATA 
         end as CLEAN_DATA length=255 
  from TEST
  ;
quit;
PROC Star
Posts: 1,564

Re: Case statement trims length to 200?

[ Edited ]

@Tom

The point of the CASE test is to avoid parsing a RegEx (a very costly operation) when not needed.

I lose this feature when using a calculated column, and might as well just use my first SQL example.

 

It really would make sense that the WHEN entries use the length defined for the CASE statement.

I feel like making a ballot suggestion but I doubt it would make ant difference at all.

More generally, we need a way to use functions in SQL and force them to extend beyond 200.

 

PS Note that most other functions like upcase and coaslesce (but not coalescec)  don't trim at $200.

Recent functions cat and quote misbehave badly.

 


data T;
  length X $300;
  X=repeat('X',300);
proc sql;
  select lengthn(coalesce(' ', X))        as COALESCE   %* 300; 
       , lengthn(coalescec(' ', X))       as COALESCEC  %* 200;
       , lengthn(upcase(X))               as UPCASE     %* 300;
       , lengthn(substr(X,10,250))        as SUBSTR     %* 250;
       , lengthn(compress(X))             as COMPRESS   %* 300;
       , lengthn(compbl(X))               as COMPBL     %* 300;
       , lengthn(reverse(X))              as REVERSE    %* 300;
       , lengthn(trim(X))                 as TRIM       %* 300;
       , lengthn(translate(X,'a','b'))    as TRANSLATE  %* 300;
       , lengthn(scan(X,1))               as SCAN       %* 300;
       , lengthn(left(X))                 as LEFT       %* 300;
       , lengthn(put(X,$300.))            as PUT        %* 300;
       , lengthn(putc(X,'$300.'))         as PUTC       %* 300;
       , lengthn(repeat('X',300))         as REPEAT     %* 200;
       , lengthn(prxchange('s/a/b/',1,X)) as PRXCHANGE  %* 200;
       , lengthn(catt('X',X))             as CAT        %*   0;
       , lengthn(quote(X))                as QUOTE      %*   0;
  from T;
quit;

 

Solution
‎03-04-2017 03:47 AM
PROC Star
Posts: 1,564
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 225 views
  • 0 likes
  • 3 in conversation