BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
asasgdsfa
Obsidian | Level 7

Hi

 

I need to use a string above 32767 characters. Logic is more complicated, but here is and example:

  1. A data step has two variables below the limit as inputs.
  2. The variables need to be concated.
  3. The previous step is done to scan (and similar functions such as tranwrd) for the whole string. It is necessary because, e.g., a relevant substring may be divided, having one part at the end of one variable and the second part at the beginning of the second source.
  4. I do NOT try to save the string above 32767, just to use it in a single non-CAS data step. You can imagine that the scan function in the example returns a string much shorter (the example should produce a length of "2+length(varB)" because " aabbbBB..." is in the middle).

I'm aware that the current example could be handled in a different way, such "find the end of A" and add "B". However, this is just due to simplification and I really need the whole concated string.

 

Not working:

 

data test;
    length var1 var2 $32000;
    var1 = repeat('A', 31000);
    var1 = catt(var1," aa");
    var2 = repeat('B', 31000);
    var2 = catt("bbb", var2);
    output;
run;

data test;
	set test;
	length v varchar(70000);
	v=catt(var1, var2);
/* 	put v; */
	l=length(v);
	put l;
	
	length v2 varchar(70000);
	v2=scan(v, 2 ,' ');
/* 	put v2; */
	l2=length(v2);
	put l2;

run;

 

 

Working (the same with shorter variables):

 

data test;
    length var1 var2 $32000;
    var1 = repeat('A', 1000);
    var1 = catt(var1," aa");
    var2 = repeat('B', 1000);
    var2 = catt("bbb", var2);
    output;
run;

data test;
	set test;
	length v varchar(70000);
	v=catt(var1, var2);
/* 	put v; */
	l=length(v);
	put l;
	
	length v2 varchar(70000);
	v2=scan(v, 2 ,' ');
/* 	put v2; */
	l2=length(v2);
	put l2;

run;

 

 

For the not working code, not even lengths are printed correctly (both outputs results in 1). The warrning from the log:

 NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with 
       options MSGLEVEL=I.
 WARNING: In a call to the CATT function, the buffer allocated for the result was not long enough to contain the concatenation of 
          all the arguments. The correct result would contain 62008 characters, but the actual result might either be truncated to 
          32767 character(s) or be completely blank, depending on the calling environment. The following note indicates the 
          left-most argument that caused truncation.
 NOTE: Argument 2 to function CATT('AAAAAAAAAAAA'[12 of 32000 characters shown],'bbbBBBBBBBBB'[12 of 32000 characters shown]) at 
       line 85 column 4 is invalid.
var1=AAAAAAAA ......

[The whole A and B variables are printed here.]

.... BBBBBBB v= l=1 v2=
 l2=1 _ERROR_=1 _N_=1

Any suggestions how to overcome it, i.e., how to get a long varchar from two smaller variables and use it in the scan function?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So TRY IT.

Looks like you cannot use CAT... functions to build the varchar value.  But SUBSTR() works.

4435  data test;
4436    length var1 var2 $32000 v varchar(70000);
4437    drop v;
4438    var1=repeat('1234567890',3199);
4439    var2=repeat('0987654321',3199);
4440    v=var1;
4441    substr(v,32001,32000)=var2;
4442    l1=length(var1);
4443    l2=length(var2);
4444    l3=length(v);
4445    loc=index(v,'0987');
4446    put (l:) (=);
4447    format v: $20.;
4448  run;

l1=32000 l2=32000 l3=64000 loc=32001
NOTE: The data set WORK.TEST has 1 observations and 6 variables.

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Are you using plain old SAS?  If so you cannot have a variable that is longer than 32,767 bytes.

 

What is the maximum substring you are looking to find and replace?  If that is short enough you can basically do it in steps.

Say the maximum length of the string you want to find and the length of the string you are replacing it with is 10 then you need to take a window of at least 2*10 -> 20 characters and slide it over your string and replace in that window.  The window will need to consist of two substrings of length 10 so you can move over the longer string in increments of 10.

 

Consider this line:

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
01        The quick brown fox jumped over the lazy dog.

So you pull out 'The quick ' and 'brown fox ' and stick them together to get 'The quick brown fox '.  Now you can find 'quick brown' and change it to something else.  So you can write out the first part.  Keep the second part (remember the length is now has if different than the original) and appended the third piece.  Repeat for all of the pieces.

Tom
Super User Tom
Super User

An example program that uses this method when reading from a text used to be available on SAS website.  This post processing macro that replaces a constant string with a PAGE X of Y string.

pageofpp.sas

Spoiler
%macro pageofpp
/*------------------------------------------------------------------------------
Post-processor to handle internal pagination of SAS output
------------------------------------------------------------------------------*/
(infile=      /* Input file (fileref or quoted filespec) (REQ) */
,outfile=     /* Output file (fileref or quoted filespec) (REQ) */
              /* Input and output must be two different files.  */
,text=        /* String to find and replace in input file (REQ) */
 [Page_00000_of_99999]
,type=        /* File format type (TXT LIS LST PS PDF RTF). If not specified  */
              /* then will default to the extension of the output file. */
,justify=L    /* Justification of generated text. Ignored when type=RTF.    */
              /* L=Left R=Right C=Center. */
);
/*------------------------------------------------------------------------------
SAS Macro Module
$Source: /cdo/Razor_db/Main/RAZOR_UNIVERSE/DOMAIN_01/global/Archive/RZ_VCS/gmac/pageofpp.sas,v $
--------------------------------------------------------------------------------
Revision information:
$Revision: 1.1 $ $Date: 2002/07/09 19:46:04 $
$Author: abernt $ $Locker:  $
--------------------------------------------------------------------------------
$Purpose:
Read in an output file and replace place holder text with actual text for
Page X of Y internal pagination of SAS output.

$Assumptions:
- The special text occurs only once per page and all on one line.

$Inputs: file with location of Page X of Y marked by special text
$Outputs: updated file with internal pagination
$Calls to: parmv.sas getphys.sas

--------------------------------------------------------------------------------
$Usage notes:
Place the TEXT value on the page where you want the generated Page X of Y string
to be placed.  This would normally be in a TITLE or FOOTNOTE statement.

The length of the special text must be long enough to hold the generated
Page X of Y text. (Does not apply to RTF files.)

For RTF files the internal pagination is not calculated. Instead RTF commands to
display page number and number of document pages are used.

Use values for TEXT that do not include spaces.  This will help insure that they
are not split into separate lines in RTF or PDF files by SAS.

* Example 1 - Left Justified in a PDF file ;
%let pagnumt=[Page_00000_of_99999];
title1 "Sertraline Protocol A0501001";
footnote1 "&pagenumt";
...
%pageofpp(infile=out('test1.pdf'),outfile=out('test2.pdf'),text=&pagnumt)

* Example 2 - Centered in a TXT file ;
%let pagnumt=[Page_00000_of_99999];
title1 "Sertraline Protocol A0501001";
footnote1 "%sysfunc(repeat(%str( ),50))&pagenumt";
...
%pageofpp(infile=out('test1.txt'),outfile=out('test2.txt'),text=&pagnumt
         ,justify=c)

--------------------------------------------------------------------------------
Modification History
--------------------------------------------------------------------------------
Revision n.x  yyyy/mm/dd hh:mm:ss  username
$Log: pageofpp.sas,v $
Revision 1.1  2002/07/09 19:46:04  abernt
Initial revision

Revision 0.0  2002/04/16           abernt   Initial version

------------------------------------------------------------------------------*/
%local macro parmerr
       infilen outfilen
       numpages numpagel
       slength rtfcommand lpageof
;
%let macro=PAGEOFPP;
%*----------------------------------------------------------------------
Validate parameters
-----------------------------------------------------------------------;
%parmv(INFILE,_req=1,_case=n)
%if (%length(&infile)) %then %do;
  %getphys(&infile,mode=I,mvar=infilen)
  %if (&sysrc) %then
     %parmv(INFILE,_msg=Cannot open &infilen)
  ;
%end;

%parmv(OUTFILE,_req=1,_case=n)
%if (%length(&outfile)) %then %do;
  %getphys(&outfile,mode=U,mvar=outfilen)
  %if (&sysrc=1) %then
     %parmv(OUTFILE,_msg=Cannot write to &outfilen)
  ;
%end;
%if (&infilen = &outfilen) %then
  %parmv(OUTFILE,_msg=Cannot write over input file)
;

%parmv(TEXT,_req=1,_words=1,_case=n)
%parmv(JUSTIFY,_req=1,_val=L R C,_case=U)

%if (^%length(&type)) %then %let type=%scan(&outfilen,-1,%str(.%"));
%parmv(TYPE,_req=1,_val=TXT PDF RTF LIS LST PS,_case=U)


%if (&parmerr) %then %goto quit;

%*-----------------------------------------------------------------------------
Start body of macro
-------------------------------------------------------------------------------;
%put ;
%put SPE NOTE: Starting &macro.. INFILE=&infile OUTFILE=&outfile TYPE=&type
 JUSTIFY=&justify TEXT="&text"
;
%put;

%*------------------------------------------------------------------------------
Recode TYPE because some are treated exactly the same way.
-------------------------------------------------------------------------------;
%if (&type = LIS) or (&type = LST) %then %let type=TXT;
%else %if (&type = PS) %then %let type=PDF;

%*------------------------------------------------------------------------------
RTFCOMMAND = Command string in RTF that tells WORD to display Page X of Y.
-------------------------------------------------------------------------------;
%let rtfcommand={Page }{\field{\*\fldinst {PAGE }}}{ of };
%let rtfcommand=&rtfcommand.{\field{\*\fldinst {NUMPAGES }}};

%*------------------------------------------------------------------------------
SLENGTH = length of search string to use in later calculations.
LPAGENO = length of variable to hold generated Page X of Y string.
-------------------------------------------------------------------------------;
%let slength=%length(&text);
%if (&type = RTF) %then %let lpageof=%length(&rtfcommand);
%else %let lpageof=&slength;

%if (&type ^= RTF) %then %do;
*------------------------------------------------------------------------------;
* Scan file to determine the number of pages ;
*------------------------------------------------------------------------------;
  data _null_;
    infile &infile lrecl=32767 end=_eof;
    input;
    if index(_infile_,"&text") then pageno + 1;
    if _eof then call symput('NUMPAGES',compress(put(pageno,14.)));
  run;
%------------------------------------------------------------------------------
Get length of numpages string to set format for generating page numbers.
Check that generated text will fit without space reserved by placeholder text.
Note:  9=length(PAGE__OF_).
-------------------------------------------------------------------------------;
  %let numpagel=%length(&numpages);
  %if (%length(&text) < %eval(9 + 2*&numpagel)) %then %do;
    %parmv(TEXT,_msg=NUMPAGES=&numpages will generate too long a string)
    %goto quit;
  %end;
%end;

*----------------------------------------------------------------------;
* Process file and replace place holder text with actual pagination ;
*----------------------------------------------------------------------;
data _null_;
  infile &infile lrecl=32767 end=_eof;
  length pageof $ &lpageof;
  input ;
  if index(_infile_,"&text") then do;
%if (&type = RTF) %then %do;
*----------------------------------------------------------------------;
* Type = RTF.  Set pageof string to rtf command. ;
*----------------------------------------------------------------------;
    pageof = "&rtfcommand";
%end;
%else %do;
*----------------------------------------------------------------------;
* Type ^= RTF.  Count pages and generate Page x of y string. ;
*----------------------------------------------------------------------;
    pageno + 1;
    pageof = 'Page ' || put(pageno,&numpagel..) || " of &numpages";
  %if (&type = PDF) %then %do;
*----------------------------------------------------------------------;
* Type = PDF. Compress out multiple blanks in generated text.;
*----------------------------------------------------------------------;
    pageof = compbl(pageof);
  %end;
  %if (&justify = R) %then %do;
*----------------------------------------------------------------------;
* Justify = R. Push generated text to right of field.;
*----------------------------------------------------------------------;
    pageof = right(pageof);
  %end;
  %else %if (&justify = C) %then %do;
*----------------------------------------------------------------------;
* Justify = C. Shift right by half of difference in lengths. ;
*----------------------------------------------------------------------;
    len = length(pageof);
    if ( (len+1) < &slength ) then do;
      pageof = repeat(' ',int((&slength - len)/2) -1) || pageof;
    end;
  %end;
%*----------------------------------------------------------------------
Justify = L. Do nothing as generated text is already on left.
-----------------------------------------------------------------------;
%end;
*----------------------------------------------------------------------;
* Replace the placeholder text with the generated text ;
*----------------------------------------------------------------------;
    _infile_ = tranwrd(_infile_,"&text",pageof);
  end;

  file &outfile noprint lrecl=32767;
  put _infile_;
run;

%quit:
%mend pageofpp;

 

asasgdsfa
Obsidian | Level 7
Thank you, but I don't follow. How could it be useful for my case? Do you suggest to output variables to a temporary file and do the processing there?
Tom
Super User Tom
Super User

Use the same ALGORITHM.  Adopt it for your situation.  Instead of reading X bytes from a file read X bytes from one of you characters variables.  You will need to maintain you own POINTER to where to read the next X bytes.

So something like this:

array strings A B ;
index=1;
position=1;
length=10;

left = substrn(strings[index],position,length);
right = substrn(strings[index,position+length,length);
both = cat(left,right);
both = tranwrd(both,"aabb","xxyy");

Now you can write the left helf of BOTH to your target output.  Move the right half to the left half. transfer the next LENGTH characters into RIGHT and repeat until you have processed both strings.

 

But I am not sure how the OUTPUT can be less than 32,767 if the INPUT is longer than that.

asasgdsfa
Obsidian | Level 7

I'm sorry, I'm still not sure that I understand what you mean. Your algorithm would work similarly to the first suggested, i.e., it would take strings part by part, wouldn't it? I can't do it, because, as I wrote, I need to deal with the whole string at once. (And I'm afraid that scan and tranwrd functions can't handle input/output above 32767 characters.)

 

To get aligned, from your example, I'd need something like the following code, which 

data test;
    length A B $32000;
	A = repeat('A', 31000);
    A = catt(A," aa");
    B = repeat('B', 31000);
    B = catt("bbb", B);
    output;
run;

data test1;
	set test;

	array strings A B ;
	length=10;
	left = substrn(strings[1],1,31002);
	right = substrn(strings[1],31003,62005);
	both = substr(tranwrd(catt(left,right),"aabb","xxyy"), 30000, 5000);
	l=length(both);
run;

If you suggest using arrays, I've already tried a similar modification:

data test;
    length A B $32000;
    A = repeat('A', 31000);
    A = catt(A," aa");
    B = repeat('B', 31000);
    B = catt("bbb", B);
    output;
run;

data test2;
	set test;

	array strings A B ;
	length=10;
	both = substr(tranwrd(strings,"aabb","xxyy"), 30000, 5000);
	l=length(both);
run;

 

You mentioned that you are not sure how to get input or output longer and shorter. The two input strings are below the limit, processing part is above it, result is again below the limit. It's similar to:

tmp1 = catt(A, B); /* Very long string */
tmp2 = tranwrd(tmp1,"aabb","xxyy"); /* Processing */
out = substr(tmp2, 30000, 5000); /* The output has 5000 characters */

 

Edit: I need to work with the whole string, because, e.g., it is necessary to find substring such as

scan(tmp, index, '()');

Here, index may dynamically vary, as well as lengths of both tmp and output. The same holds for tranwrd arguments.

 

 

Tom
Super User Tom
Super User

Since you can't use strings longer than 32,767 in data step you cannot apply SCAN() or TRANWRD() to a string longer than that.

 

So if you need to the apply the equivalent of SCAN() or TRANWRD() on your whole string in a data step you will have to find an algorithm to implement the equivalent functionality instead.  The example I showed is for string replacement.  The ancient macro I posted implements a simplified version of TRANWRD() where the source and replacement strings are of the same length.  If you need to do the equivalent of SCAN() you will need a similar divide and conquer approach.

 

Perhaps you can do it with PROC DS2 as that will allow you to construct a VARCHAR varaible that could have longer strings.

 

asasgdsfa
Obsidian | Level 7

The varchar can be defined in both data step and ds2, so I considered them to be equivalent. Are they not? Or is the issue with the functions themselves?

 

Ok, if I used the ds2 with the varchar,  should and approach approach similar to this (for replacing scan and tranwrd) work? I'm just asking about your insight before I put my effort in that.

Tom
Super User Tom
Super User

So TRY IT.

Looks like you cannot use CAT... functions to build the varchar value.  But SUBSTR() works.

4435  data test;
4436    length var1 var2 $32000 v varchar(70000);
4437    drop v;
4438    var1=repeat('1234567890',3199);
4439    var2=repeat('0987654321',3199);
4440    v=var1;
4441    substr(v,32001,32000)=var2;
4442    l1=length(var1);
4443    l2=length(var2);
4444    l3=length(v);
4445    loc=index(v,'0987');
4446    put (l:) (=);
4447    format v: $20.;
4448  run;

l1=32000 l2=32000 l3=64000 loc=32001
NOTE: The data set WORK.TEST has 1 observations and 6 variables.
asasgdsfa
Obsidian | Level 7
Thanks a lot!
asasgdsfa
Obsidian | Level 7

Hi Tom

 

Yes, I'm using 9.4 (not Viya).

 

I know that I can't output variables above the limit, but what I understood, I can use varchar within a single data step and do the manipulation there.

 

There is a possibility that some strings are above 32767. Specifically, an output from the scan can be above this size but then tranwrd and substr shorten it below the limit. Therefore, the final variable is not exceeding 32767 characters, but the scanned output can, so the step-by-step method is not the option.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 814 views
  • 1 like
  • 2 in conversation