BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
js5
Pyrite | Level 9 js5
Pyrite | Level 9

Hello,

 

I have a .tsv file with variables exceeding 32767 characters. I have managed to read these using input statement and varchar(131071) data type and lrecl=262143, but when I try to split the long variables into several pieces using scan, processing ends at character 32767. Is this supposed to be working?

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I think the problem is actually before that.  It looks like when you INPUT the data, it's only grabbing the first 32,767 characters.

 

I think to make this work, you would need to find an informat to use on the INPUT statement that allows a length >32767.

 

From a quick read of the docs, I don't see any.

 

1    data _null_;
2      infile "Q:\junk\input.txt" truncover dsd delimiter='09'x lrecl=100000;
3      length var1 varchar(70000) var2 $10;
4      input var1 var2;
5
6      l=length(var1) ;
7      put l= ;
8    run;

NOTE: The infile "Q:\junk\input.txt" is:
      Filename=Q:\junk\input.txt,
      RECFM=V,LRECL=100000,File Size (bytes)=60012,
      Last Modified=14Nov2022:11:41:23,
      Create Time=14Nov2022:11:41:23

l=32767
NOTE: 1 record was read from the infile "Q:\junk\input.txt".
      The minimum record length was 60010.
      The maximum record length was 60010.

If you can't find an informat that will read > 32,767 characters, you might have to read it in chunks, as Tom suggested.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

11 REPLIES 11
LinusH
Tourmaline | Level 20

I think it should work, but don't have SAS Viya at my fingertips.

Can you show the log from the import and scan operations?

Data never sleeps
Tom
Super User Tom
Super User

@js5 wrote:

Hello,

 

I have a .tsv file with variables exceeding 32767 characters. I have managed to read these using input statement and varchar(131071) data type and lrecl=262143, but when I try to split the long variables into several pieces using scan, processing ends at character 32767. Is this supposed to be working?

 

 


Please show the code you are using.

A SAS dataset only has two types of variables, floating point numbers and fixed length character strings.  The maximum length of a character string is 32,767 bytes.

 

Are you using something other than a SAS data step to read the file.

 

What is a TSV file?  Do you mean a tab delimited text file?

LinusH
Tourmaline | Level 20
Since OP refers to varchar I guess this is executed in CAS.
Data never sleeps
js5
Pyrite | Level 9 js5
Pyrite | Level 9

.tsv is a tab-delimited file, yes. According to https://luminaredata.com/wp-content/uploads/2018/04/Using_SAS_94M5_to_Manage_Strings_Exceeding_32kb.... there is limited support for varchar in the data step. I am trying to split the string into pieces shorter than 32k before saving it, but it appears not to work.

I cannot share my code 1:1 as the data are proprietary but I will see if I can come up with an example.

Tom
Super User Tom
Super User

You can read a text file with strings that are longer than 32K, you just have to read them into multiple variables (or multiple observations).  You can use the COLUMN= option of the INFILE statement to help you understand the length of the fields on the line.

 

Try something like this in normal SAS code:

data want;
  infile TSV dsd dlm='09'x truncover firstobs=2 column=cc ;
  length id slen 8 str1-str5 $32767 ;
  array str str1-str5;
  input id @ ;
  start=cc;
  input str1 @ ;
  end=cc;
  slen=end-start+1;
  input @start ;
  do index=1 to ceil(slen/32767);
    input str[index] $char32767. ;
    if 32767*(index-1) > slen then do;
      str[index]=substrn(str[index],1,32767*(index-1)-slen);
    end;
  end;
run;
  
Quentin
Super User

I think it should work.

 

I played  bit in Base SAS 9.4M7.  It looks like SCAN worked okay for this test.  But I did have a problem with REPEAT not wanting to write a string longer than 32,767.  Since VARCHAR is new to DATA step (M5 I think), it wouldn't be that surprising if some functions don't handle it correctly yet.

 

1    data _null_ ;
2      length longstr varchar(40000) longstr2 varchar(80000);
3      longstr=repeat('abc ',9999) ;
4      l=length(longstr) ; *I think this should be 40000, but its only 32767 ;
5
6
7      longstr2=trim(longstr)||longstr ;
8
9      longstr2=trim(longstr2)||' def' ;
10
11     l2=length(longstr2) ;
12     lastword=scan(longstr2,-1) ;
13     put l= l2= lastword= ;
14   run;

l=32767 l2=65538 lastword=def
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LinusH
Tourmaline | Level 20

CAS i required if you want to persist VARCHAR variables in your output dataset:

 

Requirement The CAS engine is required if you want to preserve a variable as a VARCHAR data type when reading it in or writing it out using the DATA step.
Data never sleeps
js5
Pyrite | Level 9 js5
Pyrite | Level 9

I do not need the varchar variables in my output dataset, I am dropping them. I only need to read the long strings into varchars so that I can split them into <32k pieces with scan before outputting them to an output dataset.

Quentin
Super User

Can you make a small example, like the data step I posted, that shows the problem?  It looked like scan worked for me.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
js5
Pyrite | Level 9 js5
Pyrite | Level 9

This seems to be sufficient to reproduce the problem:

data varchar;
	infile "C:\users\&sysuserid\Work Folders\Desktop\SAS\varchar\input.txt" truncover dsd delimiter='09'x lrecl=100000;
	length var1 varchar(70000) var2 $10;
	input var1 var2;

	do i = 1 to countw(var1, " ");
		short = scan(var1, i, " ");
		output;
	end;
run;

Neither countw nor scan seem to work beyond 32727 characters.

Quentin
Super User

I think the problem is actually before that.  It looks like when you INPUT the data, it's only grabbing the first 32,767 characters.

 

I think to make this work, you would need to find an informat to use on the INPUT statement that allows a length >32767.

 

From a quick read of the docs, I don't see any.

 

1    data _null_;
2      infile "Q:\junk\input.txt" truncover dsd delimiter='09'x lrecl=100000;
3      length var1 varchar(70000) var2 $10;
4      input var1 var2;
5
6      l=length(var1) ;
7      put l= ;
8    run;

NOTE: The infile "Q:\junk\input.txt" is:
      Filename=Q:\junk\input.txt,
      RECFM=V,LRECL=100000,File Size (bytes)=60012,
      Last Modified=14Nov2022:11:41:23,
      Create Time=14Nov2022:11:41:23

l=32767
NOTE: 1 record was read from the infile "Q:\junk\input.txt".
      The minimum record length was 60010.
      The maximum record length was 60010.

If you can't find an informat that will read > 32,767 characters, you might have to read it in chunks, as Tom suggested.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1114 views
  • 0 likes
  • 4 in conversation