I'm trying to import multiple variables are each strings and can be more than 32767 a piece. I know the limit for a string variable is 32767, but I'm hoping to import such that if a variable reaches the max it spills over into another variable. I saw a partial fix in another post, but I can't get it to work. For trying it out, I limited the dataset to only 1 variable. To make it easier I also took the header row out o the dataset. This variable is just called Y. The current form is tab-delimited, but I've also unsuccessfully tried other formats.
Below I tried to import the Y variable as a possible array with a first variable Ypart1 up to 32767 and let it spill over into a Ypart2 if needed. As it is, it just fills up the first Ypart1 to the limit, makes a Ypart2, but no data appears there. I've attached the limited datafile. I can do this per variable, if necessary. Obviously it'd be best to do it all at once. Thanks for any guidance!
filename y_var temp;
data y_test;
infile 'C:\Users\dsm6713\Google Drive\Academic\y_var.tsv' truncover;
array Ypart[2] $32767.;
input (Ypart[*])($char32767.);
run;
Looks like an interesting solution! I tried to adapt it and and while it's parsing the first column it's not moving past that. The delimiter in my data is actually the letter "a". So a string would look like :
947a946a945a944a943a942a941a940a939a938a937a936a933a1029a926a1119a897a875a850a828a811a798a791a786a786a785
But out to hundreds of thousands of characters. So the string between the a's is never longer than 4.
That is a totally different problem than the way you originally presented it. So you no longer have a length issue at all since the maximum length is only 4!
Let's make some simple example data:
filename example temp;
data _null_;
file example;
infile cards truncover;
input line $100. ;
len=lengthn(line);
put line $varying200. len ;
cards;
947a946a945a944a943a942a941a
947a946a945a944a943a942a941a940a939a938a937
936a933a1029a926a1119
;
Now just read it into a tall skinny structure with line number, column number and value.
data want ;
infile example dsd dlm='a' truncover length=line_len column=current_col ;
row+1;
do col=1 by 1 until(current_col > line_len);
input word :$4. @;
output;
end;
run;
Results
Obs row col word 1 1 1 947 2 1 2 946 3 1 3 945 4 1 4 944 5 1 5 943 6 1 6 942 7 1 7 941 8 2 1 947 9 2 2 946 10 2 3 945 11 2 4 944 12 2 5 943 13 2 6 942 14 2 7 941 15 2 8 940 16 2 9 939 17 2 10 938 18 2 11 937 19 3 1 936 20 3 2 933 21 3 3 1029 22 3 4 926 23 3 5 1119
Just to simplify slightly and cut down the possibilities for error, get rid of the array and try:
input Ypart1 $char32767. Ypart2 $char32767.;
Astounding, I just tried that, and same result. It cuts off Ypart1 at 32767 and includes a blank Ypart2 variable. It's not spilling over. The original solution below works, but only as they included the data as parmcards:
filename FT15F001 temp;
data lines;
infile FT15F001 truncover;
array part[3] $40.;
input (part[*])($char40.);
parmcards;
this is the data line this is the line this is the data line
this is the data line this is the data line this is the data line this is the data line
this is the data line this is the data line this is the data line
;;;;
run;
I'm including this as perhaps it will make more sense to you why theirs works.
Like this:
data example; infile datalines truncover; input part1 $40. part2 $40. part3 $40.; datalines; this is the data line this is the line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line this is the data line ;
If your actual data step has anything other than that varying data you need to show the entire data step.
INFILE options have some interesting interactions with the INPUT statement sometimes and the entire syntax is likely needed.
Sorry, but I'm not sure what you mean. I only included that example as it worked for the strings of data that example used. My data is much too large to paste into SAS, so I am trying to import it from a delimited file. My first post is the closest I've gotten, and I included a smaller version of the input data file on that post.
@davidsmarch wrote:
Sorry, but I'm not sure what you mean. I only included that example as it worked for the strings of data that example used. My data is much too large to paste into SAS, so I am trying to import it from a delimited file. My first post is the closest I've gotten, and I included a smaller version of the input data file on that post.
NOT any data that long.
The entire code of the data step. One would hope you are not attempting to do this with datalines or parmcards.
Something that would look similar to
Data mydata;
infile "pathtofile.name" Lrecl=64000 <any otherinfile options> .
input ...
As a minimum your INFILE statement will need an LRECL option to tell SAS that you are attempting to read more than 32K characters on a single line. Otherwise SAS will likely not read past the 32K characters at all.
@davidsmarch wrote:
Sorry, but I'm not sure what you mean. I only included that example as it worked for the strings of data that example used. My data is much too large to paste into SAS, so I am trying to import it from a delimited file. My first post is the closest I've gotten, and I included a smaller version of the input data file on that post.
Two things you need to fix. How many bytes per line you are reading from the file and how you are parsing the "delimited" file.
First your INFILE statement (or the FILENAME statement) needs to set a LRECL that is longer than 32K. Without that nothing will make it to the data step to be read. You could instead try reading it just a stream of characters and figure out where the end of line characters are by your self. There is a max for the LRECL option so if you really have a lot of data per line you might need to use the second option.
Second you cannot really take advantage of the DSD option to process the delimited file since it seems that some of the fields in the file are longer than the max that can fit into a single variable. So you will need make your own algorithm for figuring out where one field ends and the next one begins.
Here is an example where the input line has three variables. I will a use length of just 3 to demonstrate how the "long" variable can be split into multiple variables.
data test;
infile cards dsd dlm='|' truncover ;
length x 8 v1-v5 $3 y 8;
input x @ ;
array v[5] ;
do i=1 to dim(v) until(loc) ;
if i=dim(v) then input v[i] :$char3. @;
else input v[i] $char3. @;
loc = indexc(v[i],'|');
end;
if loc then do;
v[i]=substrn(v[i],1,loc-1);
input +(loc-3) @ ;
end;
input y;
cards;
1|1234567890|9
2|123456|8
;
Results:
Obs x v1 v2 v3 v4 v5 y i loc 1 1 123 456 789 0 9 4 2 2 2 123 456 8 3 1
Looks like an interesting solution! I tried to adapt it and and while it's parsing the first column it's not moving past that. The delimiter in my data is actually the letter "a". So a string would look like :
947a946a945a944a943a942a941a940a939a938a937a936a933a1029a926a1119a897a875a850a828a811a798a791a786a786a785
But out to hundreds of thousands of characters. So the string between the a's is never longer than 4. I like the idea of creating a new variable for each value between the a's. I'd then look to transpose that, but that's a later step.
The way I altered your code is below:
data test;
infile 'C:\Users\dsm6713\Google Drive\Academic\y_var.tsv' dsd dlm='a' truncover ;
length y1-y200 $4;
array y[200];
do i=1 to dim(y) until(loc) ;
if i=dim(y) then input y[i] :$char32767. @;
else input y[i] $char32767. @;
loc = indexc(y[i],'a');
end;
if loc then do;
y[i]=substrn(y[i],1,loc-1);
input +(loc-32767) @ ;
end;
run;
Looks like an interesting solution! I tried to adapt it and and while it's parsing the first column it's not moving past that. The delimiter in my data is actually the letter "a". So a string would look like :
947a946a945a944a943a942a941a940a939a938a937a936a933a1029a926a1119a897a875a850a828a811a798a791a786a786a785
But out to hundreds of thousands of characters. So the string between the a's is never longer than 4.
That is a totally different problem than the way you originally presented it. So you no longer have a length issue at all since the maximum length is only 4!
Let's make some simple example data:
filename example temp;
data _null_;
file example;
infile cards truncover;
input line $100. ;
len=lengthn(line);
put line $varying200. len ;
cards;
947a946a945a944a943a942a941a
947a946a945a944a943a942a941a940a939a938a937
936a933a1029a926a1119
;
Now just read it into a tall skinny structure with line number, column number and value.
data want ;
infile example dsd dlm='a' truncover length=line_len column=current_col ;
row+1;
do col=1 by 1 until(current_col > line_len);
input word :$4. @;
output;
end;
run;
Results
Obs row col word 1 1 1 947 2 1 2 946 3 1 3 945 4 1 4 944 5 1 5 943 6 1 6 942 7 1 7 941 8 2 1 947 9 2 2 946 10 2 3 945 11 2 4 944 12 2 5 943 13 2 6 942 14 2 7 941 15 2 8 940 16 2 9 939 17 2 10 938 18 2 11 937 19 3 1 936 20 3 2 933 21 3 3 1029 22 3 4 926 23 3 5 1119
Apologies. Correct, though the length in each cell is shorter, I saw the process as needing to get the entire string split wide, then transpose to long. But your solution works perfectly!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.