I am importing an unstructured text data file whose lines exceed SAS's maximum length for a single character variable.
After fudging around a bit, I have been able to import some of the data, but it is not an ideal situation and it doesn't seem to pull in the entire file.
Is there a way, perhaps through the INFILE statement, to say: if the string exceeds 32,000 characters, start variable2 and continue to drop the data into successive variables until all text is stored?
Example: a data file of 158,000 characters, split across variables:

var1 | var2 | var3 | var4
tons of characters - aopndvikadfngnf;afngaogkma'psvdmax;clvkdasf gh, q-vw35iv,a -s8eri0t a,zs[0ro86ybuqi4,]-=l "*SIDgkj'pac';ew4l6kgvjw-4n5c | picks up where var1 left off - aonwerdvp maq9nyrgha;oriltluakce;sort8ryacfusfe /;.5gqa7w4pj5ca.seurjcioanuyrec | and so on | and so forth... till the end of the text file
Example 5 in the following SAS Global Forum paper describes how you can split your variable that is longer than 32K into more than one variable.
http://support.sas.com/resources/papers/proceedings11/256-2011.pdf
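I don't have the paper's code in front of me, but a minimal sketch of that general idea, reading an over-long record into successive fixed-size pieces, might look like the step below. The file name, the piece count of 7, the 32,000-byte piece length, and the LRECL value are all assumptions you would adjust to your file.

data wide;
  infile "longfile.txt" truncover length=ll column=cc lrecl=200000;
  array var {7} $32000;                   /* assumed number and length of pieces */
  do i = 1 to dim(var) until (cc > ll);   /* stop once the column pointer passes the line length */
    input var{i} $char32000. @;           /* read the next chunk and hold the line */
  end;
  drop i;
run;

Each iteration of the data step reads one line of the file and fills var1, var2, ... until the line is used up; any remaining pieces stay blank.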
If by "import" you mean Proc Import or any of the wizards that call Proc Import, then your data is not appropriate for that tool. Proc Import is intended for structured data with a fairly regular layout.
A data step can do what you want, I think:
data read;
  infile "some text file name.txt" lrecl=320000 flowover;
  input (v1 - v10) ($32000.);
run;
The LRECL (logical record length) would otherwise default to something much too short. FLOWOVER on the INFILE statement should handle any line-break characters. I picked the LRECL and variable lengths so they complement each other: 10 variables of 32,000 characters exactly cover the 320,000-character record length.
The parentheses in the INPUT statement group the variable list so the informat is applied to each variable in turn. So this reads the first 32,000 characters into V1, the next 32,000 into V2, and so on.
Note: some forms of apparently "unstructured" text, such as report files, actually have a complex structure; these are also not suitable for Proc Import but can still be read in a meaningful way.
Use the $CHAR informat instead to preserve leading spaces at the boundary points.
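For instance, the earlier step could be amended like this (same assumed file name and lengths as above):

data read;
  infile "some text file name.txt" lrecl=320000 flowover;
  input (v1 - v10) ($char32000.);   /* $CHAR keeps leading blanks that the plain $w. informat would trim */
run;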
If it is unstructured, then how do you know where the "cells" start and end?
If the goal is to read lines of text into multiple shorter character variables then something like this should work.
Let's use 10 as the maximum length for one value so our example can be small.
First let's make a sample file.
options parmcards=txt;
filename txt temp;
parmcards;
one,two,three
1,2,3
1234567890,12345678901,123456789012
1234567890123456789012345678901234567890
;
Now let's read it in. We can use the LRECL= option to tell SAS that the lines are longer than the default 32K bytes. Note there is a limit to how large LRECL can be, but it is probably more than 1 million bytes on most installations of SAS.
We can use the LENGTH= and COLUMN= options to determine when we have reached the end of the line.
So let's just make one observation for each 10 bytes read. We can add some additional variables to indicate where the 10 bytes came from.
data tall ;
  infile txt truncover length=ll column=cc lrecl=1000000 ;
  length row col 8 value $10;
  row+1;                          /* count the input lines */
  do col=1 by 1 until(cc>ll);     /* CC is the column pointer, LL the line length */
    input value $char10. @ ;      /* read the next 10 bytes and hold the line */
    output;                       /* one observation per 10-byte piece */
  end;
run;
Result:
Obs    row    col    value

  1     1      1     one,two,th
  2     1      2     ree
  3     2      1     1,2,3
  4     3      1     1234567890
  5     3      2     ,123456789
  6     3      3     01,1234567
  7     3      4     89012
  8     4      1     1234567890
  9     4      2     1234567890
 10     4      3     1234567890
 11     4      4     1234567890
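If you then want one observation per input line with the pieces laid out in successive variables, as in the original question, one option is to transpose the tall data set. This is just a sketch; the output name and the VAR prefix are illustrative choices:

proc transpose data=tall out=wide (drop=_name_) prefix=var;
  by row;       /* one output observation per input line */
  id col;       /* piece number becomes the suffix: var1, var2, ... */
  var value;
run;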