BookmarkSubscribeRSS Feed
therealgtc3
Calcite | Level 5

I am importing an unstructured text data file that exceeds the allowed limit per cell for SAS.

After fudging around a bit, I have been able to import some of the data, but it is not ideal situation and doesn't pull in the entire file it seems.

 

Is there a while, perhaps through the infile statement, that I can say, if the string exceeds 32000 string characters, start variable2 and continue to drop the data into successive variables until all text is stored?

 

example:

data file 
158,000 characters 

 

var1var2var3var4

tons of characters - aopndvikadfngnf;afngaogkma'psvdmax;clvkdasf

gh, q-vw35iv,a -s8eri0t a,zs[0ro86ybuqi4,]-=l "*SIDgkj'pac';ew4l6kgvjw-4n5c 

picksup where var1 left off

- aonwerdvp maq9nyrgha;oriltluakce;sort8ryacfusfe

/;.5gqa7w4pj5ca.seurjcioanuyrec

and so onand so forth... till the end of the text file

 

5 REPLIES 5
Michael_TH
SAS Employee

Example 5 in the following SAS Global Forum paper describes how you can split your variable that is longer than 32K into more than one variable.

 

http://support.sas.com/resources/papers/proceedings11/256-2011.pdf

ballardw
Super User

If by "import" you mean Proc Import or any of the widgets that call proc import then your data is inappropriate for use. Proc Import is intended for structured data in pretty regular terms.

 

A data step can do what you want, I think:

data read;
   infile "some text file name.txt" lrecl=320000 flowover;
   input (v1 - v10)  ($32000.);
run;

the LRECL, logical record length, would attempt to default to something much to short. Flowover on the infile should handle any line break characters. I picked a combination of LRECL and variables lengths so they easily complement each other to evenly divide 10 variables into the record length.

The input with () groups things to repeat the use. So this reads the first 32,000 characters into V1, the second 32,000 into V2 etc.

 

 

Note: some forms of apparently "unstructured" text, such as report files, actually have a complex structure, also not suitable for Proc Import, but can be read in a meaningful way.

Tom
Super User Tom
Super User

Use the $CHAR informat instead to preserve leading spaces at the boundary points.

Tom
Super User Tom
Super User

If it is unstructured then how to know where the "cells" start and end?

Tom
Super User Tom
Super User

If the goal is to read lines of text into multiple shorter character variables then something like this should work.

 

Let's use 10 as the maximum length for one value so our example can be small.

 

First let's make a sample file.

options parmcards=txt;
filename txt temp;
parmcards;
one,two,three
1,2,3
1234567890,12345678901,123456789012
1234567890123456789012345678901234567890
;

Now let's read it in.  We can use the LRECL= to tell SAS that the lines are longer than the default 32K bytes.  Note there is a limit to how large LRECL can be, but it is probably more than 1 million on most installations of SAS. 

 

We can use the LENGTH= and COLUMN= option to determine when we have reached the end of the line.

 

So let's just make one observation per each 10 bytes read.  We can add some additional variables to indicate where the 10 bytes came from.

data tall ;
  infile txt truncover length=ll column=cc lrecl=1000000 ;
  length row col 8 value $10;
  row+1;
  do col=1 by 1 until(cc>ll);
    input value $char10. @ ;
    output;
  end;
run;

Result:

Obs    row    col    value

  1     1      1     one,two,th
  2     1      2     ree
  3     2      1     1,2,3
  4     3      1     1234567890
  5     3      2     ,123456789
  6     3      3     01,1234567
  7     3      4     89012
  8     4      1     1234567890
  9     4      2     1234567890
 10     4      3     1234567890
 11     4      4     1234567890

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 991 views
  • 2 likes
  • 4 in conversation