Hi,
I need to export data from a text file that can be fairly large (for example, 200 KB); the delimiter is a string of characters (for example, '<test>').
So if I have a file with the text "111<test>22222 3333<test>444<test>", the result should be a one-column table with the data:
111
22222 3333
444
I use SAS 9.1.3, and in this version the DLMSTR option isn't available. Can I somehow read such files efficiently and create a one-column table?
Thanks!
Here is a way.
data x;
   infile 'c:\x.txt' recfm=n;
   input x $char1. @@;   * read the file one character per observation *;
run;
data temp;
   set x;
   * start a new group each time the previous six characters spell '<test>' *;
   if cat(lag6(x),lag5(x),lag4(x),lag3(x),lag2(x),lag1(x))='<test>' then group+1;
run;
proc transpose data=temp out=want(keep=col:);
   by group;
   var x;
run;
data want(keep=want);
   set want;
   * glue the characters back together and strip the delimiter text *;
   want=tranwrd(cat(of col:),'<test>',' ');
run;
Ksharp
Message edited by: xia keshan
Yura:
If you were on a UNIX system, I would declare a FILENAME statement with the PIPE option to read this data through awk, sed, or similar, changing every "<test>" to, say, "!" (or any other character not in the data). Then you could use dlm='!' on an INFILE statement.
Absent that, try:
data want (keep=field);
   infile 'yourfile.txt' lrecl=32767;   ** supply your own INFILE statement here **;
   input;                               ** fill the _INFILE_ automatic variable **;
   length text $32767 field $40;
   text=tranwrd(_infile_,"<test>",'!'); ** make a single-character delimiter **;
   do w=1 by 1 while (scan(text,w,'!') ^= ' ');
      field=scan(text,w,'!');           ** split on the delimiter with the SCAN function **;
      output;
   end;
run;
data test;
   length x $ 10;
   infile cards dlm='2c'x;
   input @;
   * alter the input buffer to change the delimiter string to a single comma ('2c'x) *;
   _infile_=prxchange('s/<test>/,/',-1,_infile_);
   input x @@;
   if ^missing(x) then output;
cards;
111<test>22222 3333<test>444<test>
;
run;
111
22222 3333
444
Hi Fried,
Thanks for your answer, but it looks like your code also works correctly only if the file line is under 32767 characters. I tried it and it works fine on small files, but my file is longer than 32767 characters and already contains '2c'x delimiters, so I just used another delimiter that doesn't appear in the file. Either way, on files bigger than 32767 characters it doesn't seem to work.
Thanks!
Hi Mkeints,
Thanks, your code works OK, but my file is fairly large (~200 KB) and all the data is on one line (one row), so the code you proposed (slightly changed):
filename _infile_ "&Path\data.txt";
data readFromFile;
   infile _infile_ lrecl=32767;
   input;
   length text $32767 field $32767;
   text=tranwrd(_infile_,'<test>','~');
   do w=1 by 1 while(scan(text,w,'~')^='');
      field=scan(text,w,'~');
      lenf=length(field);
      output;
   end;
run;
This code can read only the first 32767 characters; if you set LRECL to 100000, for example, it throws an error.
I can read the whole file with this code:
data test2;
   infile _infile_ dsd lrecl=1000000 pad;
   input txt1 : $32767. @@;
   row=_n_;
run;
The resulting table test2 will have many rows, depending on the file size, special characters in the data, etc., and then I can just work (SCAN, SUBSTR, merging strings, etc.) with this test2 table to achieve the needed result, but I'm not sure this is the optimal solution in my case.
Maybe there is some option that allows SAS string functions to work with strings longer than 32767 characters?
Thanks!
Hi Ksharp,
I got the idea. I didn't use all of your code, just part of it (up to the TRANSPOSE) plus some simple character concatenations, and in the end I achieved the needed result.
So thanks!
Here's a technique (untested) that might simplify the programming. It's meant to work as long as none of your fields contains a '<' character. The trick here is using the "@ 'est>'" pointer control in the INPUT statement.
I've modified this note to account for the fact that the first field in each line is not preceded by '<test>'.
data;
   infile ..... dlm='<' lrecl=1000000 length=len column=col;
   /* COL above is the column pointer position after the most recent INPUT statement */
   length field $200;
   input field @;
   do while (col<len);
      output;
      input @ 'est>' field @;
   end;
   output;
run;
If the infile is a single long line, then you can simplify to
data;
   infile ..... dlm='<' lrecl=1000000;
   length field $200;
   if _n_=1 then input field @@;
   else input @ 'est>' field @@;
run;
The first example uses a single trailing "@", telling SAS to release the current input line when the end of the DATA step is reached (thereby removing the "lost card" message of an earlier version that used "@@"). The second example uses a trailing double "@@", telling SAS NOT to release the input line.
Gave this a bit more thought. I'm not incredibly pleased with the following, but it appears to get the job done. I tested it with a file of several MB of data, all on a single line.
Process flow:
1) Read a binary stream from the file 'in', 256 bytes at a time.
2) Search in a loop for delimited strings and substring them out until reaching the end of the stream.
3) Concatenate the remainder of any chunk that did not end in a delimiter string onto the next chunk and repeat.
data test;
   length infile buffer $ 512;
   if _n_=1 then do;
      dlmstr='<test>';
      * match either the whole delimiter string or any single character *;
      _prx=prxparse( '/(' || dlmstr || ')|(.)/' );
      retain _prx dlmstr;
      call missing(buffer);
   end;
   else if n>0 then buffer=substr(infile,length(infile)+1-n);
   infile in recfm=n lrecl=256;
   input infile $256.;
   infile=strip(buffer) || infile;
   start=1;
   stop=length(infile);
   n=0;
   retain n infile;
   call prxnext(_prx,start,stop,infile,pos,len);
   do while(pos > 0);
      if len=length(dlmstr) then do;
         * a full delimiter matched: output the n characters read before it *;
         x=substr(infile,pos-n,n);
         n=0;
         output;
      end;
      else n+1;
      call prxnext(_prx,start,stop,infile,pos,len);
   end;
   keep x;
run;
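As a quick sanity check of what the splitting should produce, outside of SAS (assuming GNU sed, which accepts \n in the replacement), the sample string from the question splits like this:

```shell
# Replace each '<test>' with a newline to get the one-column result
printf '111<test>22222 3333<test>444<test>' | sed 's/<test>/\n/g'
# prints 111, 22222 3333 and 444 on separate lines
```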