Alternative for dlmstr in SAS 9.1.3.

Regular Contributor
Posts: 160

Hi,

I need to read data from a text file that can be fairly large (for example, 200 KB). The delimiter is a multi-character string (for example, '<test>').

So if I have a file containing the text "111<test>22222 3333<test>444<test>", the result should be a one-column table with this data:

111

22222 3333

444

I use SAS 9.1.3, and the DLMSTR= option is not available in this version. Is there an efficient way to read such files and create a one-column table?

Thanks!


Accepted Solutions
Solution
‎08-07-2012 06:55 AM
Super User
Posts: 9,681

Re: Alternative for dlmstr in SAS 9.1.3.

Here is a way.

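data x;
  /* recfm=n reads the file as a byte stream: one character per observation */
  infile 'c:\x.txt' recfm=n;
  input x $char1. @@;
run;

data temp;
  set x;
  /* the six preceding characters spell '<test>', so a new field starts here */
  if cat(lag6(x),lag5(x),lag4(x),lag3(x),lag2(x),lag1(x))='<test>' then group+1;
run;

proc transpose data=temp out=want(keep=col:);
  by group;
  var x;
run;

data want(keep=want);
  set want;
  /* assumed upper bound; without it CAT gives the new variable a length of 200 */
  length want $ 32767;
  /* rebuild each field from its characters and blank out the trailing delimiter */
  want=tranwrd(cat(of col:),'<test>',' ');
run;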

Ksharp

Message edited by: xia keshan



All Replies
Valued Guide
Posts: 797

Re: Alternative for dlmstr in SAS 9.1.3.

Yura:

If you were on a UNIX system, I would declare a FILENAME statement with the PIPE device type that reads this data through AWK, SED, or similar to change every "<test>" to, say, "!" (or any other character not in the data). Then you could use dlm='!' on an INFILE statement.

Absent that, try:

data want (keep=field);
   infile in lrecl=32767;                         ** IN is an assumed fileref for the text file **;
   input ;                                        ** Fill the _INFILE_ automatic var **;
   length text $32767  field $40;
   text=tranwrd(_infile_,"<test>",'!');           ** Make a single-character delimiter **;

   do w=1 by 1 while (scan(text,w,'!') ^= ' ');
      field=scan(text,w,'!');                     ** Use the delimiter with a SCAN function **;
      output;
   end;
run;
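
The UNIX pipe idea mentioned at the top of this reply could look roughly like the sketch below. It is untested and rests on assumptions: a UNIX host with sed on the path, a hypothetical file path /path/to/data.txt, a "!" that never occurs in the data, and lines that fit within the LRECL your host allows.

```sas
/* Assumed: UNIX host, sed available, hypothetical path /path/to/data.txt */
/* sed rewrites every <test> to ! before SAS sees the data                */
filename in pipe "sed 's/<test>/!/g' /path/to/data.txt";

data want;
   infile in dlm='!' lrecl=32767;   /* ! is now a single-character delimiter */
   length field $40;                /* assumed maximum field width           */
   input field @@;                  /* one field per observation             */
run;
```

Because the delimiter is replaced outside SAS, the DATA step stays a plain list-input read.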

Trusted Advisor
Posts: 1,300

Re: Alternative for dlmstr in SAS 9.1.3.

data test;
  length x $ 10;
  infile cards dlm='2c'x;
  input @;
  _infile_=prxchange('s/\<test\>/,/',-1,_infile_); *alter the input buffer to change dlmstr to dlm;
  input x @@;
  if ^missing(x) then output;
cards;
111<test>22222 3333<test>444<test>
;
run;

111

22222 3333

444

Regular Contributor
Posts: 160

Re: Alternative for dlmstr in SAS 9.1.3.

Hi Fried,

Thanks for your answer, but it looks like your code also works correctly only if each line of the file is shorter than 32767 characters. I tried your code and it works fine on small files, but my file is longer than 32767 characters and already contains '2c'x delimiters. I used another delimiter that doesn't exist in the file, but on files bigger than 32767 it still doesn't seem to work.

Thanks!

Regular Contributor
Posts: 160

Re: Alternative for dlmstr in SAS 9.1.3.

filename _infile_ "&Path\data.txt";

data readFromFile;
  infile _infile_ lrecl=32767;
  input;
  length text $32767 field $32767;
  /* swap the multi-character delimiter for a single character */
  text=tranwrd(_infile_,'<test>','~');
  do w=1 by 1 while(scan(text,w,'~')^='');
    field=scan(text,w,'~');
    lenf=length(field);
    output;
  end;
run;

data test2;
  infile _infile_ dsd lrecl=1000000 pad;
  input txt1 : $32767. @@;
  row=_n_;
run;

The resulting table "test2" will have many rows, depending on the file size, special symbols in the data, and so on.

I can then work with the "test2" table (scan, substr, merging strings, etc.) to achieve the needed result, but I'm not sure it is the optimal solution in my case.

Maybe there is some option that allows SAS string functions to work with strings longer than 32767?


Thanks!


Regular Contributor
Posts: 160

Re: Alternative for dlmstr in SAS 9.1.3.

Hi Ksharp,

I got the idea. I didn't use all of your code, just part of it (up to the transpose) plus some simple character concatenation, and in the end I achieved what I needed.

So thanks!

Valued Guide
Posts: 797

Re: Alternative for dlmstr in SAS 9.1.3.

Here's a technique (untested) that might simplify the programming. It's meant to work as long as none of your fields contains a '<' character. The trick here is using the "@ 'est>'" pointer control in the INPUT statement.


I've modified this note to account for the fact that the first field in each line is not preceded by '<test>'.

data ;
  infile ..... dlm='<'  lrecl=1000000  length=len column=col;

  /* COL above is the column pointer after the most recent INPUT statement */

  length field $200;

  input field @;

  do while (col<len); 

    output;

    input @ 'est>' field @;

  end;

  output;

run;

If the infile is a single long line, then you can simplify to

data ;
  infile ..... dlm='<'  lrecl=1000000  ;

  length field $200;

  if _n_=1 then input field @@;

  else input @ 'est>' field @@;

run;

The first example uses a trailing single "@", telling SAS to release the current input line when the end of the DATA step is encountered (thereby removing the "lost card" message of an earlier version using double "@@"). The second example uses a trailing double "@@", telling SAS NOT to drop the input line.
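
To make the single "@" versus double "@@" distinction concrete, here is a minimal sketch on inline data. The data set names and values are made up for illustration and are not part of the original problem.

```sas
/* Single trailing @: the line is held only within one DATA step iteration */
data single;
   input id @;       /* hold the line for the next INPUT statement */
   input score;      /* reads from the same line; the line is released at end of iteration */
   cards;
1 90
2 85
;
run;

/* Double trailing @@: the line is held across DATA step iterations */
data double;
   input score @@;   /* each iteration reads the next value from the same line */
   cards;
90 85 70
;
run;
```

SINGLE ends up with one observation per input line (two here), while DOUBLE ends up with one observation per value (three here).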

Trusted Advisor
Posts: 1,300

Re: Alternative for dlmstr in SAS 9.1.3.

Gave this a bit more thought. I'm not incredibly pleased with the following, but it appears to get the job done. I tested it with a file of several MB of data, all on a single line.

Process flow:

1) Read a binary stream from the file 'in', 256 bytes at a time.

2) Search in a loop for delimited strings and substring them out until reaching the end of the stream.

3) Concatenate the remainder from the previous pass that did not end in a dlmstr, and repeat.

data test;
  length infile buffer $ 512;
  if _n_=1 then do;
    dlmstr='<test>';
    /* match either the whole delimiter or any single character */
    _prx=prxparse( '/(' || dlmstr || ')|(.)/' );
    retain _prx dlmstr;
    call missing(buffer);
  end;
  /* carry over the n trailing characters not yet closed by a delimiter */
  else if n>0 then buffer=substr(infile,length(infile)+1-n);

  infile in recfm=n lrecl=256;
  input infile $256.;
  infile=strip(buffer) || infile;
  start=1;
  stop=length(infile);
  n=0;
  retain n infile;

  call prxnext(_prx,start,stop,infile,pos,len);
  do while(pos > 0);
    if len=length(dlmstr) then do;
      /* delimiter found: the n characters before it form one field */
      x=substr(infile,pos-n,n);
      n=0;
      output;
    end;
    else n+1;   /* ordinary character: extend the pending field */
    call prxnext(_prx,start,stop,infile,pos,len);
  end;
  keep x;
run;
