Hi,
I need to export data from a text file that can be fairly large (for example, 200 KB); the delimiter is a string of characters (for example, '<test>').
So if I have a file with the text "111<test>22222 3333<test>444<test>", the result should be a one-column table with the data:
111
22222 3333
444
I use SAS 9.1.3, and in this version the DLMSTR option isn't available. Can I somehow read such files efficiently and create a one-column table?
Thanks!
Here is a way.
data x;
   infile 'c:\x.txt' recfm=n;
   input x $char1. @@;   * read the file one character per observation *;
run;
data temp;
   set x;
   * start a new group each time the previous six characters spell '<test>' *;
   if cat(lag6(x),lag5(x),lag4(x),lag3(x),lag2(x),lag1(x))='<test>' then group+1;
run;
proc transpose data=temp out=want(keep=col:);
   by group;
   var x;
run;
data want(keep=want);
   set want;
   * glue the characters back together and strip the delimiter text *;
   want=tranwrd(cat(of col:),'<test>',' ');
run;
Ksharp
Message edited by: xia keshan
Yura:
If you were on a UNIX system, I would declare a FILENAME statement with the PIPE option to read this data through awk, sed, or similar, changing every "<test>" to, say, "!" (or any other character not in the data). Then you could use dlm='!' on an INFILE statement.
Absent that, try:
data want (keep=field);
   infile 'yourfile.txt' lrecl=32767;   ** supply your own INFILE statement here **;
   input;                               ** fill the _INFILE_ automatic variable **;
   length text $32767 field $40;
   text=tranwrd(_infile_,"<test>",'!'); ** make a single-character delimiter **;
   do w=1 by 1 while (scan(text,w,'!') ^= ' ');
      field=scan(text,w,'!');           ** split on the delimiter with the SCAN function **;
      output;
   end;
run;
data test;
   length x $ 10;
   infile cards dlm='2c'x;
   input @;
   * alter the input buffer to change the delimiter string to a single comma ('2c'x) *;
   _infile_=prxchange('s/<test>/,/',-1,_infile_);
   input x @@;
   if ^missing(x) then output;
cards;
111<test>22222 3333<test>444<test>
;
run;
111
22222 3333
444
Hi Fried,
Thanks for your answer, but it looks like your code also works correctly only if the file line is under 32767 characters. I tried it and it works fine on small files, but my file is longer than 32767 characters and already contains '2c'x delimiters, so I just used another delimiter that doesn't appear in the file. Either way, on files bigger than 32767 characters it doesn't seem to work.
Thanks!
Hi Mkeints,
Thanks, your code works OK, but my file is fairly large (~200 KB) and all the data is on one line (one row), so the code you proposed (slightly changed):
filename _infile_ "&Path\data.txt";
data readFromFile;
   infile _infile_ lrecl=32767;
   input;
   length text $32767 field $32767;
   text=tranwrd(_infile_,'<test>','~');
   do w=1 by 1 while(scan(text,w,'~')^='');
      field=scan(text,w,'~');
      lenf=length(field);
      output;
   end;
run;
This code can read only the first 32767 characters; if you set LRECL to 100000, for example, it throws an error.
I can read the whole file with this code:
data test2;
   infile _infile_ dsd lrecl=1000000 pad;
   input txt1 : $32767. @@;
   row=_n_;
run;
The resulting table test2 will have many rows, depending on the file size, special characters in the data, etc., and then I can just work (SCAN, SUBSTR, merging strings, etc.) with this test2 table to achieve the needed result, but I'm not sure this is the optimal solution in my case.
Maybe there is some option that allows SAS string functions to work with strings longer than 32767 characters?
Thanks!
Hi Ksharp,
I got the idea. I didn't use all of your code, just part of it (up to the TRANSPOSE) plus some simple character concatenations, and in the end I achieved the needed result.
So thanks!
Here's a technique (untested) that might simplify the programming. It's meant to work as long as none of your fields contains a '<' character. The trick here is using the "@ 'est>'" pointer control in the INPUT statement.
I've modified this note to account for the fact that the first field in each line is not preceded by '<test>'.
data;
   infile ..... dlm='<' lrecl=1000000 length=len column=col;
   /* COL above is the column pointer position after the most recent INPUT statement */
   length field $200;
   input field @;
   do while (col<len);
      output;
      input @ 'est>' field @;
   end;
   output;
run;
If the infile is a single long line, then you can simplify to
data;
   infile ..... dlm='<' lrecl=1000000;
   length field $200;
   if _n_=1 then input field @@;
   else input @ 'est>' field @@;
run;
The first example uses a single trailing "@", telling SAS to release the current input line when the end of the DATA step is reached (thereby removing the "lost card" message of an earlier version that used "@@"). The second example uses a trailing double "@@", telling SAS NOT to release the input line.
Gave this a bit more thought. I'm not incredibly pleased with the following, but it appears to get the job done. I tested it with a file of several MB of data, all on a single line.
Process flow:
1) Read a binary stream from the file 'in', 256 bytes at a time.
2) Search in a loop for delimited strings and substring them out until reaching the end of the stream.
3) Concatenate the remainder of any chunk that did not end in a delimiter string onto the next chunk and repeat.
data test;
   length infile buffer $ 512;
   if _n_=1 then do;
      dlmstr='<test>';
      * match either the whole delimiter string or any single character *;
      _prx=prxparse( '/(' || dlmstr || ')|(.)/' );
      retain _prx dlmstr;
      call missing(buffer);
   end;
   else if n>0 then buffer=substr(infile,length(infile)+1-n);
   infile in recfm=n lrecl=256;
   input infile $256.;
   infile=strip(buffer) || infile;
   start=1;
   stop=length(infile);
   n=0;
   retain n infile;
   call prxnext(_prx,start,stop,infile,pos,len);
   do while(pos > 0);
      if len=length(dlmstr) then do;
         * a full delimiter matched: output the n characters read before it *;
         x=substr(infile,pos-n,n);
         n=0;
         output;
      end;
      else n+1;
      call prxnext(_prx,start,stop,infile,pos,len);
   end;
   keep x;
run;
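As a quick sanity check of what the splitting should produce, outside of SAS (assuming GNU sed, which accepts \n in the replacement), the sample string from the question splits like this:

```shell
# Replace each '<test>' with a newline to get the one-column result
printf '111<test>22222 3333<test>444<test>' | sed 's/<test>/\n/g'
# prints 111, 22222 3333 and 444 on separate lines
```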