Data creation from raw text

New Contributor
Posts: 3

Hi Folks,

I have attached two files. The txt file is a flat file in which each record spans multiple rows. The good news is that the data is fixed-width. I have used INPUT statements to read the values, but reading beyond the first row of a record is proving difficult.

The layout I need is shown in the attached Excel file. I would greatly appreciate any help.



Respected Advisor
Posts: 3,887

Re: Data creation from raw text

You are trying to read output produced by a report. It would be much easier to get access to the data used to generate that report. Reading report output is possible, but it can take quite a bit of time to get right and to cover every layout variation.

The code below illustrates how it could be done. It also shows, for the second-to-last transaction (ACCOUNT VERIFY), how the report layout can vary, which creates additional cases to handle and so requires additional checks and INPUT statements.

In short: getting this 100% right for your real data could become quite labor-intensive, and the code will only work for the report at hand. If you later re-run it against a newer report, that report might contain a layout you haven't coded for yet.

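data sample(drop=_exp);
  /* declare lengths and formats up front */
  attrib
    account_nr length=$19
    marker length=$2
    transaction_dt length=8 format=date9.
    text length=$200
  ;
  retain account_nr;
  infile 'c:\test\sampledata.txt' truncover dlm=' ';
  /* try to read a date at column 23 and hold the line; ?? suppresses invalid-data notes */
  input @23 transaction_dt ?? :ddmmyy10. @;
  if not missing(transaction_dt) then
    do;
      /* a digit in column 2 means the line also carries the account number */
      if anydigit(_infile_)=2 then input account_nr $ 2-20 @;
      input @22 marker @32 x1 :best32. x2 :comma32.;
      /* load the next line and hold it for inspection */
      input @;
      _exp=find(_infile_,'*EXP DATE INVALID OR EXPIRED','i');
      if _exp>0 then input @_exp p_col $40.;
      input text $200.;
      output; /* assumed: write one row per transaction */
    end;
run;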
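
The pattern underneath this approach is to hold the line with a trailing @, inspect _INFILE_, and only then decide which INPUT statement to run. Here is a minimal, self-contained sketch of that pattern. The inline data, column positions and the data set name demo are made up for illustration and do not match the attached file.

```sas
/* Sketch only: layout and values are invented, not taken from the attached file */
data demo;
  length account_nr $19 text $40;
  format transaction_dt date9.;
  retain account_nr;                    /* carry the account number down to detail rows */
  infile datalines truncover;
  input @;                              /* load the line and hold it */
  if anydigit(_infile_) = 1 then do;    /* header line: first character is a digit */
    input @1 account_nr $19.;
    delete;                             /* no output row for the header itself */
  end;
  /* detail line: date in columns 3-12, free text from column 14 */
  input @3 transaction_dt ddmmyy10. @14 text $40.;
datalines;
1111-2222-3333-4444
  01/02/2014 PURCHASE
  05/02/2014 REFUND
5555-6666-7777-8888
  09/03/2014 PURCHASE
;
run;
```

Under these assumptions the step should write one row per detail line, each carrying the account number from the most recent header line. Every extra layout variation in a real report would need its own test and INPUT branch, which is where the effort goes.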




Super User
Posts: 9,662

Re: Data creation from raw text

I have to say, the data is very dirty.

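filename x 'c:\temp\sampledata.txt';

data want;
  infile x dlm=' ' expandtabs truncover;
  input @;                 /* hold the line so it can be classified */
  length a1-a16 $ 200;
  retain a1-a16;
  /* line starts with a 16-digit card number: first row of an account */
  if prxmatch('/^\s+\d{4}\-\d{4}\-\d{4}\-\d{4}/',_infile_) then do;
    input (a1-a5) ($) a6 & $ (a8-a14) ($) ;
    a7=scan(a6,-1,' ');                        /* last word of a6 */
    a6=substr(a6,1,findc(strip(a6),' ','b'));  /* everything before it */
  end;
  /* line contains a date: another transaction for the same account */
  else if prxmatch('/\d\d\/\d\d\/\d\d/',_infile_) then do;
    input (a2-a5) ($) a6 & $ (a8-a14) ($) ;
    if anyalpha(a5) then return;
    a7=scan(a6,-1,' ');
    a6=substr(a6,1,findc(strip(a6),' ','b'));
  end;
  /* message line starting with an asterisk */
  else if left(_infile_) eq: '*' then a16=_infile_;
  /* PAYMENT line closes the transaction: fix up and write the row */
  else if upcase(left(_infile_)) eq: 'PAYMENT' then do;
    if anydigit(a13)=1 then do;a14=a13;a13=' ';end;
    if not anyalpha(a5) then output; call missing(a16);
  end;
run;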
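
The two assignments to a7 and a6 split the last word off a multi-word field that was read with the & modifier. A small standalone sketch of just that step follows. The value 'ACME STORE LONDON GB' is made up for illustration.

```sas
/* Sketch only: the sample value is invented */
data _null_;
  length a6 a7 $200;
  a6 = 'ACME STORE LONDON GB';
  a7 = scan(a6, -1, ' ');                          /* last blank-delimited word */
  a6 = substr(a6, 1, findc(strip(a6), ' ', 'b'));  /* keep text up to the last blank */
  put a6= a7=;
run;
```

The 'b' modifier makes FINDC search from right to left, so it returns the position of the last blank. If a6 held a single word, FINDC would return 0 and SUBSTR would receive a zero length, so real data may need a guard for that case.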



Xia Keshan
