DATA Step, Macro, Functions and more

How to read in txt file and only read in and keep certain columns with some key words

Reply
New Contributor
Posts: 2

How to read in txt file and only read in and keep certain columns with some key words

[ Edited ]

Hi All, I have a txt file that has three types of informaiton: name, address, and transations. There is one row per customer. The data has position defined for name and address, but not for transation. Transation has several  types and each type has its own length.  For example TRAN has length of 9, while SMET has length of 8. And each row can have any combination of TRAN and SMET.  Is there a way that I only read in transtions start with "TRAN" and keep each TRAN as a seperate row? 

 

raw data look like:

AMY   205ABB street NY   TRAN90890   TRAN89789   SMET7485   SMET7485
TED   205BBC street CA   SMET7885   SMET9885   TRAN23890   TRAN89129   SMET7655

 

I want the sas data look like:

 

AMY   205ABB street NY   TRAN90890 

AMY   205ABB street NY   TRAN89789  

TED   205BBC street CA   TRAN23890  

TED   205BBC street CA   TRAN89129

 

Super User
Posts: 5,503

Re: How to read in txt file and only read in and keep certain columns with some key words

Posted in reply to yymissing

Sort of.  What you will end up doing is reading in all of the possibilities, and only outputting the correct ones.  This example assumes you will replace the columns for the first two variables with the correct columns:

 

data want;

infile rawdata truncover;

length TRAN $ 9;

input name $ 1-5  address $ 6-25 @ ;

do until (tran=' ');

   input tran $ @ ;

   if tran =: 'TRAN' then output;

end;

run;

PROC Star
Posts: 7,471

Re: How to read in txt file and only read in and keep certain columns with some key words

Posted in reply to yymissing

Here is one way you can do it:

 

data want;
  infile 'c:\art\textfile.txt' lrecl=200 truncover;
  informat name $20.;
  informat address $50.;
  informat tran $9.;
  input name address & @;
  do until (missing(tran));
    input @'TRAN' tran @;
    if not missing(tran) then do;
      tran=catt('TRAN',tran);
      output;
    end;
  end;
  input;
run;

Art, CEO, AnalystFinder.com

 

PROC Star
Posts: 283

Re: How to read in txt file and only read in and keep certain columns with some key words

[ Edited ]
Posted in reply to yymissing

 

data have;

infile datalines truncover;

input  name $ address & $20. @;

i=1;

do until(scan(_infile_ ,i)=' ');

input dummy :$10.  @;

if dummy=:'TRAN' then output;

i+1;

end;

drop i;

datalines;

AMY   205ABB street NY   TRAN90890   TRAN89789   SMET7485   SMET7485

TED   205BBC street CA   SMET7885   SMET9885   TRAN23890   TRAN89129   SMET7655

;

 

Regards,

Naveen Srinivasan

 

Super User
Super User
Posts: 7,042

Re: How to read in txt file and only read in and keep certain columns with some key words

[ Edited ]
Posted in reply to yymissing

Your raw data would work better if it was delimited (like comma, tab, pipe '|', etc).  Otherwise it will be hard to find the end of the address.  Let's first look at a solution that assumes it is easy to read the values.  For example if the values where tab delimited.

data want ;
  infile 'myfile.txt' dsd dlm='09'x truncover ;
  length name $20 address $100 transaction $10 ;
  input name address transaction @ ;
  do col=1 by 1 until(transaction=' ');
     if col=1 or transaction ne ' ' then output;
     input transaction @;
  end;
run;

Another possibility is that the values are space delimited, but there are always two spaces after the address and the addresses do not contain embedded double spaces. Then you can use the & modifier.

  infile 'myfile.txt' truncover ;
  length name $20 address $100 transaction $10 ;
  input name address & transaction @ ;

If you don't have either of these then perhaps you can take advantage of the fact that your transactions always start with specific characters.  Probably better to use regular expression to search for the start of the transactions, but perhaps this would also work?

 

data want ;
  infile cards truncover ;
  length name $20 address $100 transaction $10 ;
  input @ ;
  name = scan(_infile_,1);
  loc1 = index(_infile_,'TRAN');
  loc2 = index(_infile_,'SMET');
  if loc1 and loc2 then loc=min(loc1,loc2);
  else loc=max(loc1,loc2);
  drop loc1 loc2 loc ;
  address = substr(substrn(_infile_,1,loc-1),length(name)+2);
  input @loc transaction @;
  do col=1 by 1 until(transaction=' ');
     if col=1 or transaction ne ' ' then output;
     input transaction @;
  end;
cards;
AMY   205ABB street NY   TRAN90890   TRAN89789   SMET7485   SMET7485
TED   205BBC street CA   SMET7885   SMET9885   TRAN23890   TRAN89129   SMET7655
;

 

Ask a Question
Discussion stats
  • 4 replies
  • 124 views
  • 0 likes
  • 5 in conversation