Hi All, I have a txt file that has three types of informaiton: name, address, and transations. There is one row per customer. The data has position defined for name and address, but not for transation. Transation has several types and each type has its own length. For example TRAN has length of 9, while SMET has length of 8. And each row can have any combination of TRAN and SMET. Is there a way that I only read in transtions start with "TRAN" and keep each TRAN as a seperate row?
raw data look like:
AMY 205ABB street NY TRAN90890 TRAN89789 SMET7485 SMET7485
TED 205BBC street CA SMET7885 SMET9885 TRAN23890 TRAN89129 SMET7655
I want the sas data look like:
AMY 205ABB street NY TRAN90890
AMY 205ABB street NY TRAN89789
TED 205BBC street CA TRAN23890
TED 205BBC street CA TRAN89129
Sort of. What you will end up doing is reading in all of the possibilities, and only outputting the correct ones. This example assumes you will replace the columns for the first two variables with the correct columns:
data want;
infile rawdata truncover;
length TRAN $ 9;
input name $ 1-5 address $ 6-25 @ ;
do until (tran=' ');
input tran $ @ ;
if tran =: 'TRAN' then output;
end;
run;
Here is one way you can do it:
data want; infile 'c:\art\textfile.txt' lrecl=200 truncover; informat name $20.; informat address $50.; informat tran $9.; input name address & @; do until (missing(tran)); input @'TRAN' tran @; if not missing(tran) then do; tran=catt('TRAN',tran); output; end; end; input; run;
Art, CEO, AnalystFinder.com
data have;
infile datalines truncover;
input name $ address & $20. @;
i=1;
do until(scan(_infile_ ,i)=' ');
input dummy :$10. @;
if dummy=:'TRAN' then output;
i+1;
end;
drop i;
datalines;
AMY 205ABB street NY TRAN90890 TRAN89789 SMET7485 SMET7485
TED 205BBC street CA SMET7885 SMET9885 TRAN23890 TRAN89129 SMET7655
;
Regards,
Naveen Srinivasan
Your raw data would work better if it was delimited (like comma, tab, pipe '|', etc). Otherwise it will be hard to find the end of the address. Let's first look at a solution that assumes it is easy to read the values. For example if the values where tab delimited.
data want ;
infile 'myfile.txt' dsd dlm='09'x truncover ;
length name $20 address $100 transaction $10 ;
input name address transaction @ ;
do col=1 by 1 until(transaction=' ');
if col=1 or transaction ne ' ' then output;
input transaction @;
end;
run;
Another possibility is that the values are space delimited, but there are always two spaces after the address and the addresses do not contain embedded double spaces. Then you can use the & modifier.
infile 'myfile.txt' truncover ;
length name $20 address $100 transaction $10 ;
input name address & transaction @ ;
If you don't have either of these then perhaps you can take advantage of the fact that your transactions always start with specific characters. Probably better to use regular expression to search for the start of the transactions, but perhaps this would also work?
data want ;
infile cards truncover ;
length name $20 address $100 transaction $10 ;
input @ ;
name = scan(_infile_,1);
loc1 = index(_infile_,'TRAN');
loc2 = index(_infile_,'SMET');
if loc1 and loc2 then loc=min(loc1,loc2);
else loc=max(loc1,loc2);
drop loc1 loc2 loc ;
address = substr(substrn(_infile_,1,loc-1),length(name)+2);
input @loc transaction @;
do col=1 by 1 until(transaction=' ');
if col=1 or transaction ne ' ' then output;
input transaction @;
end;
cards;
AMY 205ABB street NY TRAN90890 TRAN89789 SMET7485 SMET7485
TED 205BBC street CA SMET7885 SMET9885 TRAN23890 TRAN89129 SMET7655
;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.