Stuck trying to figure out how to parse a file that we get from fedwire the general format is in attached myfile.txt There can be any number of records. Within the records the data elements are separated by a tag enclosed in {} which represent a field name, the data following the tag is associated with that field, up until we reach next tag {}. Each record can have different tags and different number of tags, as in the attached example all three records have the same tags {1100}, {1110},{1120},{1510},{1520},{2000},{3100},{3320},{3400},and {3600}. However you can see record three then jumps to {4200}, whereas records 1and 2 the next tag is {3620}.
I have tried using input @, but struggling with that, stopping after field2, also some of the data within the tag can be variable length, so can't always set the field value to extract exact length of the data, so trying to figure out how to parse by tags {} and capture data.
options ls=max;
filename fedwire "%sysfunc(pathname(WORK))\fedwire.txt";
data have;
input;
file fedwire;
put _infile_;
datalines;
{1100}30P N{1110}08030815FT03{1120}20210803L1B7832F00013308030815FT03{1510}1000{1520}20210801B1Q8161C001707{2000}002020230000{3100}021000018BANKOBOB*{3320}F9S2108035631900*{3400}121125631AAA BBB CCCC DD EE*{3600}CTP{3620}3*FA6FE938-A6F6-4EE9-A062-6B87FC752339*{3700}SUSD0,00*{3710}USD203000,*{4200}D1602204277*ALOT OF WRITTING STUFF 98203*{5000}D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB*{5100}D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D*{6000}HELEN BOBS STUFF*ROUTE: 122228444*
{1100}30P N{1110}08031116FT03{1120}20210803L1B7832F00031008031116FT03{1510}1000{1520}20210803B1QGC06C006237{2000}000300899500{3100}021000021NONEOFBUS*{3320}98566273215FS*{3400}121222484AAA BBB CCCC DD EE*{3600}CTP{3620}3*9A294483-5377-4A00-9362-8C95FEF4229F*{3700}SUSD5,00*{3710}USD6000,00*{4100}F124448405*MOREBLAH BLAH BLAH STUFF*{4200}D88040660033579*EDUARDO BLAH BLAH HERNANDEZ, E*STADOS UNIDOS,11111 LILAC ,TEXAS*{4320}SWF OF 21/09/03*{5000}DCLIENTE:13770683*INDUSTY S A DE C V*BRALES C P*45694 EL SALTO JALISCO MEXICO*{5100}BRGIOMFE33XMT*
{1100}30P S{1110}08031811FT03{1120}20210803L1B7832F00101008031811FT03{1510}1631{1520}20210803QMGAT015001875{2000}000440962083{3100}125108405CANTSEE ME*{3320}F001BB*{3400}121045584AAA BBB CCCC DD EE*{3600}DRB{4200}F125108405*SOMEWHER OVER THE RAINBOW*{4400}D125108405*NOT IN KANSAS*{5000}F125108405*LOST SOMEWHERE*{5400}125108405{6500}DRAW*CREDIT IMMUNITY PALACE*
run;
data want2;
infile fedwire truncover;
input @"{1100}" field1 :$5. @"{1110}" field2 :$12. @"{1120}" field3 :$34.;
run;
Was trying to use regular expression but can't figure that out either, I think this will work, but not sure how to keep field name prxparse('/(?<={\d\d\d\d})\w+/')
Any help would be appreciated. Thanks
Seems pretty simple to read.
data mydata;
infile 'c:\downloads\myfile.txt' dlm='{}' truncover ;
length row col 8 tag $20 text $200 ;
row+1;
do col=1 by 1 until(tag=' ');
input tag text @;
if tag ne ' ' then output;
end;
run;
Results:
Obs row col tag text 1 1 1 1100 30P N 2 1 2 1110 08030815FT03 3 1 3 1120 20210803L1B7832F00013308030815FT03 4 1 4 1510 1000 5 1 5 1520 20210801B1Q8161C001707 6 1 6 2000 002020230000 7 1 7 3100 021000018BANKOBOB* 8 1 8 3320 F9S2108035631900* 9 1 9 3400 121125631AAA BBB CCCC DD EE* 10 1 10 3600 CTP 11 1 11 3620 3*FA6FE938-A6F6-4EE9-A062-6B87FC752339* 12 1 12 3700 SUSD0,00* 13 1 13 3710 USD203000,* 14 1 14 4200 D1602204277*ALOT OF WRITTING STUFF 98203* 15 1 15 5000 D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB* 16 1 16 5100 D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D* 17 1 17 6000 HELEN BOBS STUFF*ROUTE: 122228444* 18 2 1 1100 30P N 19 2 2 1110 08031116FT03 20 2 3 1120 20210803L1B7832F00031008031116FT03 ...
The documentation for the file contents is where exactly?
My guess is some of those "fields" have a mixture of date and other stuff.
Is there anything that identifies a line, or combination of lines, as a single record?
If the fields can have different lengths it really helps to know which ones what rules may be involved for the substrings.
If you really want to read the documentation, here you go: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf
Yes the data in between the tags can be a mixture, for my purposes if I can get each set of data elements between the tags into its own column of data, then i can parse that data further.
Each row is one record that can be parsed out to many fields. Data between each tag {} can be one or more fields of data as well, but im working on baby steps just trying to break out each set of data between the tags to its own column.
Like this?
data WANT;
infile FEDWIRE truncover lrecl=8000;
input ;
array FIELD[0:99] $10.;
array VALUE[0:99] $50.;
RECORD=_N_;
do SCAN=0 to 99 ;
FIELD[SCAN]=scan(_INFILE_,SCAN*2+1,'{}');
VALUE[SCAN]=scan(_INFILE_,SCAN*2+2,'{}');
if VALUE[SCAN]=' ' then leave;
end;
output;
run;
Thanks Chris, I might be able to work with this. Attached is my desired output
Glad to know. I'm not touching MS Office files 🙂
1100 | 1110 | 1120 | 1510 | 1520 | 2000 | 3100 | 3320 | 3400 | 3600 | 3620 | 3700 | 3710 | 4100 | 4200 | 4320 | 4400 | 5000 | 5100 | 5400 | 6000 | 6500 |
30P N | 08030815FT03 | 20210803L1B7832F00013308030815FT03 | 1000 | 20210801B1Q8161C001707 | 002020230000 | 021000018BANKOBOB* | F9S2108035631900* | 121125631AAA BBB CCCC DD EE* | CTP | 3*FA6FE938-A6F6-4EE9-A062-6B87FC752339* | SUSD0,00* | USD203000,* | D1602204277*ALOT OF WRITTING STUFF 98203* | D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAA | D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR | HELEN BOBS STUFF*ROUTE: 122228444* | |||||
30P N | 08031116FT03 | 20210803L1B7832F00031008031116FT03 | 1000 | 20210803B1QGC06C006237 | 000300899500 | 021000021NONEOFBUS* | 98566273215FS* | 121222484AAA BBB CCCC DD EE* | CTP | 3*9A294483-5377-4A00-9362-8C95FEF4229F* | SUSD5,00* | USD6000,00* | F124448405*MOREBLAH BLAH BLAH STUFF* | D88040660033579*EDUARDO BLAH BLAH HERNANDEZ, E*STA | SWF OF 21/09/03* | DCLIENTE:13770683*INDUSTY S A DE C V*BRALES C P*4 | BRGIOMFE33XMT* | ||||
30P S | 08031811FT03 | 20210803L1B7832F00101008031811FT03 | 1631 | 20210803QMGAT015001875 | 000440962083 | 125108405CANTSEE ME* | F001BB* | 121045584AAA BBB CCCC DD EE* | DRB | F125108405*SOMEWHER OVER THE RAINBOW* | D125108405*NOT IN KANSAS* | F125108405*LOST SOMEWHERE* | 125108405 | DRAW*CREDIT IMMUNITY PALACE* |
1. How do you establish the list of variable names?
2. The names you show are invalid: the first character is a digit.
The goal is to rename the numbers to field name, for example 1100 = Message_Disposition, 1110 = Receipt_Time_Stamp, this pdf list out the actuals names that I would rename the variables to https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf
@jimbobob wrote:
The goal is to rename the numbers to field name, for example 1100 = Message_Disposition, 1110 = Receipt_Time_Stamp, this pdf list out the actuals names that I would rename the variables to https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf
You might consider a custom format to supply the text "Message disposition" or similar for the value of the Tag, using @Tom's code.
Or create a format so that the formatted value of the Tag variable could be used by Proc Transpose in an ID statement if you had a variable that would indicate which observations were from the same line in the original file.
Example using a format to create new variables using with Proc Transpose:
data example; input x $ y $ z $; datalines; 1 2 abc 4 5 pdq ; proc format; value $ycode '2'='Some_description' '4'='Other_desc' '5'='This_description' ; run; proc transpose data=example out=trans; by x; id y; var z; format y $ycode.; run;
Placing a significant chunk of _infile_ into a permanent variable might work for a line identifier if nothing better comes to mind using Tom's code.
A unique row identifier is what the ROW variable is for in the selected answer.
Excellent, thanks Ballard, way better than the route I was taking using a replace macro I found in a SAS paper was working but taking forever to run. this was instant.
Seems pretty simple to read.
data mydata;
infile 'c:\downloads\myfile.txt' dlm='{}' truncover ;
length row col 8 tag $20 text $200 ;
row+1;
do col=1 by 1 until(tag=' ');
input tag text @;
if tag ne ' ' then output;
end;
run;
Results:
Obs row col tag text 1 1 1 1100 30P N 2 1 2 1110 08030815FT03 3 1 3 1120 20210803L1B7832F00013308030815FT03 4 1 4 1510 1000 5 1 5 1520 20210801B1Q8161C001707 6 1 6 2000 002020230000 7 1 7 3100 021000018BANKOBOB* 8 1 8 3320 F9S2108035631900* 9 1 9 3400 121125631AAA BBB CCCC DD EE* 10 1 10 3600 CTP 11 1 11 3620 3*FA6FE938-A6F6-4EE9-A062-6B87FC752339* 12 1 12 3700 SUSD0,00* 13 1 13 3710 USD203000,* 14 1 14 4200 D1602204277*ALOT OF WRITTING STUFF 98203* 15 1 15 5000 D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB* 16 1 16 5100 D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D* 17 1 17 6000 HELEN BOBS STUFF*ROUTE: 122228444* 18 2 1 1100 30P N 19 2 2 1110 08031116FT03 20 2 3 1120 20210803L1B7832F00031008031116FT03 ...
Awesome. I love SAS folks. Thank you so much Tom. Chris thanks as well for your help,
Hey Tom, ran into a snag with some of my files. I found that in some file the records start with "YFT811 " before starting the rest of the delimited file, so the delimted file really starts at position 9: I found a paper on using a variable for the dlm= option, but I've had no luck figuring out how to do it
Paper I was trying to follow: https://www.lexjansen.com/nesug/nesug12/cc/cc23.pdf
My Attempted Code:
data x;
infile "C:\Users\jwalker\Desktop\New folder (2)\FedLine32948_PCBB_FXWIRE_FEDLINE_202107201050.txt" dlm=mydlm truncover;
length row col 8 tag $32 text $1000;
row+1;
mydlm = 'YFT811';
input v1 $ @;
do col=1 by 1 until(tag=' ');
mydlm = '{}';
input tag text @;
if tag ne ' ' then output;
end;
run;
I there a way to do this or tell the input to start at position 9?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.