Solved: Re: fedwire funds format files with tags and elements

jimbobob · Posted 08-17-2021 05:07 PM

Stuck trying to figure out how to parse a file that we get from fedwire the general format is in attached myfile.txt There can be any number of records. Within the records the data elements are separated by a tag enclosed in {} which represent a field name, the data following the tag is associated with that field, up until we reach next tag {}. Each record can have different tags and different number of tags, as in the attached example all three records have the same tags {1100}, {1110},{1120},{1510},{1520},{2000},{3100},{3320},{3400},and {3600}. However you can see record three then jumps to {4200}, whereas records 1and 2 the next tag is {3620}.

I have tried using input @, but struggling with that, stopping after field2, also some of the data within the tag can be variable length, so can't always set the field value to extract exact length of the data, so trying to figure out how to parse by tags {} and capture data.

options ls=max;
filename fedwire "%sysfunc(pathname(WORK))\fedwire.txt";
data have;
  input;
  file fedwire;
  put _infile_;
  datalines;
{1100}30P N{1110}08030815FT03{1120}20210803L1B7832F00013308030815FT03{1510}1000{1520}20210801B1Q8161C001707{2000}002020230000{3100}021000018BANKOBOB*{3320}F9S2108035631900*{3400}121125631AAA BBB CCCC DD EE*{3600}CTP{3620}3*FA6FE938-A6F6-4EE9-A062-6B87FC752339*{3700}SUSD0,00*{3710}USD203000,*{4200}D1602204277*ALOT OF WRITTING STUFF 98203*{5000}D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB*{5100}D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D*{6000}HELEN BOBS STUFF*ROUTE: 122228444*
{1100}30P N{1110}08031116FT03{1120}20210803L1B7832F00031008031116FT03{1510}1000{1520}20210803B1QGC06C006237{2000}000300899500{3100}021000021NONEOFBUS*{3320}98566273215FS*{3400}121222484AAA BBB CCCC DD EE*{3600}CTP{3620}3*9A294483-5377-4A00-9362-8C95FEF4229F*{3700}SUSD5,00*{3710}USD6000,00*{4100}F124448405*MOREBLAH BLAH BLAH STUFF*{4200}D88040660033579*EDUARDO BLAH BLAH HERNANDEZ, E*STADOS UNIDOS,11111 LILAC ,TEXAS*{4320}SWF OF 21/09/03*{5000}DCLIENTE:13770683*INDUSTY  S A DE C V*BRALES C P*45694 EL SALTO JALISCO MEXICO*{5100}BRGIOMFE33XMT*
{1100}30P S{1110}08031811FT03{1120}20210803L1B7832F00101008031811FT03{1510}1631{1520}20210803QMGAT015001875{2000}000440962083{3100}125108405CANTSEE ME*{3320}F001BB*{3400}121045584AAA BBB CCCC DD EE*{3600}DRB{4200}F125108405*SOMEWHER OVER THE RAINBOW*{4400}D125108405*NOT IN KANSAS*{5000}F125108405*LOST SOMEWHERE*{5400}125108405{6500}DRAW*CREDIT IMMUNITY PALACE*
run;
data want2;
infile fedwire  truncover;
input @"{1100}" field1 :$5. @"{1110}" field2 :$12. @"{1120}" field3 :$34.;
run;

Was trying to use regular expression but can't figure that out either, I think this will work, but not sure how to keep field name prxparse('/(?<={\d\d\d\d})\w+/')

Any help would be appreciated. Thanks

Tom · Posted 08-18-2021 01:50 PM

Seems pretty simple to read.

data mydata;
  infile 'c:\downloads\myfile.txt' dlm='{}' truncover ;
  length row col 8 tag $20 text $200 ;
  row+1;
  do col=1 by 1 until(tag=' ');
    input tag text @;
    if tag ne ' ' then output;
  end;
run;

Results:

Obs    row    col    tag     text

  1     1       1    1100    30P N
  2     1       2    1110    08030815FT03
  3     1       3    1120    20210803L1B7832F00013308030815FT03
  4     1       4    1510    1000
  5     1       5    1520    20210801B1Q8161C001707
  6     1       6    2000    002020230000
  7     1       7    3100    021000018BANKOBOB*
  8     1       8    3320    F9S2108035631900*
  9     1       9    3400    121125631AAA BBB CCCC DD EE*
 10     1      10    3600    CTP
 11     1      11    3620    3*FA6FE938-A6F6-4EE9-A062-6B87FC752339*
 12     1      12    3700    SUSD0,00*
 13     1      13    3710    USD203000,*
 14     1      14    4200    D1602204277*ALOT OF WRITTING STUFF 98203*
 15     1      15    5000    D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB*
 16     1      16    5100    D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D*
 17     1      17    6000    HELEN BOBS STUFF*ROUTE: 122228444*
 18     2       1    1100    30P N
 19     2       2    1110    08031116FT03
 20     2       3    1120    20210803L1B7832F00031008031116FT03
...

View solution in original post

ballardw · Posted 08-17-2021 06:05 PM

The documentation for the file contents is where exactly?

My guess is some of those "fields" have a mixture of date and other stuff.

Is there anything that identifies a line, or combination of lines, as a single record?

If the fields can have different lengths it really helps to know which ones what rules may be involved for the substrings.

jimbobob · Posted 08-17-2021 06:22 PM

If you really want to read the documentation, here you go: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf

Yes the data in between the tags can be a mixture, for my purposes if I can get each set of data elements between the tags into its own column of data, then i can parse that data further.

Each row is one record that can be parsed out to many fields. Data between each tag {} can be one or more fields of data as well, but im working on baby steps just trying to break out each set of data between the tags to its own column.

ChrisNZ · Posted 08-17-2021 06:50 PM

Like this?

data WANT;
  infile FEDWIRE truncover lrecl=8000;
  input ;
  array FIELD[0:99] $10.;
  array VALUE[0:99] $50.;
  RECORD=_N_;
  do SCAN=0 to 99 ;
    FIELD[SCAN]=scan(_INFILE_,SCAN*2+1,'{}');
    VALUE[SCAN]=scan(_INFILE_,SCAN*2+2,'{}');
    if VALUE[SCAN]=' ' then leave; 
  end;
  output;
 run;

High-Performance SAS Coding - Third Edition

jimbobob · Posted 08-17-2021 07:18 PM

Thanks Chris, I might be able to work with this. Attached is my desired output

ChrisNZ · Posted 08-17-2021 07:59 PM

Glad to know. I'm not touching MS Office files 🙂

High-Performance SAS Coding - Third Edition

jimbobob · Posted 08-17-2021 08:10 PM

1100	1110	1120	1510	1520	2000	3100	3320	3400	3600	3620	3700	3710	4100	4200	4320	4400	5000	5100	5400	6000	6500
30P N	08030815FT03	20210803L1B7832F00013308030815FT03	1000	20210801B1Q8161C001707	002020230000	021000018BANKOBOB*	F9S2108035631900*	121125631AAA BBB CCCC DD EE*	CTP	3FA6FE938-A6F6-4EE9-A062-6B87FC752339	SUSD0,00*	USD203000,*		D1602204277ALOT OF WRITTING STUFF 98203			D02628651MR WARREN JOHN BLAHALBHFLAT 30 135 AAAA	D8033779245COOTTS AND COMP'L. KING DIV73 MAN STR		HELEN BOBS STUFFROUTE: 122228444
30P N	08031116FT03	20210803L1B7832F00031008031116FT03	1000	20210803B1QGC06C006237	000300899500	021000021NONEOFBUS*	98566273215FS*	121222484AAA BBB CCCC DD EE*	CTP	39A294483-5377-4A00-9362-8C95FEF4229F	SUSD5,00*	USD6000,00*	F124448405MOREBLAH BLAH BLAH STUFF	D88040660033579EDUARDO BLAH BLAH HERNANDEZ, ESTA	SWF OF 21/09/03*		DCLIENTE:13770683INDUSTY S A DE C VBRALES C P*4	BRGIOMFE33XMT*
30P S	08031811FT03	20210803L1B7832F00101008031811FT03	1631	20210803QMGAT015001875	000440962083	125108405CANTSEE ME*	F001BB*	121045584AAA BBB CCCC DD EE*	DRB					F125108405SOMEWHER OVER THE RAINBOW		D125108405NOT IN KANSAS	F125108405LOST SOMEWHERE		125108405		DRAWCREDIT IMMUNITY PALACE

ChrisNZ · Posted 08-17-2021 09:37 PM

1. How do you establish the list of variable names?

2. The names you show are invalid: the first character is a digit.

High-Performance SAS Coding - Third Edition

jimbobob · Posted 08-18-2021 11:34 AM

The goal is to rename the numbers to field name, for example 1100 = Message_Disposition, 1110 = Receipt_Time_Stamp, this pdf list out the actuals names that I would rename the variables to https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf

ballardw · Posted 08-18-2021 06:21 PM

@jimbobob wrote:

The goal is to rename the numbers to field name, for example 1100 = Message_Disposition, 1110 = Receipt_Time_Stamp, this pdf list out the actuals names that I would rename the variables to https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.445.7645&rep=rep1&type=pdf

You might consider a custom format to supply the text "Message disposition" or similar for the value of the Tag, using @Tom's code.

Or create a format so that the formatted value of the Tag variable could be used by Proc Transpose in an ID statement if you had a variable that would indicate which observations were from the same line in the original file.

Example using a format to create new variables using with Proc Transpose:

data example;
   input x $ y $ z $;
datalines;
1 2 abc
4 5 pdq
;

proc format;
value $ycode
'2'='Some_description'
'4'='Other_desc'
'5'='This_description'
;
run;

proc transpose data=example
   out=trans;
by x;
id y;
var z;
format y $ycode.;
run;

Placing a significant chunk of _infile_ into a permanent variable might work for a line identifier if nothing better comes to mind using Tom's code.

Tom · Posted 08-18-2021 11:11 PM

A unique row identifier is what the ROW variable is for in the selected answer.

jimbobob · Posted 08-19-2021 11:21 AM

Excellent, thanks Ballard, way better than the route I was taking using a replace macro I found in a SAS paper was working but taking forever to run. this was instant.

Tom · Posted 08-18-2021 01:50 PM

Seems pretty simple to read.

data mydata;
  infile 'c:\downloads\myfile.txt' dlm='{}' truncover ;
  length row col 8 tag $20 text $200 ;
  row+1;
  do col=1 by 1 until(tag=' ');
    input tag text @;
    if tag ne ' ' then output;
  end;
run;

Results:

Obs    row    col    tag     text

  1     1       1    1100    30P N
  2     1       2    1110    08030815FT03
  3     1       3    1120    20210803L1B7832F00013308030815FT03
  4     1       4    1510    1000
  5     1       5    1520    20210801B1Q8161C001707
  6     1       6    2000    002020230000
  7     1       7    3100    021000018BANKOBOB*
  8     1       8    3320    F9S2108035631900*
  9     1       9    3400    121125631AAA BBB CCCC DD EE*
 10     1      10    3600    CTP
 11     1      11    3620    3*FA6FE938-A6F6-4EE9-A062-6B87FC752339*
 12     1      12    3700    SUSD0,00*
 13     1      13    3710    USD203000,*
 14     1      14    4200    D1602204277*ALOT OF WRITTING STUFF 98203*
 15     1      15    5000    D02628651*MR WARREN JOHN BLAHALBH*FLAT 30 135 AAAAAAA WALK*LONDON N1 7RR GB*
 16     1      16    5100    D8033779245*COOTTS AND COMP'L. KING DIV*73 MAN STR, EAST HSE, 3RD FL*D*
 17     1      17    6000    HELEN BOBS STUFF*ROUTE: 122228444*
 18     2       1    1100    30P N
 19     2       2    1110    08031116FT03
 20     2       3    1120    20210803L1B7832F00031008031116FT03
...

jimbobob · Posted 08-18-2021 02:02 PM

Awesome. I love SAS folks. Thank you so much Tom. Chris thanks as well for your help,

jimbobob · Posted 08-19-2021 04:26 PM

Hey Tom, ran into a snag with some of my files. I found that in some file the records start with "YFT811 " before starting the rest of the delimited file, so the delimted file really starts at position 9: I found a paper on using a variable for the dlm= option, but I've had no luck figuring out how to do it

Paper I was trying to follow: https://www.lexjansen.com/nesug/nesug12/cc/cc23.pdf

My Attempted Code:

data x;
  infile "C:\Users\jwalker\Desktop\New folder (2)\FedLine32948_PCBB_FXWIRE_FEDLINE_202107201050.txt" dlm=mydlm truncover;
  length row col 8 tag $32 text $1000;
  row+1;
  mydlm = 'YFT811';
  input v1 $ @;
  do col=1 by 1 until(tag=' ');
  	 mydlm = '{}';
    input tag text @;
    if tag ne ' ' then output;
  end;
run;

I there a way to do this or tell the input to start at position 9?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away