Hi folks, thanks for all the great suggestions. I realized that I simplied the problem too much in my original post and I still can’t figure out with your suggestions. So here is problem:
First of all, there is no delimiters; everything based on positions. One row is for one customer, and it can contains several types of items. Every row starts with Customer ID which take the first 5 characters. There are 4 types of items with similar patterns but different length. Each item start with 4 characters (a combination of letter and number)
One type starts with "AN01" and has length of 10; one type start with "BC01" and has length of 8.
One type starts with "CR4R" and has length of 8; one type start with "DY03" and has length of 6.
Each row can have any number of any types. For example 2 “AN01”, followed by 3 “BC01”, then 10 “CR4R”, then 2 “DY03”. Another row can have 3 “AN01”, no “BC01”, then 8 “CR4R”, then 1 “DY03”.
The tricky part is that every type can have space in the value. E.g. ‘BC01 89’
For example:
CUST12345AN01 98BC01 98BC01 89CR4R 4 5DY03 9
CUST12346AN01 98BC01 98BC01 89 BC017 89CR4R 4 5DY03 9
The first line means: 'CUST12345' is the customer ID, with 5 items
AN01 98
BC01 98
BC01 89
CR4R 4 5
DY03 9
I would like the output to have one item per row along with its Customer ID:
CUST_ID ITEM
CUST12345 AN01 98
CUST12345 BC01 98
CUST12345 BC01 89
CUST12345 CR4R 4 5
CUST12345 DY03 9
CUST12346 AN01 98
CUST12346 BC01 98
CUST12346 BC01 89
CUST12346 BC017 89
CUST12346 CR4R 4 5
CUST12346 DY03 9
Original post:
Hi folks, I had another complicated txt files that I need to read in based on their position. There is no delimiter. One row is one customer, and there are two types of items. Customer ID always take the first 5 characters. One type of item starts with "A" and has length of 10; the other type start with "B" and has length of 15. Each customer can have any number of A or B items. The tricky part is that A or B can have space in the value.
For example:
CUST12345A10 98B98 89 1245A20 98B88 89 1245
This line means: 'CUST12345' is the customer ID
'A10 98' is the value for the first "A" item
'B98 89 1245' is the value for the first "B" item
'A20 98' is the value for the 2nd "A" item
'B88 89 1245' is the value for the 2nd "B" item
It will look like:
CUST12345 A10 98
CUST12345 A20 98
The space in “A” and “B” values drive me crazy. Any idea how to create the output?
... View more