Solved: Re: Broken Bar or Broken Code when Importing Data

jimbobob · Posted 06-06-2023 06:36 PM

Need help solving a Broken Bar issue when importing data. I have data that imports just fine until it has Broken bars (DLM='a6'x) in the data. In my datastep I'm not using delimiters as its a fixed length import process. When there are none of these Broken Bars the whole highlighted section imports just fine , it should import as one column:

What I get:

	data dsn;
	    length location $1000;
	    set NEW_FILES;
	    location=cats("&path/",FILE_NAME);
	    infile dummy filevar=location end=done;

	    FORMAT
	        ADDENDA_TYPE_CODE $CHAR2.
			PAYMENT_RELATED_INFORMATION $CHAR80.
			ADDENDA_SEQUENCE_NUMBER $CHAR4.
			ENTRY_DETAIL_SEQUENCE_NUMBER $CHAR7.;

	    do while (not done);
			input RECORD_TYPE_CODE $1. @;

			if RECORD_TYPE_CODE = "7" then do;
				input ADD_TYPE_CODE $2. @;
				if ADD_TYPE_CODE = "05" then do;
					INPUT
			        @2     ADDENDA_TYPE_CODE $2.
					@4	   PAYMENT_RELATED_INFORMATION $80.
					@84	   ADDENDA_SEQUENCE_NUMBER $4.
					@88	   ENTRY_DETAIL_SEQUENCE_NUMBER $7.;
				end;
			end;
	        output ;
	    end;
run;

I thought it could be an encoding issue i tried adding encoding="ANSI" to the code but no luck, when looking at the encoding in notepad++ it says its:

To note I'm importing this file into SAS studio 5.2 from a directory mounted onto our unix servers.

any help would be appreciated. thanks

Tom · Posted 06-07-2023 11:20 AM

If you know that is the only goofy non-ascii character you have to deal with you might just use TRANSLATE() to replace them with a normal ASCII character instead.

/* Start of your data step */
* Add ENCODING and TRUNCOVER options to the INFILE statement ;
  infile dummy filevar=location end=done truncover encoding='any';
* Read the line and convert the character ;
  input @;
  _infile_=translate(_infile_,'7C'x,'A6'x);
/*  rest of your data step */

View solution in original post

ballardw · Posted 06-06-2023 07:38 PM

I'm not sure I understand the description. You say ", it should import as one column" but you input statement, coupled with the output, means the data set is supposed to contain 6 variables and you only show 4.

You description does not actually tell me what is wrong in the output, as in what values should appear.

It might help to include some of the source text file as actual text, copy that text instead of the picture open a text box on the forum with the </> icon above the message box and paste the text.

Also it might help to show some source text that does not have any of the issue.

Is the file supposed to have those characters? If not you might ask the source why they are appearing, especially if it is not consistent.

jimbobob · Posted 06-07-2023 10:08 AM

Thanks @ballardw thats is just the section of the code for extracting the NACHA layout part that is known as Addenda, starts with a 7, in most cases its a free form field and in this case the customer used broken bars to sperate components of their additional payment information, and the gray highlighted part of the screenshot I just want in one 80 character field. I cannot upload the actual file as it confidential banking data. Here is a screen shot of another customers addenda that work just fine, the highlighted gets imported properly.

Tom · Posted 06-06-2023 07:43 PM

Why are you trying to read what is clearly a DELIMITED text file as if it was FIXED POSITION file?

jimbobob · Posted 06-07-2023 09:57 AM

Thanks @Tom its a fixed width NACHA layout, that's just a short snippet of the data as it confidential banking data that's the most I can screen shot. That part of the layout is known as the Addenda, and in most cases depending on the adenda type, the customer can free form what ever they want in that section, this customer just happened to add payment related info separated by broken bars.

Tom · Posted 06-06-2023 11:02 PM

Hard to tell from your photograph, but it looks like the lines are shorter than the more than 80 bytes you are trying to read. Try adding the TRUNCOVER option to the INFILE statement.

Also why are you attaching the $CHAR display format to the character variables (so that leading spaces are printed), but reading with that data using the $ informat (so that leading spaces are removed)?

Kurt_Bremser · Posted 06-07-2023 02:48 AM

The seemingly "blank" spaces might be caused by tabs; look if Notepad++ reveals such.

And make sure you don't trip over encoding issues (A6x is part of the ISO-8859/Windows 1252 ASCII table of single-byte characters, but may only be a continuation byte in UTF-8).

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

jimbobob · Posted 06-07-2023 10:47 AM

Thanks @Kurt_Bremser not sure how to do that in notepad++, I selected the option show all characters, but not sure what I should be looking for:

Also when I import this file with my windows desktop EG it imports just fine, is there an option I need to add when importing this on the server?

jimbobob · Posted 06-07-2023 11:03 AM

Uploaded sample data that does not import properly in the Addenda section for the customer that has the broken bars in the data versus the other customer that works just fine. Also uploaded full code

jimbobob · Posted 06-07-2023 11:11 AM

forgot screen shot, the yellow highlighted is working, the other records getting cut off. Also @Tom I did try TRUNCOVER, no change

Tom · Posted 06-07-2023 11:41 AM

So change this line:

      infile dummy filevar=location end=done encoding='any' truncover;

And add these two lines later:

      do while (not done);
      input @;
      _infile_=translate(_infile_,'7C'x,'A6'x);
      input RECORD_TYPE_CODE $1. @;

Tom · Posted 06-07-2023 11:15 AM

A6 is NOT an ASCII character.

19   data _null_;
20    x='a6'x ;
21    put x= x $hex2. ;
22    x='|';
23    put x= x $hex2. ;
24   run;

x=¦ A6
x=| 7C

The hex code for an actual ASCII pipe character is '7C'x.

So if your file has non-ASCII characters in it then you need to be very careful about what ENCODING your SAS session is using.

If your SAS session is using a single byte encoding like WLATIN1 then it should read the A6 characters fine. If you are using some other single byte encoding it will also read fine, but it might display as some other glyph.

But if it is use UTF-8 encoding then it would try to convert those 'A6'x characters into 'C2A6'x and that would mess up the column count on your input statements.

Example: So I made this file using WLATIN1 encoding session:

data _null_;
 file 'c:\downloads\bar.txt';
 put 'ABC' 'a6'x 'XYZ' 'a6'x '123' ;
run;

Then read it with a SAS running using UTF-8: (Notice that copy and paste from the SAS log to this session has caused the non-ascii 2 byte characters to be seen as two separate characters.)

36   data test;
37     infile 'c:\downloads\bar.txt' truncover encoding='wlatin1';
38     input str $30.;
39     length=length(str);
40     klength=klength(str);
41     put str= / str $hex.
42       / length= klength= ;
43     do i=1 to klength;
44       length char $4 ;
45       char=ksubstr(str,i,1);
46       put i= char= char $hex. ;
47     end;
48   run;

NOTE: The infile 'c:\downloads\bar.txt' is:
      Filename=c:\downloads\bar.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=13,
      Last Modified=07Jun2023:11:07:16,
      Create Time=07Jun2023:11:07:16

str=ABCÂ¦XYZÂ¦123
414243C2A658595AC2A63132332020202020202020202020202020202020
length=13 klength=11
i=1 char=A 41202020
i=2 char=B 42202020
i=3 char=C 43202020
i=4 char=Â¦ C2A62020
i=5 char=X 58202020
i=6 char=Y 59202020
i=7 char=Z 5A202020
i=8 char=Â¦ C2A62020
i=9 char=1 31202020
i=10 char=2 32202020
i=11 char=3 33202020
NOTE: 1 record was read from the infile 'c:\downloads\bar.txt'.
      The minimum record length was 13.
      The maximum record length was 13.
NOTE: The data set WORK.TEST has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

Tom · Posted 06-07-2023 11:20 AM

If you know that is the only goofy non-ascii character you have to deal with you might just use TRANSLATE() to replace them with a normal ASCII character instead.

/* Start of your data step */
* Add ENCODING and TRUNCOVER options to the INFILE statement ;
  infile dummy filevar=location end=done truncover encoding='any';
* Read the line and convert the character ;
  input @;
  _infile_=translate(_infile_,'7C'x,'A6'x);
/*  rest of your data step */

jimbobob · Posted 06-07-2023 11:37 AM

You're the man @Tom, worked like a charm. thank you. And thanks @Kurt_Bremser and @ballardw for your help as well.

SAS Training: Just a Click Away