BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
jimbobob
Quartz | Level 8

Need help solving a Broken Bar issue when importing data. I have data that imports just fine until it has Broken bars (DLM='a6'x) in the data. In my datastep I'm not using delimiters as its a fixed length import process. When there are none of these Broken Bars the whole highlighted section imports just fine , it should import as one column:

 

jimbobob_1-1686090202758.png

What I get:

jimbobob_2-1686090257680.png

 

	data dsn;
	    length location $1000;
	    set NEW_FILES;
	    location=cats("&path/",FILE_NAME);
	    infile dummy filevar=location end=done;

	    FORMAT
	        ADDENDA_TYPE_CODE $CHAR2.
			PAYMENT_RELATED_INFORMATION $CHAR80.
			ADDENDA_SEQUENCE_NUMBER $CHAR4.
			ENTRY_DETAIL_SEQUENCE_NUMBER $CHAR7.;

	    do while (not done);
			input RECORD_TYPE_CODE $1. @;

			if RECORD_TYPE_CODE = "7" then do;
				input ADD_TYPE_CODE $2. @;
				if ADD_TYPE_CODE = "05" then do;
					INPUT
			        @2     ADDENDA_TYPE_CODE $2.
					@4	   PAYMENT_RELATED_INFORMATION $80.
					@84	   ADDENDA_SEQUENCE_NUMBER $4.
					@88	   ENTRY_DETAIL_SEQUENCE_NUMBER $7.;
				end;
			end;
	        output ;
	    end;
run;

I thought it could be  an encoding issue i tried adding encoding="ANSI" to the code but no luck, when looking at the encoding in notepad++ it says its: 

jimbobob_3-1686090433194.png

 

To note I'm importing this file into SAS studio 5.2 from a  directory mounted onto our unix servers.

 

any help would be appreciated. thanks

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you know that is the only goofy non-ascii character you have to deal with you might just use TRANSLATE() to replace them with a normal ASCII character instead.

/* Start of your data step */
* Add ENCODING and TRUNCOVER options to the INFILE statement ;
  infile dummy filevar=location end=done truncover encoding='any';
* Read the line and convert the character ;
  input @;
  _infile_=translate(_infile_,'7C'x,'A6'x);
/*  rest of your data step */

View solution in original post

13 REPLIES 13
ballardw
Super User

I'm not sure I understand the description. You say ", it should import as one column" but you input statement, coupled with the output, means the data  set is supposed to contain 6 variables and you only show 4.

 

You description does not actually tell me what is wrong in the output, as in what values should appear.

It might help to include some of the source text file as actual text, copy that text instead of the picture open a text box on the forum with the </> icon above the message box and paste the text.

 

Also it might help to show some source text that does not have any of the issue.

 

Is the file supposed to have those characters? If not you might ask the source why they are appearing, especially if it is not consistent.

jimbobob
Quartz | Level 8

Thanks @ballardw  thats is just the section of the code for extracting the NACHA layout part that is known as Addenda, starts with a 7, in most cases its a free form field and in this case the customer used broken bars to sperate components of their additional payment information, and the gray highlighted part of the screenshot I just want in one 80 character field. I cannot upload the actual file as it confidential banking data. Here is a screen shot of another customers addenda that work just fine, the highlighted gets imported properly.

 

jimbobob_0-1686146866320.png

 

Tom
Super User Tom
Super User

Why are you trying to read what is clearly a DELIMITED text file as if it was FIXED POSITION file?

jimbobob
Quartz | Level 8

Thanks @Tom its a fixed width NACHA layout, that's just a short snippet of the data as it confidential banking data that's the most I can screen shot. That part of the layout is known as the Addenda, and in most cases depending on the adenda type, the customer can free form what ever they want in that section, this customer just happened to add payment related info separated by broken bars.

Tom
Super User Tom
Super User

Hard to tell from your photograph, but it looks like the lines are shorter than the more than 80 bytes you are trying to read. Try adding the TRUNCOVER option to the INFILE statement.

 

 

Also why are you attaching the $CHAR display format to the character variables (so that leading spaces are printed), but reading with that data using the $ informat (so that leading spaces are removed)?

Kurt_Bremser
Super User

The seemingly "blank" spaces might be caused by tabs; look if Notepad++ reveals such.

And make sure you don't trip over encoding issues (A6x is part of the ISO-8859/Windows 1252 ASCII table of single-byte characters, but may only be a continuation byte in UTF-8).

jimbobob
Quartz | Level 8

Thanks @Kurt_Bremser  not sure how to do that in notepad++, I selected the option show all characters, but not sure what I should be looking for:

 

jimbobob_0-1686149170110.png

Also when I import this file with my windows desktop EG it imports just fine, is there an option I need to add when importing this on the server? 

jimbobob
Quartz | Level 8

Uploaded sample data that does not import properly in the Addenda section for the customer that has the broken bars in the data versus the other customer that works just fine. Also uploaded full code

jimbobob
Quartz | Level 8

forgot screen shot, the yellow highlighted is working, the other records getting cut off. Also @Tom I did try TRUNCOVER, no change

 

jimbobob_0-1686150634769.png

 

Tom
Super User Tom
Super User

So change this line:

      infile dummy filevar=location end=done encoding='any' truncover;

And add these two lines later:

      do while (not done);
      input @;
      _infile_=translate(_infile_,'7C'x,'A6'x);
      input RECORD_TYPE_CODE $1. @;
Tom
Super User Tom
Super User

A6 is NOT an ASCII character.

19   data _null_;
20    x='a6'x ;
21    put x= x $hex2. ;
22    x='|';
23    put x= x $hex2. ;
24   run;

x=¦ A6
x=| 7C

The hex code for an actual ASCII pipe character is '7C'x.

 

So if your file has non-ASCII characters in it then you need to be very careful about what ENCODING your SAS session is using.   

 

If your SAS session is using a single byte encoding like WLATIN1 then it should read the A6 characters fine.  If you are using some other single byte encoding it will also read fine, but it might display as some other glyph.

 

But if it is use UTF-8 encoding then it would try to convert those 'A6'x characters into 'C2A6'x and that would mess up the column count on your input statements.

 

Example:   So I made this file using WLATIN1 encoding session:

data _null_;
 file 'c:\downloads\bar.txt';
 put 'ABC' 'a6'x 'XYZ' 'a6'x '123' ;
run;

Then read it with a SAS running using UTF-8:  (Notice that copy and paste from the SAS log to this session has caused the non-ascii 2 byte characters to be seen as two separate characters.)

36   data test;
37     infile 'c:\downloads\bar.txt' truncover encoding='wlatin1';
38     input str $30.;
39     length=length(str);
40     klength=klength(str);
41     put str= / str $hex.
42       / length= klength= ;
43     do i=1 to klength;
44       length char $4 ;
45       char=ksubstr(str,i,1);
46       put i= char= char $hex. ;
47     end;
48   run;

NOTE: The infile 'c:\downloads\bar.txt' is:
      Filename=c:\downloads\bar.txt,
      RECFM=V,LRECL=32767,File Size (bytes)=13,
      Last Modified=07Jun2023:11:07:16,
      Create Time=07Jun2023:11:07:16

str=ABC¦XYZ¦123
414243C2A658595AC2A63132332020202020202020202020202020202020
length=13 klength=11
i=1 char=A 41202020
i=2 char=B 42202020
i=3 char=C 43202020
i=4 char=¦ C2A62020
i=5 char=X 58202020
i=6 char=Y 59202020
i=7 char=Z 5A202020
i=8 char=¦ C2A62020
i=9 char=1 31202020
i=10 char=2 32202020
i=11 char=3 33202020
NOTE: 1 record was read from the infile 'c:\downloads\bar.txt'.
      The minimum record length was 13.
      The maximum record length was 13.
NOTE: The data set WORK.TEST has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

Tom_0-1686150895958.png

 

Tom
Super User Tom
Super User

If you know that is the only goofy non-ascii character you have to deal with you might just use TRANSLATE() to replace them with a normal ASCII character instead.

/* Start of your data step */
* Add ENCODING and TRUNCOVER options to the INFILE statement ;
  infile dummy filevar=location end=done truncover encoding='any';
* Read the line and convert the character ;
  input @;
  _infile_=translate(_infile_,'7C'x,'A6'x);
/*  rest of your data step */
jimbobob
Quartz | Level 8

You're the man @Tom, worked like a charm. thank you. And thanks @Kurt_Bremser  and @ballardw for your help as well.

 

jimbobob_0-1686152212215.png

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2307 views
  • 0 likes
  • 4 in conversation