BookmarkSubscribeRSS Feed
jhealthpolicy
Fluorite | Level 6

Greetings--

 

I'm having issues reading in some flat files i received, I have very limited knowledge of this format(EBCDIC).

 

Issue:

The character fields read in fine, however the numeric fields do not read in correctly. If I read them in as character (EBCDICw. format), I receive values, but with many leading 0s and I don't know where the decimal goes. 

 

Source file documentation bullet points:

File format is EBCDIC

End of record limiter is Hex 'OA'

 Right justified.

 Fixed width fields and have lengths of each field.

 If value is negative, first byte will be a sign

 If value is positive, the first byte is a space

 No value defaults to all zeros

 

System:

SAS 9.4 on Windows 10.

 

Code:

data readin;
infile rawfile lrecl=2156 missover dsd truncover recfm=f ;

input
@1 CHAR_VAR1 $EBCDIC13.
@14 NUM_VAR1  IND_CLAIM $EBCDIC13.

;

run;

 

This produces an error free log.

 

I attempted using the S370FPD10.2 informat on NUM_VAR1, but then only receive missing values and the following message in the log:


NOTE: Invalid data for NUM_VAR1 in line 20 1174-1183.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
20 òðòññöóó÷óðöõÆÖ@@×ÔÃÁÉÄõðñô@@@@@@@@@@@@ÃÕ@ùñðððñøõóñù÷ðöaññañùùò@ðððððððð@@@@@@@@@@@@@ðððððð
93 ððððð@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
185 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
277 @@@@@{{{öñ@@@@@@ðñó@ñ@ðñaðñaðñðñðððð{{{{@ðððððððððððððððóaðóaòðòñðððð@@@@@ðððððððð@ððððððððð
369 ðððððñññðñóùñôôøø@@@@@@@@@@@@@@@@@@@ðððððððð@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ðððððððððððððð
461 ððððððððððððððððððððððððððð@@@@@@Ä@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ðöañòa
553 òðòñòö@@@@@@@@ðððððððððððððððñðañõaòðòñòððóaðóaòðòñðóaðóaòðòñðöañòaòðòñ@@@@@@ððððððððððððððð
645 ðððððððððððððððððòø@ðððððððððððððððððððððððððððððð@@@@@@@@@@ôóõö÷ðððð@@@@@@@@ÆÃ@@@@@@@@@ðððð
737 ððððððôòóùòñöøðò@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ððððððððððð@@@ððððððððòö@@@@@@@@@@@@@@@@@

 

I'm hoping all i need is the correct informat for numeric fields. i have tried all S370 formats with no success. 

Anybody run into this or have any suggestions on things to try?

 

 

15 REPLIES 15
SASKiwi
PROC Star

First and foremost how were these files transferred to your PC? They should have copied using a BINARY file transfer if FTP was used. Any transfer method that involves conversion from the EBCDIC original will corrupt the data. Secondly, you should have been provided a specification of how the columns are formatted. If so can you post that specification? Typically EBCDIC data files will contain COBOL-style numeric columns like packed decimal or signed packed decimal.  

jhealthpolicy
Fluorite | Level 6

Hi, thanks for the reply.  

 

The files were downloaded over ftp with transfer method set to auto. I changed it to binary, redownloaded the file and no change.  Unfortunately I can't post the spec, but i have looked over the specifications area and posted the relevant details above(very sparse details). I do have the data type, length, starting and ending position -- but nothing regarding ebcdic specifications. I can ask for more details, but it will be a while before getting a response. 

SASKiwi
PROC Star

This SAS Note lists COBOL formats and the SAS S370 equivalent INFORMATs. You could work your way through these.

jhealthpolicy
Fluorite | Level 6
I have tried all of the S370 formats with no luck.
Tom
Super User Tom
Super User

Share some of the example HEX strings for the values that are causing trouble.

 

But also make sure you know if the file is FIXED length (like the RECFM=F you used in your INFILE statement) or it is variable length with end of line characters such as LINEFEED that you also mentioned.

Tom
Super User Tom
Super User

First use the LIST statement to look at few of the observations.  If there are any non-printable characters (when interpreted as ASCII) then it will show the hex code of the values as two extra lines under each line.

data _null_;
  infile rawfile lrecl=2156 recfm=f obs=3;
  input;
  list;
run;

All of the @ signs in your LOG are probably the SPACES in the original file.  The hexcode for a space in EBCDIC is '40'x, which is the ASCII code for the @ character.

 

For a fixed length file you do not really need to use TRUNCOVER (and you almost never want to use MISSSOVER as it can throw away characters).  And you do not want to use the DSD option either.

 

If still cannot figure out the right informat for the numeric fields then read them with $CHAR informat and attach the $HEX format to them so you can check what they actually contain and text different informats on the values using the INPUT() function.

jhealthpolicy
Fluorite | Level 6

A list gets me the following results:

jhealthpolicy_0-1689180588890.png

Thanks for the explanation on the 40x is ascii for the @ symbol.  

I have taken off the truncover/dsd/missover with no change. 

 

I have read them in as char and formatted as hex and i get values like 40F0F0F0F0F0F0F0 and F0F0F0F0F0F0F0F0.

I have tried inputting them using input with different formats with no luck, they convert to missing.

 

 


Tom
Super User Tom
Super User

FO is just the digit zero.  Looks like you just need to convert the string to ASCII and then read it.

22   data test;
23     string = '40F0F0'x ;
24     format string $hex. ;
25     ascii = input(string,$ebcdic3.);
26     format ascii $quote.;
27     number = input(ascii,32.);
28     put (_all_) (=/);
29   run;


string=40F0F0
ascii=" 00"
number=0
NOTE: The data set WORK.TEST has 1 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

 

Photographs of TEXT are really hard to program with.

Tom
Super User Tom
Super User

So it looks like LIST statement is not that helpful.

 

You could read the values in in smaller blocks and print them with the $HEX format to see what they contain.  So you could dump the hex codes of the first 500 bytes using a step like this:

data _null_;
  infile 'myfile' lrecl=50 recfm=f  obs=10 ;
  input line $char50. ;
  put line $hex100. ;
run;

If in fact the file is just lines of text (no actual packed decimal or other binary fields) then just read all in using $EBCDIC informat and write it back to another file using the $CHAR format.  THen try reading the resulting ASCII file.

 

 

jhealthpolicy
Fluorite | Level 6
I have tried this one along with all other S370 formats with no luck.
Ksharp
Super User
You need three options to read in EBCDIC file,
But the value of these three options you need to check in SAS Documentation.

infile rawfile dsd truncover recfm=S370 termstr=nl encoding=EBCDIC ;
jhealthpolicy
Fluorite | Level 6
using S370V and these options i am erroring out with 'invalid record detected' unrecoverable io error detected.
jhealthpolicy
Fluorite | Level 6

I am reaching out to the owner of the files for guidance and will eventually follow-up here.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 2433 views
  • 2 likes
  • 5 in conversation