Greetings--
I'm having issues reading in some flat files i received, I have very limited knowledge of this format(EBCDIC).
Issue:
The character fields read in fine, however the numeric fields do not read in correctly. If I read them in as character (EBCDICw. format), I receive values, but with many leading 0s and I don't know where the decimal goes.
Source file documentation bullet points:
File format is EBCDIC
End of record limiter is Hex 'OA'
Right justified.
Fixed width fields and have lengths of each field.
If value is negative, first byte will be a sign
If value is positive, the first byte is a space
No value defaults to all zeros
System:
SAS 9.4 on Windows 10.
Code:
data readin;
infile rawfile lrecl=2156 missover dsd truncover recfm=f ;
input
@1 CHAR_VAR1 $EBCDIC13.
@14 NUM_VAR1 IND_CLAIM $EBCDIC13.
;
run;
This produces an error free log.
I attempted using the S370FPD10.2 informat on NUM_VAR1, but then only receive missing values and the following message in the log:
NOTE: Invalid data for NUM_VAR1 in line 20 1174-1183.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
20 òðòññöóó÷óðöõÆÖ@@×ÔÃÁÉÄõðñô@@@@@@@@@@@@ÃÕ@ùñðððñøõóñù÷ðöaññañùùò@ðððððððð@@@@@@@@@@@@@ðððððð
93 ððððð@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
185 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
277 @@@@@{{{öñ@@@@@@ðñó@ñ@ðñaðñaðñðñðððð{{{{@ðððððððððððððððóaðóaòðòñðððð@@@@@ðððððððð@ððððððððð
369 ðððððñññðñóùñôôøø@@@@@@@@@@@@@@@@@@@ðððððððð@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ðððððððððððððð
461 ððððððððððððððððððððððððððð@@@@@@Ä@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ðöañòa
553 òðòñòö@@@@@@@@ðððððððððððððððñðañõaòðòñòððóaðóaòðòñðóaðóaòðòñðöañòaòðòñ@@@@@@ððððððððððððððð
645 ðððððððððððððððððòø@ðððððððððððððððððððððððððððððð@@@@@@@@@@ôóõö÷ðððð@@@@@@@@ÆÃ@@@@@@@@@ðððð
737 ððððððôòóùòñöøðò@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ððððððððððð@@@ððððððððòö@@@@@@@@@@@@@@@@@
I'm hoping all i need is the correct informat for numeric fields. i have tried all S370 formats with no success.
Anybody run into this or have any suggestions on things to try?
First and foremost how were these files transferred to your PC? They should have copied using a BINARY file transfer if FTP was used. Any transfer method that involves conversion from the EBCDIC original will corrupt the data. Secondly, you should have been provided a specification of how the columns are formatted. If so can you post that specification? Typically EBCDIC data files will contain COBOL-style numeric columns like packed decimal or signed packed decimal.
Hi, thanks for the reply.
The files were downloaded over ftp with transfer method set to auto. I changed it to binary, redownloaded the file and no change. Unfortunately I can't post the spec, but i have looked over the specifications area and posted the relevant details above(very sparse details). I do have the data type, length, starting and ending position -- but nothing regarding ebcdic specifications. I can ask for more details, but it will be a while before getting a response.
This SAS Note lists COBOL formats and the SAS S370 equivalent INFORMATs. You could work your way through these.
Share some of the example HEX strings for the values that are causing trouble.
But also make sure you know if the file is FIXED length (like the RECFM=F you used in your INFILE statement) or it is variable length with end of line characters such as LINEFEED that you also mentioned.
First use the LIST statement to look at few of the observations. If there are any non-printable characters (when interpreted as ASCII) then it will show the hex code of the values as two extra lines under each line.
data _null_;
infile rawfile lrecl=2156 recfm=f obs=3;
input;
list;
run;
All of the @ signs in your LOG are probably the SPACES in the original file. The hexcode for a space in EBCDIC is '40'x, which is the ASCII code for the @ character.
For a fixed length file you do not really need to use TRUNCOVER (and you almost never want to use MISSSOVER as it can throw away characters). And you do not want to use the DSD option either.
If still cannot figure out the right informat for the numeric fields then read them with $CHAR informat and attach the $HEX format to them so you can check what they actually contain and text different informats on the values using the INPUT() function.
A list gets me the following results:
Thanks for the explanation on the 40x is ascii for the @ symbol.
I have taken off the truncover/dsd/missover with no change.
I have read them in as char and formatted as hex and i get values like 40F0F0F0F0F0F0F0 and F0F0F0F0F0F0F0F0.
I have tried inputting them using input with different formats with no luck, they convert to missing.
FO is just the digit zero. Looks like you just need to convert the string to ASCII and then read it.
22 data test; 23 string = '40F0F0'x ; 24 format string $hex. ; 25 ascii = input(string,$ebcdic3.); 26 format ascii $quote.; 27 number = input(ascii,32.); 28 put (_all_) (=/); 29 run; string=40F0F0 ascii=" 00" number=0 NOTE: The data set WORK.TEST has 1 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Photographs of TEXT are really hard to program with.
So it looks like LIST statement is not that helpful.
You could read the values in in smaller blocks and print them with the $HEX format to see what they contain. So you could dump the hex codes of the first 500 bytes using a step like this:
data _null_;
infile 'myfile' lrecl=50 recfm=f obs=10 ;
input line $char50. ;
put line $hex100. ;
run;
If in fact the file is just lines of text (no actual packed decimal or other binary fields) then just read all in using $EBCDIC informat and write it back to another file using the $CHAR format. THen try reading the resulting ASCII file.
You may need to use the S370FZDL informat for the numbers.
I am reaching out to the owner of the files for guidance and will eventually follow-up here.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.