data sdata2.amp_static;
infile "&G_IFC_RAW_PENDING_PATH./amps.static.txt" RECFM=N;
retain flag 0;
if flag=0 then do;
input tmp $EBCDIC87. @@;
Hi, what does the statement "input tmp $EBCDIC87. @@;" mean?
Does it read the variable tmp from amps.static.txt in the format $EBCDIC87.?
After it is read in, does the format stay as $EBCDIC87. in the output dataset sdata2.amp_static?
If I use Python to convert the SAS dataset to text, do I have to take particular care with data in this format?
Yes, it reads the data in 87-character chunks, expecting EBCDIC encoding.
Whether you need to worry about the EBCDIC encoding depends on the file contents, not on how you access the file. This code should convert the characters from EBCDIC to whatever is native for your SAS session, provided the values were EBCDIC to begin with. I suggest confirming that before processing too many files.
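One way to confirm the encoding outside SAS is to decode a sample of the file both ways and see which result looks like text. The sketch below is only a heuristic, and it assumes the mainframe code page is cp037 (US/Canada EBCDIC); the real file may use a different EBCDIC code page. The function names are made up for illustration.

```python
import string

PRINTABLE = set(string.printable)


def printable_ratio(raw: bytes, encoding: str) -> float:
    """Fraction of the decoded characters that are printable ASCII."""
    if not raw:
        return 0.0
    # errors="replace" turns undecodable bytes into U+FFFD, which is not printable.
    text = raw.decode(encoding, errors="replace")
    return sum(ch in PRINTABLE for ch in text) / len(text)


def guess_encoding(raw: bytes) -> str:
    """Return 'cp037' or 'ascii', whichever yields more printable text.

    Assumption: cp037 stands in for "EBCDIC"; other code pages exist.
    """
    ebcdic = printable_ratio(raw, "cp037")
    ascii_ = printable_ratio(raw, "ascii")
    return "cp037" if ebcdic > ascii_ else "ascii"


# Synthetic samples; with a real file you would pass open(path, "rb").read(87).
ebcdic_sample = "#BEGIN 2024-01-31".encode("cp037")
ascii_sample = b"#BEGIN 2024-01-31"
print(guess_encoding(ebcdic_sample))
print(guess_encoding(ascii_sample))
```

If the file contains packed-decimal or binary fields, a check like this only makes sense on the parts of a record that are known to hold text.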
The fact that you have an EBCDIC informat in your program suggests you are running SAS on an ASCII computer (Windows or Unix) and converting an EBCDIC file sourced from a mainframe. The informat converts the EBCDIC character data read from the file, and the SAS dataset stores it as ASCII. You can confirm that for yourself by viewing the SAS dataset sdata2.amp_static and checking that the variable tmp contains human-readable text.
Why do you want to use Python here? To convert sdata2.amp_static to some other form?
In your example, the INFILE and INPUT statement work together to read a file (originating from an IBM mainframe as FB 87) of fixed-length 87-byte records.
My preferred method to read such a file is this:
data sdata2.amp_static;
infile "&G_IFC_RAW_PENDING_PATH./amps.static.txt" recfm=f lrecl=87;
retain flag 0;
if flag=0 then do;
input tmp $EBCDIC87.;
I find the code easier to read when the file attributes are defined in the INFILE statement rather than in a "hidden" way in the INPUT statement.
The variable you read will be automatically defined as character with a length of 87, and without any assigned informats or formats. You can use this variable like any other "simple" character variable when writing it to text.
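For comparison, the same fixed-length read can be sketched in Python. This assumes the record length of 87 from the INFILE statement above and, as an assumption not confirmed anywhere in this thread, that the file uses EBCDIC code page cp037. `read_fixed_records` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Iterator

LRECL = 87  # record length, taken from the INFILE statement above


def read_fixed_records(stream: BinaryIO, lrecl: int = LRECL,
                       encoding: str = "cp037") -> Iterator[str]:
    """Yield each fixed-length record decoded from EBCDIC to a Python str.

    Assumption: cp037 is the right EBCDIC code page for the file.
    """
    while True:
        raw = stream.read(lrecl)
        if not raw:
            break  # clean end of file
        if len(raw) < lrecl:
            raise ValueError(f"short record: {len(raw)} bytes")
        yield raw.decode(encoding)


# Synthetic demo: two 87-character records encoded as EBCDIC.
records = ["#BEGIN".ljust(LRECL), "ROW 1".ljust(LRECL)]
data = b"".join(r.encode("cp037") for r in records)
decoded = list(read_fixed_records(io.BytesIO(data)))
print([r.rstrip() for r in decoded])
```

With a real file, the stream would come from `open(path, "rb")`. Once decoded, each record is an ordinary string, just as the SAS variable is an ordinary character variable.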
Run this for reference:
data _null_;
file "~/class" recfm=f lrecl=8;
set sashelp.class;
put name $ebcdic8.;
run;
data class;
infile "~/class" recfm=f lrecl=8;
input name $ebcdic8.;
run;
proc contents data=class;
run;
data sdata2.amp_static;
infile "&G_IFC_RAW_PENDING_PATH./amps.static.txt" RECFM=N;
retain flag 0;
if flag=0 then do;
input tmp $EBCDIC87. @@;
point=index(tmp,'#BEGIN');
if point then do;
call symput ("HEADER_IND", 'Y');
header = substr(tmp,1,86);
point=point+86;
flag=1;
input @point r_dte $EBCDIC1-. @@;
link readdata;
end;
Kurt, thank you for the explanation. So the line 'input tmp $EBCDIC87. @@;' does not store tmp in a special $EBCDIC87. format, and the variable will show up as plain character data? Why do we bother doing this? What is so special about this text file that we have to apply this informat?
I have added the rest of the program here. Maybe it helps to shed some light on why it is written the way it is.
What exactly are we doing here? EBCDIC keeps appearing, which makes me doubt the format of the data in the output dataset sdata
Read the documentation for the $EBCDICw. informat first to see what it does and what it is used for.
The code you posted is far from the whole program; it has at least one unbalanced DO without the corresponding END. A whole step starts with the DATA statement and ends with the RUN statement, so use these as the markers when copy/pasting code. If the program does not have an additional INPUT somewhere, it would run infinitely, as it never encounters the end-of-file after it finds the #BEGIN.
Oh Kurt, can't fool you, I will add the rest...a lot more to come
Quick question first: I am trying to test how well Python can convert a SAS dataset to text. What kind of SAS dataset should I pick for testing? I am asking about EBCDIC because it looks different and might cause problems.
In general, which specific items in a dataset should I focus on in this kind of testing?
To test a Python conversion of a SAS dataset, you can use any of the datasets provided in SASHELP (CLASS, CARS, BASEBALL, HEART). You may need to make a copy of them outside of SASHELP first, in a place where you have easy read access from Python.
See if your Python code works with all the numbers, especially dates and times, and also check it with UTF-8 characters in character variables if you have to work in a UTF-8 environment.
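Dates and times deserve a specific check because SAS stores them as plain numbers: a date is a count of days since 1 January 1960, and a datetime is a count of seconds since the same point. A minimal sketch of the conversion you would expect a Python tool to perform (the helper names are made up for illustration):

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # SAS date/datetime value 0


def sas_date_to_date(days: float) -> date:
    """Convert a SAS date value (days since 1960-01-01) to a Python date."""
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_datetime_to_datetime(seconds: float) -> datetime:
    """Convert a SAS datetime value (seconds since 1960-01-01) to a Python datetime."""
    return SAS_EPOCH + timedelta(seconds=seconds)


print(sas_date_to_date(0))              # 1960-01-01
print(sas_date_to_date(366))            # 1961-01-01 (1960 was a leap year)
print(sas_datetime_to_datetime(86400))  # 1960-01-02 00:00:00
```

If the text output of your conversion shows raw numbers such as 366 where you expected a date, the tool did not pick up the date format attached to the variable.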
Looking at that PROGRAM is not going to help you figure out how to read a SAS dataset with Python.
That program is a SAS program that is reading a text file and converting the text file into a SAS dataset. Understanding how it works might help you write a Python program that could read the same file (or type of file). Is that what you are trying to do?
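If reading the same file from Python is the goal, the visible part of the SAS step (read 87 bytes at a time, look for '#BEGIN') could be sketched roughly as follows. This mirrors only the lines posted so far, not the rest of the program, and it assumes EBCDIC code page cp037. `find_marker` is a hypothetical name.

```python
import io
from typing import BinaryIO, Optional, Tuple

CHUNK = 87          # chunk width from "input tmp $EBCDIC87. @@;"
MARKER = "#BEGIN"   # marker searched for with index(tmp,'#BEGIN')


def find_marker(stream: BinaryIO,
                encoding: str = "cp037") -> Optional[Tuple[int, str]]:
    """Read the stream in 87-byte chunks and return (byte offset, decoded chunk)
    for the first chunk containing MARKER, or None if it never appears.

    Assumption: cp037 is the right EBCDIC code page for the file.
    """
    offset = 0
    while True:
        raw = stream.read(CHUNK)
        if not raw:
            return None  # end of file reached without finding the marker
        text = raw.decode(encoding)
        if MARKER in text:
            return offset, text
        offset += len(raw)


# Synthetic demo: one filler chunk, then a chunk that starts with the marker.
data = ("x" * CHUNK + "#BEGIN header".ljust(CHUNK)).encode("cp037")
result = find_marker(io.BytesIO(data))
print(result[0], repr(result[1].rstrip()))
```

Like the chunk-by-chunk search in the SAS code, this would miss a marker that straddles two chunks.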
Character variables in a SAS dataset are stored in the encoding of the SAS session that created the dataset, which is an ASCII-based encoding on Unix and Windows. SAS running on an IBM mainframe works natively in EBCDIC, so reading EBCDIC text there needs no conversion. But if you read the same EBCDIC text when running on an ASCII computer (Unix or Windows these days), you need to use the $EBCDIC informat to tell SAS to interpret the bytes as EBCDIC codes instead of ASCII codes.
Note that a SAS dataset only has two types of variables: floating point numbers and fixed-length character strings. Variables do not "have a format". Instead, you can attach a FORMAT to a variable to give SAS special instructions for how to display the values. Similarly, you can use an INFORMAT to give SAS special instructions for how to interpret text strings and convert them to values to store in a SAS variable. Numeric informats generate floating point numbers. Numeric formats work on floating point values.