BookmarkSubscribeRSS Feed
Shwethav
Calcite | Level 5

Hi All,

My requirement here is ,I am having very large flat/text file from which i need to get variable names and its positions which i dont know what that falt files consists of..i mean i need table structure with its column names and its attributes. i tried with proc import/wizard and infile as well but its not working could you please help me.

9 REPLIES 9
Cynthia_sas
Diamond | Level 26
Hi:
If you are dealing with a comma delimited file or a tab delimited file PROC IMPORT will handle the column names automatically if they are in the first row. However, if you are dealing with a fixed format file without column headers, then it is nearly impossible to "guess" what the columns should be. The answer to your question depends very much on what your flat file actually looks like. My advice to my students is to always request a "data dictionary" or file layout for any flat files they need to read. They need to understand the data and what represents valid data and they need to understand the contents of the data file.
Can you post some of the data (even a small amount) in a code window so that people here could take a look at it and possibly understand why PROC IMPORT didn't work. Did you get error messages from PROC IMPORT or did your output file not contain what you thought it should contain?
Cynthia
Sajid01
Meteorite | Level 14

Hello @Shwethav 
Under similar circumstances I would try to open the file in a spreadsheet and get an understanding of what is there in the file and possibly  make necessary changes save as an excel workbook (.xlsx) and then try to import it.

If the file is very big, , I would use Unix/Linux shell to extract the first few rows  and then examine it.

I would like to remind that data preparation is an important step in any analytics projects and one should allocate time for it for in scenarios like this.

Shwethav
Calcite | Level 5

This is the code i have used to read the flatfile

PROC IMPORT
DATAFILE=".........\Documents\SEPA\lhbml430.txt"
OUT=sepa
DBMS=dlm
REPLACE;
GETNAMES=no;
delimiter='7F'x;
RUN;

There is neither ERROR message nor output generated.its only throwing the note message as below:

Shwethav_0-1624545039605.png

 

Am having the program template to read this flat file data in mainframes but only selected variables are mentioned in that program,my task here is to get another 5 more variables in to the output from this input(flat file)for which i dont know those variable position's to bring the data for it :.

I have FTP 'd the flat file from the mainframes to my local system, but unable to view it to get my required columns positions.

Hence i just want to know the columns positions/structure of flat file not importing the data from it.

 

 

Tom
Super User Tom
Super User

First step is fix your unbalanced quotes.  It looks from the photograph of the text in your SAS log (why did you post text as a picture?) that you are using some interface to SAS, like Enterprise Guide or SAS/Studio.  Restart your SAS session (you should be able to do this without closing the program you are using to interface to SAS) so that you are starting without any user submitted code.  That should fix the unbalanced quotes that are generating those NOTEs in the log.

 

Second is take a look the file you have gotten.  You can just use a simple SAS dataset to look at the first few lines.

data _null_;
  infile ".........\Documents\SEPA\lhbml430.txt" obs=10;
  input;
  list;
run;

Make sure that it looks right.  If your "mainframe" is an actual IBM mainframe then you also should make sure you have copied the file properly.  If the fields in the file contain binary data (like packed decimal format) then make sure to transfer the file as binary and not text.  But this means if some of the fields are text you might need to use the $EBCDIC informat to read them.

 

 

Kurt_Bremser
Super User

First, fix the unbalanced quotes by restarting the SAS server session.

 

Next, take a good look at the file on the mainframe to determine if it contains only text or some binary data. Lots of MF software will write numbers in binary or packed decimal format (PD = 2 decimal digits per byte).

If it is pure text, you can use FTP/SFTP in text mode, if not, transfer the file in binary mode.

 

If you used binary, use the $EBCDIC informat for text, and the S370... informats for the numbers (S370FPD5.2 will read 9 digits, 2 of which are the decimal fraction, the last 4 bits contain the sign). Also note that you will probably have a file with Fixed Block organization, which means no line separators and reading with RECFM=F.

 

If the file was transmitted as text, use a text editor (Notepad++) to inspect it. This has a hex mode which allows you to verify if the delimiter is in fact '7F'x.

Kurt_Bremser
Super User

And: getting files from mainframes means there IS documentation for it. MF programmers live that way. They eat, drink, breathe and secrete documentation. The least thing you can get is a PL/1 "include" that contains the code for setting up the record structure.

Sajid01
Meteorite | Level 14

Please  have a a clean restart of the SAS EG and replace the delimiter as shown.
It should work

delimiter='7F'x
with 
delimiter='09'x
ballardw
Super User

Open the file in text editor that doesn't wrap long lines, copy the first 10 lines. On the forum open a text box using the </> icon. Paste the text.

 

That way we have something to see an may have more to work with.

Import expects exactly one row or line to hold the column headings and the first one at that. If your file is such that the "column header" information is on two or more rows then your file won't be read "correctly" by Import.

Kurt_Bremser
Super User

I can only second what @Cynthia_sas said: you MUST get documentation for files that you are given for import into SAS. This documentation (or "data dictionary") has to include the technical layout (fixed vs. variable records, are there CR and/or LF separators, are there delimiters, or do the columns have fixed widths) and a description for each column.

Without that, it is guesswork; start with opening the file in a suitable text editor (I prefer Notepad++). From there you may be able to get the column names, if they are included. The presence of delimiters may make it possible to let PROC IMPORT make its own guesses, which may even be close to the intended reality. Or the layout is simple and obvious (e.g. 2021-05-06 is clearly a date) so that you can write proper code right away.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3776 views
  • 2 likes
  • 6 in conversation