07-19-2015 07:40 AM
Hello, Which is most efficient and convenient method to write data to external file that is a fixed length file? My need is I have a SAS dataset that 86 variables and I will have write the contents to a CSV file and later do some processing in the CSV file and load that into the database table. I have a created a table in the database and there is defined table structure
1. Does the SAS data requires to be written in CSV specifying the length definitions as defined in the table definitions of the database table?
2. If that's the case, does the length requires to be defined as formats in the put statement following the file statement?
3. what logical record length should I specify?
07-19-2015 09:28 AM
CSV files have no information about the variables (fields, columns) that they contain other than the name that is stored in the header row.
Are you asking how to output the metadata about your SAS dataset to a format that might be useful for someone recreating it in a database system? You could write the information that PROC CONTENTS generates to a SEPARATE CSV file and send the data and metadata together.
In terms of writing a CSV file from SAS it is best to set the LRECL as long as possible. SAS will write variable length records so it does not matter if the defined LRECL is longer than the amount used. Now if your data table has many long character strings then you might have a problem fitting the data within the normal limits, but there are way to deal with that.
FORMATS are useful in writing a CSV file, but only when they make it clear what the data is. For example if you have DATE or TIME data in your SAS dataset then you must use the format when writing otherwise the consumer of the CSV file will not recognize them as dates.
Writing a CSV file from SAS is trivial.
set sashelp.class ;
file 'class.csv' dsd ;
put (_all_) (;
Adding the header row takes a little more work.
07-19-2015 09:46 AM
Thank you Tom,Got it, Nice and clear. Would it be more of an educated rough estimate to set the LRECL?
And if i write the contents to a fixed length external file, do I still need to use formats when the length is defined like var1 1-5 @7 var2 and so on? is the _infile_ variable of any use at all?
The database table that's been created has table definitions like the following, and so my confusion is that the SAS metadata should probably match the table definitions of the database?:
and for header, how about proc export with getnames option, i don't know if i did my google search properly.
07-19-2015 10:08 AM
The current default for SAS uses for FILE statement is 32767, but older versions used 256. I find that for most work 32767 works fine. You can get a data step to write a lrecl with over 1,000,000 without any special coding.
PROC EXPORT works fine for simple exports like this.
For your example table the CHAR datatype will translate exactly to the SAS character type. For the DATE variable you need to know what format they want the date values in the CSV file. Do they want YYYYMMDD, MMDDYYYY, DDMMYYYY, DD-MM-YY , MM/DD/YY, etc.
To READ a file that is fixed column layout for the columns you defined you could just use formatted input without any column locations.
input LAST_NAME $50. FIRST_NAME $50. .... DOB MMDDYY8. ... .EMAIL $50. ;
07-21-2015 11:10 AM
Hi my apologies for the bother again. To write the sas dataset to a CSV file with the above mentioned variables, should i have to mention formats like your example in Input in my put statemetns after a file or your top example with just put _all_ will do, i am getting very confused with so many characters, dates and other numeric variables like salary etc. Please help
07-21-2015 12:56 PM
I am very confused about what your goal is here.
1) Do you want to read a fixed column file?
2) Do you want to write a fixed column file?
3) Do you want to read a delimited file? What is the delimiter?
4) Do you want to write a delimited file? What is the delimiter?
Some combination of these?
To read a fixed column file you are use many methods. You can list each field with an informat of the specified length.
INPUT VAR1 $50. VAR2 $20. ...
You can specify the column ranges
INPUT VAR1 $ 1-50 VAR2 $ 51-70 .... ;
To write a fixed column file you write with explicit lengths.
PUT VAR1 $50. VAR2 $20 ...
Or you can use column pointers to tell where to start a value (just make sure they don't overlap as you will lose information).
put @1 VAR1 @51 VAR2 ...
You NEED to use INFORMATS on input when you need them to convert the text strings in the source file into valid SAS fields. For example when reading dates in a format like YYYYMMDD so that SAS will know that the value is a date. You need FORMAT on output when doing the reverse. Now if your SAS dataset is properly created and has the proper formats permanently attached then when generating the CSV file you do not need to worry about formats. Just like you don't need to worry when doing a PROC PRINT or a PUT statement to display the value.
If you are reading from a fixed column file and writing to a CSV file and do not need to actually use the data at all inside SAS then just read is each field as character. Then you don't need to worry about formats. The text from the input file will just be copied to output file.
infile 'fixed_length_file' truncover;
file 'delimited_file' dsd ;
input var1 $50. var2 $20 ;
if _n_=1 then put 'VAR1,VAR2' ;
put var1 var2 ;