I have to construct a dataset to a specification where the offset, size and type of each column are pre-determined.
I'm trying to force columns to be a certain size when reading them in from a CSV file using the following code:
DATA TEST;
INFILE "C:\Exports to read back in\DATA_AS_CSV.csv" DSD dlm = "," FIRSTOBS = 2 LRECL =2000;
INPUT
TENANCY_REFERENCE_NUMBER : $20.
TRANSACTION1_TYPE : $40.
TRANSACTION1_AMOUNT : 7.
TRANSACTION1_DATE : $8.
TRANSACTION1_DESCRIPTION : $40.
TRANSACTION1_CD : $1.
TRANSACTION1_BALANCE : 7.
TRANSACTION1_RESPONSIBILITY : $10.
TRANSACTION1_PERIOD : 2.;
RUN;
The main issue I'm having is that all the numerics are read in as 8-byte fields even though I've tried to specify them as 7 and 2.
I'm also not sure how to set the offset. My dataset specification shows that the data needs to start at the 5th byte. Would anyone be able to advise whether this is possible in SAS?
Hello @manonlyn,
Numeric variables in SAS under Windows can have lengths between 3 and 8 bytes. If non-integer values are to be stored, 8 bytes are strongly recommended because shorter lengths come with a loss of precision ("bytes" are not "decimal digits"). Even for small integers variable lengths <8 can be risky (see this example).
Unlike numeric informats, character informats such as $20. define the length of the corresponding variable if it hasn't been defined before (e.g. with a LENGTH statement). For numeric variables you'd need to use the LENGTH statement if you really wanted to define a non-standard length (i.e. <8). Also, note that with modified list input (using the colon modifier as you do) the informat lengths have no impact if they don't define the variable length (see above). So, you can read standard numeric values with :1. as well as with :32. (or without any informat for that matter), time values such as 12:34:56 PM with :time5. as well as with :time20. (or with :time., which is :time8.) and you can read arbitrary character values with :$1. as well as with :$123. (or without any informat) if the length of the character variable has been set before.
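As a rough sketch of the LENGTH approach described above (not tested against your file): declaring every variable in a LENGTH statement before INPUT fixes both the storage lengths and the variable order, so the INPUT statement no longer needs informats. I've assumed 3 bytes for TRANSACTION1_PERIOD because 2 is below the Windows minimum, and I've kept 7 for the two amounts only because that is what you asked for; if they hold non-integer values, 8 is the safer choice.

```sas
DATA TEST;
INFILE "C:\Exports to read back in\DATA_AS_CSV.csv" DSD DLM = "," FIRSTOBS = 2 LRECL = 2000;
/* LENGTH sets the storage length in bytes (not decimal digits).  */
/* It must come before INPUT to take effect.                      */
LENGTH
TENANCY_REFERENCE_NUMBER    $20
TRANSACTION1_TYPE           $40
TRANSACTION1_AMOUNT         7    /* assumption: shorter than 8 risks precision loss */
TRANSACTION1_DATE           $8
TRANSACTION1_DESCRIPTION    $40
TRANSACTION1_CD             $1
TRANSACTION1_BALANCE        7    /* assumption: shorter than 8 risks precision loss */
TRANSACTION1_RESPONSIBILITY $10
TRANSACTION1_PERIOD         3;   /* assumption: 3 is the minimum length on Windows  */
/* Plain list input: lengths are already defined above. */
INPUT
TENANCY_REFERENCE_NUMBER
TRANSACTION1_TYPE
TRANSACTION1_AMOUNT
TRANSACTION1_DATE
TRANSACTION1_DESCRIPTION
TRANSACTION1_CD
TRANSACTION1_BALANCE
TRANSACTION1_RESPONSIBILITY
TRANSACTION1_PERIOD;
RUN;
```

Running PROC CONTENTS on TEST afterwards should show whether the lengths came out as intended.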
For the offset use the @n column pointer control between "INPUT" and the first variable name, e.g.
INPUT @5 TENANCY_REFERENCE_NUMBER : $20. ...
to skip the first four bytes of each line.
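To illustrate the pointer control on its own, here is a small self-contained sketch that uses in-stream data instead of your CSV file. The dataset name OFFSET_DEMO, the variables ID and AMOUNT, and the four filler characters XXXX are all made up for the illustration.

```sas
DATA OFFSET_DEMO;
INFILE DATALINES DSD DLM = ",";
/* @5 moves the column pointer to byte 5 before list input starts, */
/* so the first four bytes of every line are skipped.              */
INPUT @5 ID : $20. AMOUNT;
DATALINES;
XXXXA001,12.5
XXXXB002,7
;
RUN;

PROC PRINT DATA = OFFSET_DEMO;
RUN;
```

With these two lines I would expect ID to come out as A001 and B002, without the XXXX prefix.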
What you give the numeric variables in the INPUT statement is just an informat. An informat only governs how the raw text is read (and with the colon modifier the field is read up to the next delimiter, so the width has no practical effect); the storage length of a numeric variable is still set to the default value of 8. SAS stores numbers in a floating-point format (mantissa, exponent and sign combined into 8 bytes), which allows the storage of very large and very small values, but has limited precision. If you reduce the length of numeric variables (which is possible), you give up precision, not range.
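A quick way to see the precision loss for yourself is a sketch like the following; the names SHORT, N, X3 and X8 are made up for the demonstration. The truncation happens when the value is written to the dataset, which is why the comparison is done in a second step.

```sas
DATA SHORT;
LENGTH X3 3 X8 8;
/* Store the same integers once in 3 bytes and once in 8 bytes. */
DO N = 8190 TO 8195;
X3 = N;
X8 = N;
OUTPUT;
END;
RUN;

DATA _NULL_;
SET SHORT;
/* Report every row where the 3-byte copy no longer matches. */
IF X3 NE X8 THEN PUT "Differs: " N= X3=;
RUN;
```

The SAS documentation gives 8,192 as the largest integer that can be stored exactly in 3 bytes on Windows, so some of the values above that should show up in the log as changed.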
If your first character value starts at position 5, with 4 blanks before that, you do not need to act on that. The $w. informat discards leading blanks.
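If you want to check the leading-blank behaviour, a small sketch along these lines should do. It uses formatted input on in-stream data rather than your CSV, and the names BLANKS, TRIMMED, KEPT and N_LEAD are invented for the example. $CHARw. is the informat that preserves leading blanks, so it makes a handy contrast to $w.

```sas
DATA BLANKS;
INFILE DATALINES TRUNCOVER;
/* Read the same 12 columns twice: $w. drops leading blanks, */
/* $CHARw. keeps them.                                       */
INPUT @1 TRIMMED $12. @1 KEPT $CHAR12.;
/* Number of leading blanks that survived in KEPT. */
N_LEAD = VERIFY(KEPT, " ") - 1;
PUT TRIMMED= N_LEAD=;
DATALINES;
    ABC123
;
RUN;
```

Here TRIMMED should start directly with ABC123, while KEPT should still carry the four leading blanks.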