manonlyn
Obsidian | Level 7

I have to construct a dataset to a specification in which the offset, size and type of each column are predetermined.

I'm trying to force columns to be a certain size when reading them in from a CSV file, using the following code:

DATA TEST;
INFILE "C:\Exports to read back in\DATA_AS_CSV.csv" DSD dlm = "," FIRSTOBS = 2 LRECL =2000;
INPUT
TENANCY_REFERENCE_NUMBER : $20.
TRANSACTION1_TYPE : $40.
TRANSACTION1_AMOUNT : 7.
TRANSACTION1_DATE : $8.
TRANSACTION1_DESCRIPTION : $40.
TRANSACTION1_CD : $1.
TRANSACTION1_BALANCE : 7.
TRANSACTION1_RESPONSIBILITY : $10.
TRANSACTION1_PERIOD : 2.;
RUN;

The main issue I'm having is that all the numerics are read in as 8-byte fields even though I've tried to specify them as 7 and 2.

I'm also not sure how to set the offset. My dataset specification says that the data set needs to start at the 5th byte. Would anyone be able to advise whether this is possible in SAS?

1 ACCEPTED SOLUTION

FreelanceReinh
Jade | Level 19

Hello @manonlyn,

 

Numeric variables in SAS under Windows can have lengths between 3 and 8 bytes. If non-integer values are to be stored, 8 bytes are strongly recommended because shorter lengths come with a loss of precision ("bytes" are not "decimal digits"). Even for small integers variable lengths <8 can be risky (see this example).

 

Unlike numeric informats, character informats such as $20. define the length of the corresponding variable if it hasn't been defined before (e.g. with a LENGTH statement). For numeric variables you'd need to use the LENGTH statement if you really wanted to define a non-standard length (i.e. <8).

Also note that with modified list input (using the colon modifier, as you do) the informat width has no impact unless it defines the variable length (see above), because the value is read up to the next delimiter. So you can read standard numeric values with :1. as well as with :32. (or without any informat, for that matter), and time values such as 12:34:56 PM with :time5. as well as with :time20. (or with :time., which is :time8.). Likewise, you can read arbitrary character values with :$1. as well as with :$123. (or without any informat) if the length of the character variable has been set before.
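
As a minimal sketch of the LENGTH approach described above, the step below reuses the file path and variable names from the question. The numeric lengths 7 and 3 are illustrative assumptions only: a length of 2 is not available under Windows (the minimum is 3), and these are bytes of storage, not decimal digits. Declaring every variable in one LENGTH statement, in specification order, also fixes the column order in the data set.

```sas
DATA TEST;
  /* LENGTH must precede INPUT so that it defines type and storage length first. */
  /* Numeric lengths below are assumptions for illustration; 8 is the safe       */
  /* choice whenever non-integer amounts have to be stored.                      */
  LENGTH TENANCY_REFERENCE_NUMBER    $20
         TRANSACTION1_TYPE           $40
         TRANSACTION1_AMOUNT         7
         TRANSACTION1_DATE           $8
         TRANSACTION1_DESCRIPTION    $40
         TRANSACTION1_CD             $1
         TRANSACTION1_BALANCE        7
         TRANSACTION1_RESPONSIBILITY $10
         TRANSACTION1_PERIOD         3;
  INFILE "C:\Exports to read back in\DATA_AS_CSV.csv" DSD DLM="," FIRSTOBS=2 LRECL=2000;
  /* Plain list input: no informats are needed once the lengths are defined. */
  INPUT TENANCY_REFERENCE_NUMBER
        TRANSACTION1_TYPE
        TRANSACTION1_AMOUNT
        TRANSACTION1_DATE
        TRANSACTION1_DESCRIPTION
        TRANSACTION1_CD
        TRANSACTION1_BALANCE
        TRANSACTION1_RESPONSIBILITY
        TRANSACTION1_PERIOD;
RUN;

/* Check the resulting type and length of each variable. */
PROC CONTENTS DATA=TEST;
RUN;
```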

 

For the offset use the @n column pointer control between "INPUT" and the first variable name, e.g.

INPUT @5
TENANCY_REFERENCE_NUMBER : $20.
...

to skip the first four bytes of each line.
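
To see the column pointer in isolation, here is a small self-contained sketch using in-stream data instead of the CSV file. The data set name OFFSET_DEMO, the variables REF and AMOUNT, and the sample records are all hypothetical.

```sas
DATA OFFSET_DEMO;
  INFILE DATALINES DSD DLM=",";
  /* @5 moves the pointer to column 5, so the first four bytes are never read. */
  INPUT @5 REF : $20. AMOUNT;
  DATALINES;
XXXXA123,100
YYYYB456,250
;
RUN;

/* REF should hold A123 and B456, without the four leading filler characters. */
PROC PRINT DATA=OFFSET_DEMO;
RUN;
```

The data lines start in column 1 on purpose: indenting them would shift every field and change what @5 points at.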

Kurt_Bremser
Super User

What you give the numeric variables in the INPUT statement is just an informat. An informat only influences how the characters are read, not how the value is stored: the storage length of a numeric variable is still set to the default of 8 bytes. SAS stores numbers in a floating-point format (mantissa, exponent and sign combined into 8 bytes), which allows the storage of very large and very small values but with limited precision. If you reduce the length of numeric variables (which is possible), you give up precision, not range.
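
The loss of precision can be illustrated with a short sketch. The names SHORT_NUM, X3, X8 and I are hypothetical, and the range 8190 to 8195 is chosen on the assumption that the step runs under Windows, where the documented largest integer that a 3-byte numeric can hold exactly is 8,192.

```sas
DATA SHORT_NUM;
  LENGTH X3 3 X8 8;   /* X3 is stored in 3 bytes, X8 in the default 8 bytes */
  DO I = 8190 TO 8195;
    X3 = I;
    X8 = I;
    OUTPUT;
  END;
RUN;

/* Values are shortened when the observation is written to the data set, */
/* so the comparison has to be made after reading the data back in.      */
DATA _NULL_;
  SET SHORT_NUM;
  IF X3 NE X8 THEN PUT "Precision lost: " X8= X3=;
RUN;
```

Any line written to the log marks a value that did not survive the shorter storage length, even though it is a small integer.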

 

If your first character value starts at position 5, with 4 blanks before that, you do not need to act on that. The $w. informat discards leading blanks.
