Informat and length behavior when reading a raw file?

Frequent Contributor
Posts: 75

Hi, I would like to understand how informats behave, as opposed to explicitly defining lengths in a LENGTH statement, when reading values from a raw data file. For example, say I am reading a CSV file: if I assign all the informats in an INFORMAT statement, I don't need to worry about the colon modifier, right? And if I use both a LENGTH statement and informats, would that affect the result?

data want;
infile "filename.csv" dsd missover;
informat column1 informat.......columnN informat;
input column1.....columnN;
run;

And suppose I forget to specify an informat for a particular variable in the INFORMAT statement: would this change the order of the variables?

Please help me understand. Thanks

Valued Guide
Posts: 3,208

Re: Informat and length behavior when reading a raw file?

You are reading a particular kind of raw data: delimited data, read with DSD. With that style of input, the widths given in the informats are ignored when a field is read; each value runs up to the next delimiter. When you need specific lengths for character variables, you must specify them yourself. If you read the raw data in the classic column-oriented way, the informat widths are not ignored.
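To illustrate the point about delimited input, here is a minimal sketch. The data set name, the variable names and the sample record are all made up for the example. With list input the whole comma-delimited field is read regardless of the informat width, so AMOUNT still comes from the second field, while NAME is stored in the length it received from its informat.

```sas
/* Sketch only: data set, variables and the sample record are hypothetical. */
data list_demo;
  infile datalines dsd truncover;
  /* First mention of NAME: the informat makes it character with length 3 */
  informat name $3.;
  /* List input: each field runs to the next comma, whatever the informat width */
  input name amount;
datalines;
alexander,10
;
run;

proc print data=list_demo;
run;
```

NAME should come out truncated to its stored length of 3 and AMOUNT should be 10. Declaring a longer length for NAME in a LENGTH statement before the INFORMAT statement would keep the full value.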


The order of variables is a weird thing, and any dependency on the logical order should be avoided. The physical order was changed, with V8 I believe, by aligning numeric and character variables on word boundaries where they used to be aligned on bytes. The reason is that the technical architecture of CPUs and I/O has changed drastically over time. The logical order of SAS variables is a concept from the 1970s: a variable is defined as soon as the language compiler detects it. As a result, the order of variables is influenced by how you write the code.
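As a rough illustration of "defined as soon as the compiler detects it", the sketch below leaves one variable out of the INFORMAT statement. The data set name and the sample record are assumptions made for the example. NAME and SALARY are first seen in the INFORMAT statement and DATE only in the INPUT statement, so DATE should end up last in the logical order even though it is the second field in the data.

```sas
/* Sketch only: data set name and sample record are hypothetical. */
data order_demo;
  infile datalines dsd truncover;
  /* DATE is deliberately left out here */
  informat name $10. salary comma12.;
  /* DATE is first seen by the compiler on this statement */
  input name date salary;
datalines;
john,20061983,234
;
run;

/* VARNUM lists the variables in creation order instead of alphabetically */
proc contents data=order_demo varnum;
run;
```

Because DATE has no informat in this sketch, it is read as a plain number rather than as a date.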

---->-- ja karman --<-----
Frequent Contributor
Posts: 75

Re: Informat and length behavior when reading a raw file?

Thank you for your response. Why does the following program report invalid data for name in the log and write missing values?

data q;
infile datalines dsd missover;
informat name   date ddmmyy10. salary;
input name $ date salary;
datalines;
john,20061983,234
jacob,21061983,235
keith,12081976,678
steve,29101979,670
mark,28111988,236
;

Occasional Contributor
Posts: 5

Re: Informat and length behavior when reading a raw file?

The variable name is character, so you should put '$5.' after name in the INFORMAT statement. The values of date do not look reasonable to me, so the informat ddmmyy10. should be removed; if you want to display the variable as a date, you can add the statement "format date ddmmyy10.;" after the INPUT statement.

SAS Super FREQ
Posts: 8,814

Re: Informat and length behavior when reading a raw file?

Hi:

  When you have this in your INFORMAT statement:

informat name   date ddmmyy10. salary;

That means you are defining BOTH NAME and DATE as numeric variables with an INFORMAT of DDMMYY10. Then the INPUT statement does try to read NAME as character, but it is too late: you have already created it as numeric. If you run a PROC CONTENTS after your DATA step, you will see that you have created all 3 variables, NAME, DATE and SALARY, as numeric variables.
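One possible way to rewrite the step is to give NAME its own character informat, so that it is no longer grouped with DATE. This is only a sketch: the data set name q_fixed, the $10. width and the DATE9. display format are choices made for the example, not something from the original program.

```sas
/* Sketch only: q_fixed, $10. and DATE9. are illustrative choices. */
data q_fixed;
  infile datalines dsd truncover;
  /* Each variable gets its own informat; NAME is now created as character */
  informat name $10. date ddmmyy10.;
  /* Display the stored date value in a readable form */
  format date date9.;
  /* SALARY has no informat, so it is read as an ordinary number */
  input name date salary;
datalines;
john,20061983,234
jacob,21061983,235
keith,12081976,678
steve,29101979,670
mark,28111988,236
;
run;

proc contents data=q_fixed;
run;
```

PROC CONTENTS should now report NAME as character with length 10, and DATE and SALARY as numeric.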

cynthia

Super User
Posts: 6,842

Re: Informat and length behavior when reading a raw file?

If you want to define your variables use a LENGTH or ATTRIB statement.


The INFORMAT statement just tells SAS what default INFORMAT to use when the variable is referenced on an INPUT statement. It sets the type and length of a variable only as a side effect, and then only if the variable has not been previously defined in the data step. So if you attach a numeric informat, the variable will become type num with a length of 8. If you specify a character informat for a previously unknown variable, the type will be char and the length will be the width used in the INFORMAT statement (or the default width for that informat). Depending on the INFORMAT, this default length could be wrong. Consider this example. The variable ITEM2 defaults to length $1, so it does not have room to store the actual values like 'Apple'.


proc format;
  invalue $example 'A'='Apple' 'B'='Banana' 'C'='Carrot' other='Unknown';
run;

data xx;
  length item1 $20;
  informat item1 item2 $example1.;
  input item1 @1 item2;
cards;
A
B
C
D
;;;;

proc print;
run;
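For comparison, here is a sketch of the same step with both variables defined up front in an ATTRIB statement, so that neither one inherits its length from the informat width. The data set name xx_fixed and the length of $20 for ITEM2 are assumptions made for the example.

```sas
/* Sketch only: xx_fixed and the $20 lengths are illustrative choices. */
proc format;
  invalue $example 'A'='Apple' 'B'='Banana' 'C'='Carrot' other='Unknown';
run;

data xx_fixed;
  /* Define type and length first, then attach the informat */
  attrib item1 length=$20 informat=$example1.
         item2 length=$20 informat=$example1.;
  /* Read the same field twice, once into each variable */
  input item1 @1 item2;
cards;
A
B
C
D
;;;;

proc print data=xx_fixed;
run;
```

With the explicit lengths in place, ITEM2 should have room for the translated values just as ITEM1 does.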
