MarkWik
Quartz | Level 8

Hi, I would like to understand how informats behave compared with explicitly defining lengths in a LENGTH statement when reading values from a raw data file. For example, say I am reading a CSV file: if I apply all the informats in an INFORMAT statement, I don't have to worry about the colon modifier, right? And if I use both a LENGTH statement and informats, would that affect the result?

data want;
  infile "filename.csv" dsd missover;
  informat column1 informat.......columnN informat;
  input column1.....columnN;
run;

And if I forget to specify an informat for a particular variable in the INFORMAT statement, would this change the order of the variables?

Please help me understand. Thanks

5 REPLIES 5
jakarman
Barite | Level 11

You are reading a particular kind of raw data: delimited data, read with the DSD option. Input of that kind ignores the width specifications of the given informats. When lengths are needed (for character variables), you must specify them yourself. If you read the raw data in the classic column-oriented way, the widths of the informats are not ignored.
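A minimal sketch of that difference, using made-up data and hypothetical data set names (LIST_STYLE and COLUMN_STYLE are not from the original question). The first step uses list input, where the field ends at the delimiter; the second uses formatted input, where the informat width decides how many columns are read.

```sas
/* List input: the field runs to the comma, whatever the informat width says. */
data list_style;
  infile datalines dsd truncover;
  length name $12;        /* storage length is set explicitly            */
  informat name $3.;      /* width is not used to cut the field here     */
  input name salary;
datalines;
jonathan,234
;

/* Formatted input: exactly 3 columns are read for NAME. */
data column_style;
  infile datalines truncover;
  input name $3. @10 salary 3.;   /* SALARY starts in column 10 of this line */
datalines;
jonathan,234
;

proc print data=list_style;
run;

proc print data=column_style;
run;
```

Comparing the two PROC PRINT listings should show the full name in the first data set and only its first three characters in the second.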


The order of variables is an odd thing, and any dependency on the logical order should be avoided. The physical order was changed with v8, I believe, by aligning numeric and character variables on word boundaries where they used to be aligned on bytes. The reason is that the technical architecture of CPUs and I/O has changed drastically over time. The logical order of SAS variables is a concept from the 1970s: a variable is defined as soon as the language compiler detects it. As a result, the ordering of variables is influenced by how the code is written.
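As a small illustration of "defined as soon as the compiler detects it", here is a sketch with hypothetical names (ORDER_DEMO and variables A, B, C). Variable B is left out of the INFORMAT statement, so the compiler first meets it on the INPUT statement, after C and A.

```sas
data order_demo;
  infile datalines dsd truncover;
  informat c $5. a 8.;   /* C and A are seen first, in this order       */
  input a b c;           /* B is first seen here, so it is defined last */
datalines;
1,2,xyz
;

/* VARNUM lists the variables by their position in the data set. */
proc contents data=order_demo varnum;
run;
```

The values are still read in the order given on the INPUT statement; only the position of the variables in the data set follows the order of first appearance.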

---->-- ja karman --<-----
MarkWik
Quartz | Level 8

Thank you for your response. Why does the following program show invalid data for name in the log and write missing values?

data q;
  infile datalines dsd missover;
  informat name   date ddmmyy10. salary;
  input name $ date salary;
datalines;
john,20061983,234
jacob,21061983,235
keith,12081976,678
steve,29101979,670
mark,28111988,236
;

mingfeng07
Fluorite | Level 6

The variable name is character, so you should put '$5.' after name in the INFORMAT statement. The values of the variable date do not look reasonable, so the informat 'ddmmyy10.' should be deleted; if you want to display it as a date, you can use the statement "format date ddmmyy10.;" after the INPUT statement.

Cynthia_sas
SAS Super FREQ

Hi:

  When you have this in your INFORMAT statement:

informat name   date ddmmyy10. salary;

That means you are defining BOTH NAME and DATE as numeric variables with an informat of DDMMYY10. The INPUT statement then tries to read NAME as character, but it is too late: you've already created it as numeric. If you run a PROC CONTENTS after your DATA step, you will see that you have created all 3 variables, NAME, DATE and SALARY, as numeric variables.
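A possible correction, as a sketch rather than the only fix: give NAME its own character informat so that DDMMYY10. applies to DATE alone. The $5. width and the DATE9. display format are assumptions chosen for this sample data, and the values such as 20061983 are assumed to be day-month-year dates without separators.

```sas
data q;
  infile datalines dsd missover;
  informat name $5. date ddmmyy10.;  /* each variable gets its own informat */
  format date date9.;                /* assumed display format for DATE     */
  input name date salary;            /* SALARY uses default numeric input   */
datalines;
john,20061983,234
jacob,21061983,235
keith,12081976,678
steve,29101979,670
mark,28111988,236
;

/* Check the resulting types: NAME should now be character. */
proc contents data=q;
run;
```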

cynthia

Tom
Super User

If you want to define your variables use a LENGTH or ATTRIB statement.
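For instance, an ATTRIB statement can set length, informat and format in one place before the INPUT statement. This is only a sketch: the data set name Q2, the $10 length and the DATE9. format are assumptions, and the data line is borrowed from the program earlier in the thread.

```sas
data q2;
  infile datalines dsd truncover;
  /* Define type and length explicitly instead of relying on INFORMAT. */
  attrib name   length=$10
         date   length=8 informat=ddmmyy10. format=date9.
         salary length=8;
  input name date salary;
datalines;
john,20061983,234
;

proc contents data=q2;
run;
```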


The INFORMAT statement just tells SAS what default informat to use when the variable is referenced in an INPUT statement. Only as a side effect does it set the type and length of a variable, and then only if the variable has not been previously defined in the data step. So if you attach a numeric informat, the variable will become type num with length 8. If you specify a character informat for a previously unknown variable, the type will be char and the length will be the width used in the INFORMAT statement (or the default width for that informat). Depending on the informat, this default length could be wrong. Consider this example: the variable ITEM2 defaults to length $1, so it does not have room to store actual values like 'Apple'.


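proc format ;
  /* character informat that maps one-letter codes to longer labels */
  invalue $example 'A'='Apple' 'B'='Banana' 'C'='Carrot' other='Unknown';
run;

data xx;
  /* ITEM1 is defined first, so it keeps length $20 */
  length item1 $20;
  /* ITEM2 is first seen here, so it gets length $1 from $EXAMPLE1. */
  informat item1 item2 $example1.;
  input item1 @1 item2;
cards;
A
B
C
D
;;;;

proc print;
run;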
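One way to give ITEM2 enough room, following the advice to define variables with LENGTH first, is sketched below. The data set name XX_FIXED is hypothetical; everything else repeats the example above.

```sas
proc format;
  invalue $example 'A'='Apple' 'B'='Banana' 'C'='Carrot' other='Unknown';
run;

data xx_fixed;
  /* Both variables are defined before the INFORMAT statement sees them. */
  length item1 item2 $20;
  informat item1 item2 $example1.;
  input item1 @1 item2;
cards;
A
B
C
D
;;;;

proc print data=xx_fixed;
run;
```

Because the length is already fixed at $20 when the INFORMAT statement is compiled, the informat only controls how the raw value is read and no longer determines the storage length.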

