Hansmuffs
Fluorite | Level 6

Hi,

 

I'm trying to understand when I need the informat statement. I know that it is meant for reading data, but I have created a CSV file with 3 rows, and after changing the informat width arbitrarily I can't see any change in my data.

 

proc import datafile="/dwh/home/a12c1d7/Test/Input.csv"
        out=staedte
        dbms=csv
       	replace;
     	getnames=yes;
     	delimiter=';';
run;

data final;
	attrib
		ID 		length=8 	format=8. 	informat=8.
		Stadt 	length=$10.	format=$10.	
	;
	set work.staedte;
run;
		
proc contents data=final;
run;

ID Values are 34865837, 98, 67322491.

 

Why is it important to define informat?

 

Thanks

5 REPLIES
PaigeMiller
Diamond | Level 26

In this case, and many other cases, the informat does not make a difference and does not need to be defined. In other cases, SAS cannot read the numbers properly and so informats are needed.

 

As long as your input values are integers, the informat is not necessary.
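
To illustrate the difference, here is a small sketch (the dataset and variable names are made up for this example). Integers like the ID values from the question are read correctly with the default numeric informat, whereas values that use a comma as the decimal separator need an informat such as COMMAX.

```sas
/* Integers: the default numeric informat is sufficient */
data ints;
    input id;
    datalines;
34865837
98
67322491
;

/* Comma as decimal separator: an informat is needed.           */
/* The colon requests modified list input (read up to a blank). */
data decimals;
    input amount :commax12.;
    datalines;
25,25
1.234,5
;

proc print data=decimals;
run;
```

In the second step COMMAX treats the period as a thousands separator and the comma as the decimal point.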

--
Paige Miller
Shmuel
Garnet | Level 18

I'll give you two examples of when you need an informat:

 

1) Suppose you get a date like '02/15/2021' as a string. Can you sort by it chronologically?

     Of course not, because the string '08/12/2020' sorts after the date above although it is earlier.

     The informat mmddyy10. converts it to an integer, the number of days since 01/01/1960 (which is day 0).

     There are many other informats for dates, times and timestamps (date and time).

 

2) Usually we use formats to display a code as a string (translating the code to its meaning).

    Sometimes it is convenient to do the reverse: convert a character string to a numeric value.

    Suppose you have a variable with values 'YES' or 'NO'. Convert it with an informat to 1 or 0; then it is easier

    to check whether it is TRUE (=1) or FALSE (=0).
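
Both cases can be sketched in one short program. This is only an illustration: the informat name YESNO. and all dataset and variable names are invented for the example.

```sas
/* Hypothetical user-defined informat: 'YES' -> 1, 'NO' -> 0 */
proc format;
    invalue yesno (upcase just)
        'YES' = 1
        'NO'  = 0
        other = .;
run;

data demo;
    length datestr $10 answer $3;
    input datestr answer;
    date = input(datestr, mmddyy10.);  /* character date -> SAS date value */
    flag = input(answer, yesno.);      /* 'YES'/'NO' -> 1/0 */
    format date date9.;
    datalines;
02/15/2021 YES
08/12/2020 no
;

/* Sorting by the numeric date gives chronological order, */
/* which sorting by the string DATESTR would not.         */
proc sort data=demo;
    by date;
run;
```

The UPCASE option makes the informat accept 'no' as well as 'NO', and OTHER maps anything unexpected to a missing value.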

FreelanceReinh
Jade | Level 19

Hi @Hansmuffs,

 


@Hansmuffs wrote:

I'm trying to understand when I need the informat statement.

(...)

Why is it important to define informat?


I think for a better understanding we should distinguish between using informats and using an INFORMAT statement. Often informats are specified in the INPUT statement when reading raw data which cannot be read with the default informats (i.e., w.d for numeric and $w. for character values). Reading dates is among the most common use cases.

 

Example:

data test;
length id $5;
input id seqno dat :yymmdd. val :numx. unit :$10.;
format dat ddmmyyp10.;
cards;
00001 1 2021-02-15 3,14 mmol/l
;

Without a date informat (here: YYMMDD.) the raw value "2021-02-15" could not be read (as the SAS date value 22326) into the numeric variable DAT. Similarly, the comma in "3,14" (meant as the numeric value 3.14) requires an informat (such as NUMX.) that interprets the comma as a decimal separator. The purpose of the $10. informat in the above example is to define variable UNIT as a character variable of length 10. Alternatively, this definition could be included in the LENGTH statement (note that "$10" there is not an informat specification and therefore not followed by a period):

length id $5 unit $10;

With this preparation UNIT could be read without an informat specification (as are ID and SEQNO). However, UNIT would then appear as the second variable in the dataset (e.g. in default PROC PRINT output), not in its "natural" position after VAL. The colon modifier in front of the three informat specifications is necessary to perform modified list input.

 

All informat specifications could be moved to an INFORMAT statement:

informat dat yymmdd. val numx. unit $10.;
input id seqno dat val unit;

The consequences are:

  1. Again, the variable order in the dataset is affected. For example, SEQNO is now the last variable because it's mentioned after all other variable names.
  2. Most importantly, the three informats are "permanently" associated with the corresponding variables, i.e., they are now part of the metadata, as can be seen in PROC CONTENTS output (excerpt):
          Alphabetic List of Variables and Attributes
    
    #    Variable    Type    Len    Format        Informat
    
    2    dat         Num       8    DDMMYYP10.    YYMMDD.
    1    id          Char      5
    5    seqno       Num       8
    4    unit        Char     10                  $10.
    3    val         Num       8                  NUMX.
    

 

In many if not most cases these permanent informat metadata are unnecessary and often annoying. For instance, they will show up as "differing attributes" in PROC COMPARE output if the compared dataset does not have them. If one of the informats was user-defined, an error message "ERROR: The informat <informat name> was not found or could not be loaded." would occur when working with dataset TEST in a SAS session without access to the informat definition. A common source of such informat metadata is PROC IMPORT (using INFORMAT statements behind the scenes).

 

This is why I rarely use INFORMAT statements and rather remove existing informat metadata (by means of PROC DATASETS).
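
A minimal sketch of that cleanup, assuming the dataset is WORK.TEST from the example above (the version created with the INFORMAT statement). In the MODIFY statement group of PROC DATASETS, listing variables in an INFORMAT statement without naming an informat removes the association.

```sas
/* Assumption: WORK.TEST is the dataset from the example above */
proc datasets library=work nolist;
    modify test;
    informat _all_;   /* no informat named: removes the informats of all variables */
quit;

/* Check: the Informat column should now be empty */
proc contents data=work.test;
run;
```

Only the metadata is changed; the stored values are not touched.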

 

Here are two examples where an INFORMAT statement and permanently associated informats do make (some) sense:

  1. Convenient assignment of informat specifications to a list of variables which are not consecutive in the raw data to be read:
    data test1;
    input dat1 val1 dat2 val2 dat3 val3 dat4 val4 dat5 val5;
    informat dat: yymmdd.;
    format dat: ddmmyyp10.;
    cards;
    20210111 2 20210120 3 20210203 5 20210210 7 20210215 11
    ;
    Note that the colon here (to define a name prefix variable list) has nothing to do with the colon related to modified list input in the first DATA step.
  2. Using a "template dataset" to provide informat specifications (and other metadata like formats or variable labels) for a new dataset with the same structure:
    data test2;
    if 0 then set test1;
    input dat1 val1 dat2 val2 dat3 val3 dat4 val4 dat5 val5;
    cards;
    20210303 13 20210430 17 20210513 19 20210608 23 20210719 29
    ;
    This technique simplifies the code and avoids repetitions, especially if many variables with various formats, informats, etc. are involved (more than in this example).
Hansmuffs
Fluorite | Level 6

Okay this is really an excellent explanation. 

I have a last example. 

 

data informat_dataset;
	amount= '25,25000';
run;

data transform;
	attrib
		amount format=commax10.2 informat=commax10.8
	;
	set informat_dataset;
run;

When I execute this, I get the following error in the log:

 

ERROR: Variable amount has been defined as both character and numeric.

 

What do I have to change so that the conversion specified by the format and informat is actually carried out?

 

Thanks and regards

Hans

FreelanceReinh
Jade | Level 19

The ATTRIB statement defines/modifies metadata, it doesn't change values and cannot convert variable types. In your example the compiler sees the variable name "amount" for the first time in the ATTRIB statement and based on this first occurrence the compiler decides about the type of that variable. Since COMMAX10.2 is a numeric format, the decision is made: amount is defined as a numeric variable. The informat specification COMMAX10.8 is consistent with that decision (so it doesn't cause an error). Then the compiler realizes that the SET statement tries to bring another variable with the same name (amount) into the program data vector (PDV) and that this is a character variable. Thus a variable type conflict has occurred which cannot be resolved. It is documented by the error message

 

ERROR: Variable amount has been defined as both character and numeric.

and SAS stops processing the DATA step.

 

 

To create a dataset from INFORMAT_DATASET where amount is a numeric variable, a new variable must be created that is assigned the numeric value created from amount by means of the INPUT function using a numeric informat. In addition, renaming is necessary because the new variable must have a different name than the existing variable.

data transform;
set informat_dataset(rename=(amount=amountc));
amount=input(amountc, commax8.);
format amount commax10.2;
drop amountc;
run;

Or similarly:

data transform(drop=amount rename=(amountn=amount));
set informat_dataset;
amountn=input(amount, commax8.);
format amountn commax10.2;
run;

 
