
Using informat when reading from other dataset with a SET statement


Hi!

I know an informat is used for reading raw data. How can I read data from another dataset (normally with a SET statement) using a custom informat? I am looking for a permanent modification of the data, not just formatting of the values. Also, why do predefined informats work with the SET statement, as seen below?

Here is an example:

 

data test;
  input CC $2.;
  datalines;
LU
AT
BE
;run;

proc format ;
  invalue $tform 
    "LU"  = "XX"
    other = "ZZ";
run;

/* 
Output from a:
Expected  Actual
XX        LU
ZZ        AT
ZZ        BE
 */
data a;
  informat CC $tform.; /* custom informat does not work? */
  set test;
run;

/* What about predefined informat?
Output from b:
Expected  Actual
LU        L
AT        A
BE        B
 */
data b;
  informat CC $1.;    /* predefined informat works? */
  set test;
run;

 


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 6,778

Re: Using informat when reading from other dataset with a SET statement

[ Edited ]

A couple of things that work ...   First run PROC FORMAT.  Then:

 

data test;
  input CC $tform.;
datalines;
LU
AT
BE
;

Or:

data test;
  input CC $2.;
datalines;
LU
AT
BE
;

data want;
  set test;
  CC = input(CC, $tform.);
run;



All Replies
Super User
Posts: 9,599

Re: Using informat when reading from other dataset with a SET statement

??

 

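An informat is for reading raw data in a certain way.  Variables in a dataset already hold stored values and may have formats attached to them; to alter how they are displayed, you change the format of the variable?

data a;
  set test;
  format CC $tform.; /* requires a format named $tform, created with a VALUE statement (the original example defines only an informat) */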
run;
Occasional Contributor
Posts: 15

Re: Using informat when reading from other dataset with a SET statement

Posted in reply to Astounding

Thank you for the clarifying answers and explanations. I am still battling with the basics.

I knew I was using INFORMAT incorrectly, I just did not know how exactly.

 

Here is the part I was looking for:


@Astounding wrote:

 

data want;
  set test;
  CC = input(CC, $tform.);
run;


I.e., using the PUT() or INPUT() function to rewrite existing data. I wrongly thought I could achieve the same thing with an INFORMAT statement. I did not want to use a format, as it only changes the visual appearance of the data, not the data itself.

Thanks all!

Super User
Posts: 8,114

Re: Using informat when reading from other dataset with a SET statement

[ Edited ]

An INFORMAT converts text to stored values. You use it with an INPUT statement or an INPUT() function.

A FORMAT converts stored values to text. You use it with a PUT statement or a PUT() function.

 

In your case since you are translating character variables you could either use an INFORMAT with an INPUT() function call or a FORMAT with a PUT() function call.

 

In PROC FORMAT you use a VALUE statement to create a FORMAT and an INVALUE statement to create an INFORMAT.
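
To make the two routes concrete, here is a minimal sketch, assuming the TEST dataset from the original question. The format name $tfmt and the variables via_input and via_put are made up for illustration; only $tform comes from the thread. Both new variables should end up holding the same translated values.

```sas
proc format;
  /* Character informat from the question: used with INPUT(). */
  invalue $tform
    "LU"  = "XX"
    other = "ZZ";
  /* Hypothetical character format with the same mapping: used with PUT(). */
  value $tfmt
    "LU"  = "XX"
    other = "ZZ";
run;

data test;
  input CC $2.;
  datalines;
LU
AT
BE
;

data want;
  set test;
  /* Define the target variables explicitly instead of letting SAS guess. */
  length via_input via_put $2;
  via_input = input(CC, $tform.);  /* informat + INPUT() function */
  via_put   = put(CC, $tfmt.);     /* format + PUT() function */
run;

proc print data=want;
run;
```

Either assignment changes the stored value, which is what separates this from merely attaching a format to CC.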

Super User
Posts: 10,258

Re: Using informat when reading from other dataset with a SET statement

The informat does not "work" here:

data b;
  informat CC $1.;    /* predefined informat works? */
  set test;
run;

as you rightfully stated, it only works when reading from raw data with an input statement (or in an input function).

Since here the informat statement sets the attributes before the incoming variables are determined by the set statement, CC is defined with a length of 1, and the values are truncated.

Now run this:

data a;
  informat CC $tform1.;
  set test;
run;

Note that both data steps cause a WARNING because of the truncation. Maxim 2: read the log!

Super User
Posts: 8,114

Re: Using informat when reading from other dataset with a SET statement

Changing the INFORMAT associated with a dataset really does nothing.  Perhaps if you opened the dataset in FSEDIT and tried to manually add records it might have an impact.

 

You state that this "works".

data b;
  informat CC $1.;    /* predefined informat works? */
  set test;
run;

What you have done is force SAS to define the variable CC as character with a length of one byte.  Then, when it reads in the existing dataset, it truncates the data to fit.

 

Remember that FORMAT and INFORMAT statements are instructions to SAS about what default format or informat to use when translating values to text or the reverse.  An INFORMAT or FORMAT statement only has an impact on the definition of a variable's type and/or length if it is the first place the SAS code references the variable.  In general, SAS makes a decision about the variable type/length when you first reference the variable.  So if the first place you reference it is in an INFORMAT statement, then SAS will define the type of the variable to match the informat type.  And for character variables it will set the length to match the width of the informat.
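
A small sketch of that ordering effect, assuming the TEST dataset from the original question. The dataset names before_set and after_set are made up for illustration. The only difference between the two steps is whether the INFORMAT statement comes before or after SET.

```sas
data test;
  input CC $2.;
  datalines;
LU
AT
BE
;

/* INFORMAT is the first reference to CC: the variable is defined as
   character with length 1, so the values read from TEST are truncated. */
data before_set;
  informat CC $1.;
  set test;
run;

/* SET is the first reference to CC: the variable keeps the length it has
   in TEST, and the INFORMAT statement only attaches an attribute. */
data after_set;
  set test;
  informat CC $1.;
run;

/* Compare the Len column for CC in the two outputs. */
proc contents data=before_set;
run;

proc contents data=after_set;
run;
```

In neither step does the informat translate any values; it only matters through its side effect on how CC is defined.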

 

If you want to set the type and/or length of a variable you should use either a LENGTH or ATTRIB statement.

* Truncate CC to one character ;
data b;
  length CC $1;
  set test;
run;

Also note that letting SAS guess at how to define a variable can result in lengths that are probably not what you intended.  For example, if you define a character format that maps one-letter codes to longer descriptions and then use a FORMAT statement before defining the variable, SAS will set the length to match the length of the display values instead of the actual stored values.

proc format ;
value $testf 'L'='LONG' ;
run;

data test;
  length a $1 ;
  format a b $testf. ;
run;

proc contents data=test;
run;
The CONTENTS Procedure

Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Format

1    a           Char      1    $TESTF.
2    b           Char      4    $TESTF.
☑ This topic is solved.
