Compile time statement issues

Contributor
Posts: 24

Hi, as we know, LENGTH, FORMAT and INFORMAT are compile-time statements in the DATA step.
Consider the code below:
data a;
  input chr $ num;
  length chr $6;
  length num 6;
  format chr $3.;
  format chr $4.;
  datalines;
abc 1
;
run;

It gives num a length of 6, and chr a length of 8 with format $4.
Why does chr take the last format rather than the first one it encounters?
Why does the numeric variable take the last length, while the character variable keeps the first?

Now look at this code:
data test;
  set a1 a2;
run;

Here, for variables that exist in both data sets, the attributes are taken from the first data set, a1. But in the earlier example the format took the last value, so by that logic shouldn't the attributes come from a2?

Accepted Solutions
Solution
‎02-15-2016 03:23 PM
Super User
Posts: 5,518

Re: Compile time statement issues

You're actually asking a question that is more advanced than it seems.  At a simplistic level, the answer is that those are the rules.  At a more advanced level, you're asking about some of the details that go on during DATA step compilation. 

 

As part of the compilation process, SAS has to set up the PDV (program data vector): storage locations in memory that hold the current values of every variable.  Once the PDV defines a length for a variable, that length cannot change.  Since the INPUT statement defines a length for CHR (8, the default for list input), the later LENGTH statement cannot change it.  In fact, the later LENGTH statement for CHR should give you a message in the log to that effect.  However ...

 

Numeric variables always have a length of 8 in the PDV.  So a LENGTH statement for a numeric variable only affects what goes into the output data set, not the PDV.  LENGTH statements for numeric variables therefore follow the opposite pattern: the last length assigned is the one used in the output data set.
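
To see both rules side by side, here is a minimal sketch. The data set name DEMO and the second numeric LENGTH statement are made up for illustration, and the attributes noted in the comments are what the rules above predict, not output I am quoting from a log.

```sas
/* Sketch: a character length is fixed by the first statement that      */
/* defines it; a numeric length is taken from the last LENGTH statement. */
data demo;                 /* DEMO is a hypothetical data set name        */
  input chr $ num;         /* list input: chr gets the default length 8   */
  length chr $6;           /* too late for chr: the PDV length is set     */
  length num 6;            /* numeric: only the stored length is affected */
  length num 5;            /* last numeric LENGTH should be the one used  */
  datalines;
abc 1
;
run;

/* Per the rules above, expect chr with length 8 and num with length 5.  */
proc contents data=demo;
run;
```

Checking the log for the step is worthwhile too, since the ignored character LENGTH statement should be flagged there.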

 

Some of the results of these ideas:

 

  • To change the length of an EXISTING character variable, the LENGTH statement must come before any other statement that might define the variable's length, such as a SET statement. 
  • To change the length of an EXISTING numeric variable, the LENGTH statement can go anywhere in the DATA step. 
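
A short sketch of these two points, assuming a made-up input data set OLD with one character and one numeric variable (all names here are hypothetical):

```sas
/* Hypothetical input data set with a short character variable. */
data old;
  length name $5 amount 8;
  name = 'abcde';
  amount = 12345;
run;

data new;
  length name $12;   /* before SET: this sets the length of name in the PDV  */
  set old;           /* name is already in the PDV, so length 5 is not used  */
  length amount 4;   /* numeric: placing LENGTH after SET is fine            */
run;

/* Expect name with length 12 and amount with length 4 in NEW. */
proc contents data=new;
run;
```

If the character LENGTH statement were moved below the SET statement, the length of NAME would stay at 5, which is the situation the first bullet warns about.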

Hoping this helps, rather than confuses ... best of luck.

Contributor
Posts: 24

Re: Compile time statement issues

Posted in reply to Astounding

Great, Astounding. I get the point regarding length.

I have one more question.


data one;
  attrib z length=$20 informat=$7.;
run;

data two;
  input z : $15.;
  set one;
  datalines;
test
;
run;


When I check the attributes of variable z in data set two, it shows:

length 15
format $15.
informat $7.

Why is the informat $7.?

Thanks for giving a clear picture regarding length :)

Trusted Advisor
Posts: 1,118

Re: Compile time statement issues

@Ps8813: When I run your code with SAS 9.4, variable Z is not assigned format $15., and I can't see where such a format would come from, because none has been specified anywhere in your code.

 

As to informat $7.:

The informat specification :$15. in your INPUT statement does not permanently associate an informat with variable Z. This informat is only used to perform modified list input. So, the first permanent assignment of an informat to variable Z occurs when the header information from dataset ONE is processed by the compiler.

 

Please note that at this point there is already an entry for Z in the PDV from the INPUT statement, which precedes the SET statement. Due to the informat specification in the INPUT statement, Z has been created as a character variable with length 15. Now the compiler detects that dataset ONE contains Z as a character variable with length 20, which is greater than 15, but the length of Z in the PDV can no longer be changed. Therefore, the compiler issues a warning message:

WARNING: Multiple lengths were specified for the variable z by input data set(s). This can cause truncation of data.

(This warning would not occur if the length of Z was <=15 in dataset ONE.)

 

The other variable attributes, here it's only informat $7., are taken from dataset ONE, because they have not been specified yet. Similarly, a permanent format or a variable label would be taken from ONE, unless they had been specified otherwise prior to the SET statement. FORMAT, INFORMAT or LABEL statements after the SET statement could change the corresponding variable attributes again, but a LENGTH statement would not work.
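
As a rough illustration of that last paragraph, here is the original example with attribute statements added after the SET statement. The $12. informat, $10. format and the label text are arbitrary values chosen for this sketch.

```sas
data one;
  attrib z length=$20 informat=$7.;
run;

data two;
  input z : $15.;           /* creates z as character with length 15       */
  set one;                  /* length stays 15; informat $7. comes from ONE */
  informat z $12.;          /* after SET: replaces the inherited informat   */
  format z $10.;            /* assigns a permanent format                   */
  label z = 'Example label';
  datalines;
test
;
run;

/* Expect length 15, informat $12., format $10. and the label on z. */
proc contents data=two;
run;
```

The multiple-lengths warning quoted above should still appear, because the length conflict between the INPUT statement and dataset ONE is unchanged.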

Super User
Posts: 5,518

Re: Compile time statement issues

Posted in reply to FreelanceReinhard

This sounds correct.  To pick out a couple of key points:

 

  • The INPUT statement does not assign any informats.  (Any informats that appear as part of the INPUT statement are instructions for executing the INPUT statement, not permanently-assigned informats.)
  • The INPUT statement can assign lengths to variables that do not yet have a length.  (It cannot change already-assigned lengths.)
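
The first bullet can be checked with a small comparison. This is only a sketch; the data set names NO_INFORMAT and WITH_INFORMAT are invented for the example.

```sas
/* Informat given only inside INPUT: used for reading, not stored. */
data no_informat;
  input z : $15.;
  datalines;
test
;
run;

/* Informat given in an INFORMAT statement: stored as an attribute. */
data with_informat;
  informat z $15.;   /* also defines z as character with length 15      */
  input z;           /* list input reads z with its associated informat */
  datalines;
test
;
run;

/* Compare the Informat column for z in the two listings. */
proc contents data=no_informat;
run;
proc contents data=with_informat;
run;
```

In both steps Z should end up with length 15; only the second data set should show an informat for it.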
☑ This topic is solved.
