The following SAS program is submitted:
data names;
title='EDU';
if title='EDU' then
Division='Education';
else if title='HR' then
Division='Human Resources';
else Division='Unknown';
run;
Which one of the following represents the value of the variable Division in the output data set?
a. Educatio
b. Education
c. Human Re
d. Human Resources
This is one of the SAS Base Programming practice questions from SAS. The correct answer is b. The explanation says:
"The length of the variable Division is set to 9 when the DATA step compiles." I cannot see why it is set to 9 bytes when the DATA step compiles. There is no explicit LENGTH statement, and the default length for a character variable is 8. Can anyone help? Thank you!
For added fun check the length of Division in this case.
data names;
title='EDU';
if title='EDU' then
Division=cats('Education','');
else if title='HR' then
Division='Human Resources';
else Division='Unknown';
run;
I really wouldn't consider this a solution. It is more to point out that how you create the variable can also lead to unexpected lengths for a variable, in this case 200.
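One way to check the length, as a minimal sketch: the VLENGTH function returns the length a variable was given at compile time, and PUT writes it to the log. The variable name len is only there for illustration and is not part of the original program.

```sas
data names;
    title = 'EDU';
    if title = 'EDU' then
        Division = cats('Education', '');
    else if title = 'HR' then
        Division = 'Human Resources';
    else Division = 'Unknown';

    /* len is a hypothetical helper variable: the compile-time length of Division */
    len = vlength(Division);
    put len=;
run;
```

The 200 comes from the CAT family of functions: in a DATA step, when the result is assigned to a variable that has no length yet, that variable gets a length of 200 bytes.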
For character variables, the SAS DATA step compiler assigns lengths in this order:
an explicit LENGTH statement
the first character string it encounters
the default of 8.
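The three rules above can be seen side by side in a small sketch. The data set name lengths and the variables a, b and c are made up for this illustration.

```sas
data lengths;
    length a $20;        /* rule 1: explicit LENGTH statement, a gets 20 bytes */
    a = 'abc';
    b = 'Education';     /* rule 2: first reference is a 9-character literal, b gets 9 bytes */
    input c $;           /* rule 3: list input with no length given, c gets the default of 8 */
    datalines;
Supercalifragilistic
;
run;

/* The Len column of the variable list shows the length of each variable */
proc contents data=lengths;
run;
```

PROC CONTENTS is used here only because it lists the stored length of every variable in one place; VLENGTH inside the DATA step would work as well.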
Search for "sas data step compiler" to find some useful articles on the topic.
Doc Muhlbaier
Duke
To my knowledge a SAS character variable only defaults to a length of 8 when you define it in an input statement without a specific length definition. Something like: INPUT myvar $;
With SAS you don't have to explicitly declare variables in a data step. The SAS compiler simply processes the run group (the data step) once from top to bottom and creates all the variables for the PDV. For each variable it uses the first definition it finds.
In your case the first time variable Division gets mentioned, is in the statement Division='Education'; Here the assigned string Education consists of 9 characters and that's why SAS creates the character variable Division with a length of 9.
So yes, it's great that SAS does all these things implicitly for us (unlike other languages), but it also sometimes leads to a lot of confusion. I still remember mine many, many years ago, when I couldn't understand why some strings got truncated (the strings assigned later were longer, but because the variable had already been defined with a specific length, they were cut off).
This top-to-bottom, first-come-first-served approach holds even when you explicitly use a declarative LENGTH statement. At least in such a case SAS throws a warning.
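The truncation can be reproduced with the program from the question. This is a sketch that changes only the value of title to 'HR' and adds a PUT statement; Division is still created with length 9 from the first literal the compiler meets.

```sas
data names;
    title = 'HR';
    if title = 'EDU' then
        Division = 'Education';        /* first reference: Division is created with length 9 */
    else if title = 'HR' then
        Division = 'Human Resources';  /* 15 characters do not fit into 9 bytes */
    else Division = 'Unknown';
    put Division=;                     /* writes the truncated value 'Human Res' to the log */
run;
```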
15   data test;
16   a='abc';
17   length a $10;
WARNING: Length of character variable a has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.
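Following the advice in that warning, a sketch of the program from the question with the LENGTH statement placed first could look like this. The length of 15 is chosen here because 'Human Resources', the longest value assigned, has 15 characters.

```sas
data names;
    length Division $15;   /* declared before any assignment, so this length wins */
    title = 'EDU';
    if title = 'EDU' then
        Division = 'Education';
    else if title = 'HR' then
        Division = 'Human Resources';
    else Division = 'Unknown';
run;
```

With the length declared up front, none of the three values is truncated, whichever branch runs.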
Thank you so much for your detailed answer. The solution gives me more insight.