☑ This topic is solved.
qaguy1982
Obsidian | Level 7

I have been using SAS for about 7 years now and recently ran into an issue with the implicit length of a variable.

We are using SAS version 9.4  (TS1M5).

 

The following code spreads out the selection of data from various states over 4 weeks within a quarter. 

The state_string variable is defined in this data step.  

Until recently we had only two states per line, so all of the state_string assignments were the same length, and SAS set the variable's length from the length of the string.

The recent change was adding a third state to the last pull (at the end of the quarter).

When I used the state_string variable in a PROC SQL step as a WHERE condition, the value was truncated to the length of the first string assigned to it in the code. This caused it to lose a few characters and leave the string unclosed.

 

What puzzles me is that this is inside an IF-THEN-ELSE block, so only one assignment to state_string should execute each time the step runs. It is either week 4 or it is not, so there should be only one definition of state_string.

 

Can someone explain why SAS picks up the first assignment inside an IF clause rather than the one that applies to the current run? For instance, in weeks 13, 26, 39 and 52 the longer string is selected, so the string length should be longer.

 

I have fixed the issue by adding an explicit length declaration rather than allowing SAS to define it.  I am just curious as to why SAS is using definitions within an if block that don't apply to this instance. 

 

This is easy to test by just finding out what week this is and putting that number in one of the if condition blocks to trigger a particular definition to fire. 

 


/* calculate what week this is and select what states will be searched in the next step */
data what_week;
   /* MY FIX IS THE NEXT LINE - BUT WONDERING WHY IT IS NECESSARY */
   length state_string $ 100;
   /* offset today() by additional days to test all options */
   thisDate = today() + 0;
   showthisdate = put(thisDate, date11.);
   thisweek = week(thisDate);
   firstweek = week(intnx('YEAR',thisDate,+1,'BEGIN'));
   lastweek = week(intnx('YEAR',thisDate,+1,'END'));
   thisquarter = qtr(thisDate);
   nextquarter = qtr(intnx('QTR',thisDate,+1,'BEGIN') );
   if (thisweek in (4,17,30,43 ) ) then
      state_string = "license_state_code in ('OH','NE')";
   else if (thisweek in (7,20,33,46 ) ) then
      state_string = "license_state_code in ('MI','IN') ";
   else if (thisweek in (10,23,36,49 ) ) then
      state_string = "license_state_code in ('MN','ND') ";
   else if (thisweek in (13,26,39,52 ) ) then
      state_string = "license_state_code in ('NC','NM','AZ')";
   /* don't pick any data if not a selected week */
   else state_string = "license_state_code in ('')" ;
   call symputx('THIS_WEEK',thisweek);
   call symputx('NEXT_QUARTER',nextquarter);
   call symputx('state_string',state_string);
run;
%put &=this_week, &=next_quarter, &=state_string ;
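To see the behaviour without waiting for a particular week, here is a stripped-down sketch of the step above with the LENGTH statement removed, only three branches kept, and THISWEEK hard-coded to 13. The data set name REPRO and the variable LEN are made up for illustration. VLENGTH returns the length a variable was given at compile time, so the log should show LEN=33 (the length of the first string assigned) and a STATE_STRING value that stops after 'NM', with no closing quote or parenthesis.

```sas
/* Hypothetical reproduction: no LENGTH statement, THISWEEK forced to 13 */
data repro;                      /* REPRO and LEN are illustrative names */
   thisweek = 13;                /* hard-coded instead of week(today()) */
   if (thisweek in (4,17,30,43)) then
      state_string = "license_state_code in ('OH','NE')";       /* first reference: 33 characters */
   else if (thisweek in (13,26,39,52)) then
      state_string = "license_state_code in ('NC','NM','AZ')";  /* 38 characters, cut to 33 */
   else
      state_string = "license_state_code in ('')";
   len = vlength(state_string);  /* length fixed at compile time */
   put len= state_string=;
run;
```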
Accepted Solution
Tom
Super User

The LENGTH of a variable is set when the data step is compiled. If you remove the LENGTH statement, SAS has to make a GUESS at how to define the variable, so it uses the first place that STATE_STRING appears in your code, which is this statement:

state_string = "license_state_code in ('OH','NE')"; 

If you changed the CODE so that the first place STATE_STRING appears assigns a different value, then the length that SAS GUESSES for the variable would change too.

 

It does not matter whether or not that statement ever executes.  So it does not matter what value THISWEEK has.
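A minimal sketch of this point, using a hypothetical variable S: the assignment behind IF 0 can never execute, yet it is the first reference the compiler sees, so S is created with length 3 and the later, longer value is cut to 3 characters.

```sas
data _null_;
   if 0 then s = "abc";   /* never executes, but it is the first reference the compiler sees */
   s = "abcdefgh";        /* executes, but S is already $3 */
   len = vlength(s);      /* compile-time length of S (S and LEN are illustrative names) */
   put len= s=;           /* log: len=3 s=abc */
run;
```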

 

You could have added the new condition to the BEGINNING of your IF/THEN/ELSE chain instead of the END; then the guessed length would have changed to fit the longest string (although a change in the guessed length could also cause trouble elsewhere).

if (thisweek in (13,26,39,52 ) ) then 
state_string = "license_state_code in ('NC','NM','AZ')"; 
else if (thisweek in (4,17,30,43 ) ) then 
state_string = "license_state_code in ('OH','NE')"; 
else if (thisweek in (7,20,33,46 ) ) then 
state_string = "license_state_code in ('MI','IN') "; 
else if (thisweek in (10,23,36,49 ) ) then 
state_string = "license_state_code in ('MN','ND') "; 
/* don't pick any data if not a selected week */
else state_string = "license_state_code in ('')" ;

 

Or just make all of the strings being assigned the same length, so it does not matter in which order they appear in the code.

if (thisweek in (4,17,30,43 ) ) then 
state_string = "license_state_code in ('OH','NE')     "; 
else if (thisweek in (7,20,33,46 ) ) then 
state_string = "license_state_code in ('MI','IN')     "; 
else if (thisweek in (10,23,36,49 ) ) then 
state_string = "license_state_code in ('MN','ND')     "; 
else if (thisweek in (13,26,39,52 ) ) then 
state_string = "license_state_code in ('NC','NM','AZ')"; 
/* don't pick any data if not a selected week */
else 
state_string = "license_state_code in ('')            ";

Patrick
Opal | Level 21

SAS creates the program data vector (all the variables) during the compilation phase. It takes each variable's attributes, such as type and length, from the first occurrence of that variable in your code.

You can think of the compilation phase as iteration zero of your data step where SAS inspects your code without executing anything.

See "Flow of Action in the DATA Step" in the SAS documentation.
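The same compile-time rule explains why the explicit LENGTH statement in the original fix has to come before the first assignment. In this sketch (hypothetical variable S), the LENGTH statement arrives after S has already been created as $2, so SAS ignores it and writes a warning to the log saying that the length of S has already been set; the value printed is still ab.

```sas
data _null_;
   s = "ab";          /* first reference: S is created as $2 (S is an illustrative name) */
   length s $ 10;     /* too late: ignored, and the log reports the length was already set */
   s = "abcdef";      /* truncated to the existing length of 2 */
   put s=;            /* log: s=ab */
run;
```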

qaguy1982
Obsidian | Level 7
Thanks for the responses. I have a follow-up question.
Given that SAS is "guessing" when it defines the length of a new string variable, why is there no warning or error when a value is too long for that length? The truncation may cause issues later in the program.
For instance, "WARNING: Character expression will be truncated when assigned to character column xxx." appears frequently in some programs, but it was not reported in this case. Since the actual string used in this particular run was too long for the "guessed" length, I would have expected a similar warning when the code ran.
Or perhaps something like "WARNING: Default string length is not large enough to hold string 'this is a very long string'. Consider explicitly defining the length of the field to accommodate the maximum size needed."
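The DATA step does not warn when an assignment truncates a character value, so one workaround is a home-made check. This is only a sketch under stated assumptions: CANDIDATE, MAXLEN and NEEDLEN are hypothetical names, and the $33 length mimics the guessed length from the original problem. It compares the compile-time length of the target (VLENGTH) with the length of the value (LENGTHN) and writes a line starting with WARNING: to the log when the value will not fit.

```sas
data _null_;
   length state_string $ 33;                              /* deliberately too short, as in the original problem */
   candidate = "license_state_code in ('NC','NM','AZ')";  /* hypothetical: the value we intend to store */
   maxlen  = vlength(state_string);                       /* compile-time length of the target: 33 */
   needlen = lengthn(candidate);                          /* length of the value, ignoring trailing blanks: 38 */
   if needlen > maxlen then
      putlog "WARNING: STATE_STRING is too short. " maxlen= needlen=;
   state_string = candidate;                              /* the DATA step truncates this silently */
run;
```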
I suspect that those who work in pure SAS are more used to explicitly defining the length of variables. Most of my work pulls data from Oracle databases using pass-through Proc SQL, where the field lengths are already defined and SAS apparently picks them up from Oracle. I rarely define an entirely new field that is not based upon an Oracle field or transformation of an existing data field.
We did some testing locally and found that when PROC SQL uses a CASE expression to define a new text string, SAS picks the length needed to hold the longest string in the CODE. So the implicit length depends on which construct is used to create the variable.
Carl
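A quick way to check the PROC SQL behaviour described above, assuming SASHELP.CLASS is available: the CASE expression below lists the shorter literal first, and DESCRIBE TABLE writes the table definition to the log. If the observation above holds, S should be reported as char(6), whereas the equivalent IF/THEN/ELSE in a DATA step would create it with length 2. The table name CASE_DEMO and column S are made up for illustration.

```sas
proc sql;
   create table case_demo as            /* CASE_DEMO and S are illustrative names */
   select case when age > 100 then 'ab' /* shorter literal listed first */
               else 'abcdef'
          end as s
   from sashelp.class(obs=1);
   describe table case_demo;            /* writes the definition of S, including its length, to the log */
quit;
```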
