I have been using SAS for about 7 years now and recently ran into an issue with the implicit length of a variable.
We are using SAS version 9.4 (TS1M5).
The following code spreads out the selection of data from various states over 4 weeks within a quarter.
The state_string variable is defined in this data step.
Up until recently we only had two states per line so that all the state_string defines were the same length. SAS defined the length based upon the string length.
The recent change was adding a third state to the last pull (at the end of the quarter).
When I used the state_string variable in a proc sql statement as a where condition, the state_string value was truncated to the length of the first time it was referenced. This caused it to miss a few characters and not close the string properly.
My issue is that this is within an If-then-else block so that only one definition of state_string should occur each time it is run. It is either week 4 or not so there should only be one definition of state_string.
Can someone explain why SAS is picking the first instance of the definition from within an if clause rather than picking the version that applies for this test case? For instance, in weeks 13, 26, 39 and 52 the longer string would be selected and the string length should be longer.
I have fixed the issue by adding an explicit length declaration rather than allowing SAS to define it. I am just curious as to why SAS is using definitions within an if block that don't apply to this instance.
This is easy to test by just finding out what week this is and putting that number in one of the if condition blocks to trigger a particular definition to fire.
/* calculate what week this is and select what states will be searched in the next step */
data what_week;
/* MY FIX IS THE NEXT LINE - BUT WONDERING WHY IT IS NECESSARY */
length state_string $ 100;
/* offset today() by additional days to test all options */
thisDate = today() + 0;
showthisdate = put(thisDate, date11.);
thisweek = week(thisDate);
firstweek = week(intnx('YEAR',thisDate,+1,'BEGIN'));
lastweek = week(intnx('YEAR',thisDate,+1,'END'));
thisquarter = qtr(thisDate);
nextquarter = qtr(intnx('QTR',thisDate,+1,'BEGIN') );
if (thisweek in (4,17,30,43 ) ) then
state_string = "license_state_code in ('OH','NE')";
else if (thisweek in (7,20,33,46 ) ) then
state_string = "license_state_code in ('MI','IN') ";
else if (thisweek in (10,23,36,49 ) ) then
state_string = "license_state_code in ('MN','ND') ";
else if (thisweek in (13,26,39,52 ) ) then
state_string = "license_state_code in ('NC','NM','AZ')";
/* don't pick any data if not a selected week */
else state_string = "license_state_code in ('')" ;
call symputx('THIS_WEEK',thisweek);
call symputx('NEXT_QUARTER',nextquarter);
call symputx('state_string',state_string);
run;
%put &=this_week, &=next_quarter, &=state_string ;
The length of a variable is set when the data step is compiled. If you remove the LENGTH statement then SAS has to GUESS how to define the variable. It bases that guess on the first place STATE_STRING appears in your code, which is this statement:
state_string = "license_state_code in ('OH','NE')";
If you changed the code so that a different value was assigned at the first place STATE_STRING appears, then the length SAS would GUESS for the variable would change.
It does not matter whether or not that statement ever executes. So it does not matter what value THISWEEK has.
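As a minimal sketch of this (the variable names x and len are made up for illustration and are not from the original program), the step below puts the short literal in a branch that can never execute. The compiler still sees it first, so x should get a length of 2 and the longer value should be truncated:

```sas
data _null_;
  /* This assignment never executes, but it is the first place the  */
  /* compiler sees x, so x is defined as character with length 2.   */
  if 0 then x = "ab";
  /* This is the assignment that actually runs; the value is        */
  /* truncated to fit the length already fixed at compile time.     */
  else x = "abcdef";
  len = vlength(x);   /* compile-time length of x */
  put x= len=;        /* log shows: x=ab len=2 */
run;
```

Swapping the two literals, or adding a LENGTH statement before the IF, should change what the log shows.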
You could have added the new condition at the BEGINNING of your IF/THEN/ELSE chain instead of the END, so that the longest string is the first one SAS sees. The guessed length would then have changed (which might also have caused trouble).
if (thisweek in (13,26,39,52 ) ) then
state_string = "license_state_code in ('NC','NM','AZ')";
else if (thisweek in (4,17,30,43 ) ) then
state_string = "license_state_code in ('OH','NE')";
else if (thisweek in (7,20,33,46 ) ) then
state_string = "license_state_code in ('MI','IN') ";
else if (thisweek in (10,23,36,49 ) ) then
state_string = "license_state_code in ('MN','ND') ";
/* don't pick any data if not a selected week */
else state_string = "license_state_code in ('')" ;
Or just pad all of the strings being assigned to the same length, so that the order in which they appear in the code does not matter.
if (thisweek in (4,17,30,43 ) ) then
state_string = "license_state_code in ('OH','NE')     ";
else if (thisweek in (7,20,33,46 ) ) then
state_string = "license_state_code in ('MI','IN')     ";
else if (thisweek in (10,23,36,49 ) ) then
state_string = "license_state_code in ('MN','ND')     ";
else if (thisweek in (13,26,39,52 ) ) then
state_string = "license_state_code in ('NC','NM','AZ')";
/* don't pick any data if not a selected week */
else
state_string = "license_state_code in ('')            ";
SAS creates the program data vector (all the variables) during the compilation phase. Each variable's type and length are taken from its first occurrence in your code.
You can think of the compilation phase as iteration zero of your data step where SAS inspects your code without executing anything.
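One way to see that "iteration zero" for yourself is a small sketch like the following (the data set name demo and the variables a and b are hypothetical). The assignment to b never executes, yet b should still appear in the output data set as a character variable whose length comes from the literal, with a missing value:

```sas
data demo;
  a = 1;
  /* This branch never runs, but the compiler has already added b */
  /* to the program data vector using the length of this literal. */
  if a = 2 then b = "never assigned";
run;

/* List the variables, types and lengths that were fixed at compile time. */
proc contents data=demo;
run;
```

PROC CONTENTS reports attributes only, so it shows what the compiler decided regardless of which statements ran.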