BookmarkSubscribeRSS Feed
There is a bug in SAS that aggressively limits the length of variable names and cripples SAS's ability to interface with any well maintained literally name database.
54 Comments
Kurt_Bremser
Super User

There is another aspect to this whole discussion.

In compiled languages, variable names vanish as soon as the code is compiled, and are replaced by memory addresses. That makes the maximum length of variable names just a factor for the compiler, but not for anything that happens at runtime.

SAS, as an interpreting language, needs to reserve space for the variable names at runtime. Right now, that means that a numeric variable needs 48 bytes (8 for the data, 32 for the name in the allocation table, and most probably 8 for the memory address). With 128 (or even 256, as some database systems already have), that would extend to 144 (272) bytes, or tripling the memory consumption (with 256, almost increase six-fold).

Consider the same effect happening with short character variables ($1 or $2), where the data itself needs even less space.

For steps working with large non-temporary arrays, this could have serious repercussions.

 

So if any change is made at all, it needs to be made selective per start-up system option, and the default should stay at the current 32 bytes. Anybody who needs more, can activate the feature at will.

Quentin
Super User

Interesting points @Kurt_Bremser.  I wonder if folks felt any pain when SAS went from allowing 8 character variable names (v 6.12) to allowing 32 characters (short-lived v7).  I think v7 was a major rewrite with not only long character names but also ODS.  Would it be hard for SAS to have allowed validvarname=v6 as an option? 

 

I also would not want to use variable names longer than 32 characters.  But I do have sympathy for folks who pull data from databases that allow such long names.

 

If SAS decides to implement such extended variable names, I could live with using ValidVarname as the switch, which can be specified at startup or with an OPTIONS statement.

 

VALIDVARNAME=V7 | UPCASE | ANY | V10

 

Or maybe they should come up with better names than 'V7' or 'V10', since many young folks won't know what v7 is, and might wonder why the default is to use a naming option from a version long gone by. And 'ANY' wouldn't make sense because it's more restrictive than 'V10'.

marty_ca
Fluorite | Level 6

No problems RW9 it's good to have and allows for variable names that are meaningful.  Here is an example below of what I'm doing in Python. I could also throw in underscores to make it more readable, but of course, that would make the variable much longer.

 

Variable generated from column heading                                  Actual column heading in ABS Labour Force spreadsheet Table 22.

_2534yeaUndratproofempPerPerOriPER 25-34 years ; Underemployment ratio (proportion of employed) ; Persons ; Percent Original PERCENT
_2534yeaUndratproofempPerPerSeaAdjPER 25-34 years ; Underemployment ratio (proportion of employed) ; Persons ; Percent Seasonally Adjusted PERCENT
_2534yeaUndratproofempPerPerTrePER 25-34 years ; Underemployment ratio (proportion of employed) ; Persons ; Percent Trend PERCENT
RW9
Diamond | Level 26
Diamond | Level 26

@marty_ca, well one option, standard in my field, is parameter result based.  So lab data, which has thousands of tests, and can have additional metadata such as age, normal ranges or such, is modelled as:

PARAM                  PARAMCD       VALUE    LOWER    UPPER

A long test name   LONGTEST      xyz          abc            abc

 

If we extrapolate that to your data:
PARAM                                                                                           CAT                   PARAMCD        VALUE

25-34 years ; Underemployment ratio (proportion of employed) ;  ORIGINAL         PC2334            x

Persons ; Percent Original PERCENT

25-34 years ; Underemployment ratio (proportion of employed) ;  SEASONAL       PC2334            x

Persons ; Percent Seasonally Adjusted PERCENT

25-34 years ; Underemployment ratio (proportion of employed) ;  TRENS              PC2334

Persons ; Percent Trend PERCENT

 

So that's just done quickly, may not be optimal.  What it does do is provide for an endlessly expandable list of these parameters, combinations of them, calcaultions etc. into rows which, if need be, can be transposed at any time very easily.  So you could add to this:

data temp;
  merge have (where=(cat="SEASONAL" and paramcd="PC2334"))
             have (where=(cat="TRENDS" and paramcd="PC2334") rename=(value=t));
  by ...;
  value=value-t;
  param="Difference of seasonal versus trends";
  cat="COMPUTED";
  paramcd="DIFF2334";
run;

data have;
  set have temp;
run;

This may seem over the top for one or two, but if you have lots of paramters and need to do multiple comparisons, calculations or imputations, then it really comes into it own.  And you can add all kinds of other data to that as well, you could put age range in a separate variable, that way you could do groupings of seasonal v trend data acrosss age groupings, or regions etc.

Kurt_Bremser
Super User

Digression:

_2534yea is the result of bad data modeling. That's clearly a category, should be kept in its own variable and not as part of variable names (transpose wide to long). It might be that other parts of your variable names are also categories (Persons vs what?).

"percent" usually (see later) doesn't need to be in a variable name, as that fact has to be obvious from the way a variable is formatted.

Given that, I'd end up with variable names of underemployment_ratio_original, underemployment_ratio_seasad and underemployment_ratio_trend which

  • fit into 32 characters
  • are easier to read

If I need to put more into the variable names (eg to differ percentages from sums), I can reduce underemployment to underempl, freeing another 7 characters (and add _per and _sum to the end).

marty_ca
Fluorite | Level 6

You are right Kurt, age-group should be a category. The problem is extracting that information from the ABS spreadsheet in the first place. Once it is in Python it would not be hard to create a category for age-group. So what you are seeing is my first step in getting that information into a variable. You would have the same problem in SAS needing to extract from the column headings the age-groups and then create a category for this. Yes, Males, Females and Persons are other categories and list goes on.

 

This is just an example of extracting data from a spreadsheet with a lot of columns and there are better ways such as using the ABS SDMX-JSON API.

mastropi
Fluorite | Level 6

SAS Code to Fix Variable Names to a maximum length

 

I thought I could share this code to help deal with the 32-character limit currently enforced by SAS to variable names.

 

Below is a link to a SAS code I wrote which defines the %FixVarNames macro that can be used to fix the name of a set of variables to a maximum allowed number of characters, and optionally leaving space for potential prefix or suffix to be added to the names (e.g. by running PROC MEANS or other procedures).

Any name collision coming from the truncation are resolved via a number suffix (see example below).

 

https://github.com/mastropi/dmtoolsSAS/tree/master/Standalone

 

The file to download is: FixVarNames-Bundle.sas

 

Here is an usage example taken from the beginning of the code:

%let varnames2fix =
this_is_a_longname_should_be_truncated_1
this_is_a_longname_should_be_truncated_2
this_is_a_longname_should_be_truncated_3
v234567890test
;

* Fix the set of names to a maximum allowed of 32 characters by leaving space for possible prefix or suffix containing 2 characters; * This implies that the variable names are truncated to 30 characters; %let varnames_fixed = %FixVarNames(&varnames2fix, max=32, space=2); %put &varnames_fixed; ** Should show: this_is_a_longname_should_be_1 this_is_a_longname_should_be_2 this_is_a_longname_should_be_t v234567890test ;
buismanjulie
Calcite | Level 5

Is there any update from SAS on this?  Original suggestion was back from 2016 and the latest post says this will be resolved with 9.5.  I'm a new user to SAS and am struggling to bring in an SSMS table that has 33 characters.  I cannot go to our data warehouse team and ask them to rename or make a new view when this happens.  My team is using both SAS and MS Power BI.  If there isn't a work around or a fix to this issue I don't see how I can push my supervisor to continue supporting both analytic tools.  

alecwh22
Fluorite | Level 6
Seriously, this isn't resolved? SAS is relic of the past it desperately needs phasing out
Angel_Boy
Calcite | Level 5

Hello SAS Support,

 

This is almost 4 years later, so checking on this because this is still an issue.

Do we have an update as to when a solution to this 32 character limitation will be rolled out?

 

Thank you.