There is a bug in SAS that aggressively limits the length of variable names and cripples SAS's ability to interface with any well-maintained, literally named database.
54 Comments
alecwh22
Fluorite | Level 6
I'm genuinely curious to know how your code relates to an expected column
name width?

SAS didn't support the features I wanted so I dropped it a while back, your
reasons might seem obvious once you say it but right now I'm drawing a
blank.
ballardw
Super User

It will not matter how long the variable name limit is; the bits about AUTONAME or created variables with not-quite-expected generated names will always be an issue.

 

As soon as you use a 128 character name and run it through a procedure (PROC MEANS, PROC TABULATE or what have you) with auto-naming of variables, the issue returns.

 

Personally I'm anxiously awaiting the programming interface that automagically does what I intend and not what I actually write for code.

Ever since Star Trek: The Next Generation I have believed that the main structural elements of the starships are logic circuits to be able to parse the vague instructions into such precise outcomes when talking with the computers.

Quentin
Super User

@alecwh22 I've got programs that process variable names (i.e. metadata) as data.  So run PROC CONTENTS and output the data and process them, or look up variable names in a dictionary table and process them.  I'm sure in some of my DATA steps I explicitly defined the length of variables that hold variable names as $32.

 

 

Since for my use, extra long variable names would be the exception (just like validvarname=ANY is an exception for me), I like the ability to set system options that will preserve my current behavior.

ChrisHemedinger
Community Manager
Status changed to: Under Consideration

I've confirmed with our R&D team that this is indeed in the roadmap for a future release -- something they are working on right now.  As you can imagine, such a change touches almost every part of SAS so it's a high impact change.  In addition to longer variable names, longer table names will also be supported.

 

There will be options (a la VALIDVARNAME and VALIDMEMNAME) to enforce current behavior, so your legacy programs and assumptions can still be kept intact.

 

More details to come, but I wanted you to know that the SAS team has heard you and that support is in-the-works.

mark4
Obsidian | Level 7

To jump on this idea, I'd like to see the name length limits for all possible SAS 'variables' increase quite a bit.  Limiting libraries and proc computab (a mess in and of itself) to 8 characters is also annoying.  When you're working with 100 or so of these things, your naming conventions become so convoluted that you invite mistakes.

Kurt_Bremser
Super User

Increasing filerefs and librefs to 32 characters makes much more sense than increasing the variable name length, IMO. It is also probably much easier to accomplish, as it does not require a change in the SAS dataset file layout.

marty_ca
Fluorite | Level 6

Currently working with Python to generate valid variable names from time series spreadsheet data column headings published by the Australian Bureau of Statistics. The code I have written takes the first two characters of each word separated by a space and constructs a valid variable name from the column headings in the spreadsheet. Taking just the first two characters generally keeps the variable length within the 32 character limit that SAS can use.

 

However, the ABS has column headings containing words such as, "Underutilisation rate" and "Unemployment rate" in their Labour Force statistics time series data. Taking the first two characters of each word leads to duplicate variable names! The only solution is to take the first three characters but doing this blows out the variable length to over 32 characters. By the way, Python Pandas has no problems with variable names greater than 32 characters.
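
A minimal sketch of the abbreviation scheme described above, assuming headings are split on whitespace and the pieces are simply concatenated. The helper name `abbreviate` and the 11-word heading are hypothetical and only there for illustration; the two real headings are the ones quoted above.

```python
SAS_MAX_NAME_LEN = 32  # current SAS variable-name limit discussed in this thread


def abbreviate(heading: str, n_chars: int) -> str:
    """Join the first n_chars characters of each space-separated word."""
    return "".join(word[:n_chars] for word in heading.split())


headings = ["Underutilisation rate", "Unemployment rate"]

two_char = [abbreviate(h, 2) for h in headings]
three_char = [abbreviate(h, 3) for h in headings]

print(two_char)    # both headings collapse to the same name
print(three_char)  # the two names are now distinct

# Hypothetical 11-word heading (not a real ABS heading), used only to show
# how one extra character per word pushes a name past the limit.
long_heading = " ".join(["Word"] * 11)
print(len(abbreviate(long_heading, 2)) <= SAS_MAX_NAME_LEN)  # fits
print(len(abbreviate(long_heading, 3)) <= SAS_MAX_NAME_LEN)  # does not fit
```

With two characters per word the names collide; with three they are unique, but any heading of eleven or more words no longer fits in 32 characters.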

 

People are missing the big picture analysis or should I say big data analysis. Being able to automate a process like this is critical in being able to undertake big data analysis.

 

If you're interested you can check out my YouTube channel Python Statistical :

 

https://www.youtube.com/channel/UC29Wfvlnq2GKydqQMtlPeDA

 

 

RW9
Diamond | Level 26

@marty_ca, SAS provides two areas though, a label and a variable name.  The variable name is used in programming, the variable label is used in outputs.  In this way you have short concise coding items to use, and long descriptive names which define the variable.  For instance, if the variables were all called varX, with a label:
VAR1 "Underutilisation rate"

VAR2 "Unemployment rate"

Then you can program with this simply by:

 

array var{2};
or
sum(of var:);

And then process both of them.  If you put the label information in the variable name, then your code starts to expand:

array var{2} underutilisation_rate unemployment_rate;
or
sum(underutilisation_rate, unemployment_rate);

This is just with simple names which still fit in with the current rules on variable naming.  Imagine now if all your variables were 128 characters, and for each code line you had to enter these, your code would increase by 75%.  

Whilst I am all for code being extremely readable, I really don't see how dropping labels in favour of putting all the information in variable names is going to make code any easier to write or read; it's just going to make it very verbose. I have already seen examples of long names where one character has changed, and such a change can be very hard to spot or debug. And when combined with Excel thinking (transposed tables, which seem all the rage) and named literals (also an Excel method), it just obfuscates and makes code harder to work with.

 

marty_ca
Fluorite | Level 6

The key to what I'm doing is that the long variable name contains key information as to the type of data the variable holds. So for example, if I wanted to pull out all the seasonally and trend adjusted time series estimates, all I need to do is search for all the variables containing "Trend" and "Seasonally". Then I can easily carry out an operation such as "Seasonally" minus "Trend" for all these variables. 

 

I'm sure the method you describe will work but to pull out particular variables from the label information would require more lines of code and isn't as efficient as encapsulating key data information within the variable name.
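
The name-based selection described in this post can be sketched roughly as follows. This is an illustration only: the column names and numbers are made up (real ABS headings are laid out differently), `base_name` is a hypothetical helper, and plain dictionaries are used instead of pandas so the snippet runs without extra packages.

```python
# Hypothetical wide table: column name -> list of values (made-up numbers).
table = {
    "Unemployment rate Trend": [5.0, 5.25],
    "Unemployment rate Seasonally adjusted": [5.5, 5.0],
    "Underutilisation rate Trend": [13.0, 13.5],
    "Underutilisation rate Seasonally adjusted": [13.25, 13.0],
}


def base_name(column: str, marker: str) -> str:
    """Strip the marker text so matching series share one key."""
    return column.replace(marker, "").strip()


# Pick out the columns purely by searching their names.
trend = {base_name(c, "Trend"): v for c, v in table.items() if "Trend" in c}
seasonal = {
    base_name(c, "Seasonally adjusted"): v
    for c, v in table.items()
    if "Seasonally" in c
}

# "Seasonally" minus "Trend" for every series that has both columns.
differences = {
    name: [s - t for s, t in zip(seasonal[name], trend[name])]
    for name in seasonal
    if name in trend
}

for name, values in differences.items():
    print(name, values)
```

The whole approach depends on the full words "Trend" and "Seasonally" surviving in the column names, which is where the 32 character limit gets in the way.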

RW9
Diamond | Level 26

@marty_ca, but isn't that just a data modelling choice?  Transposed datasets seem to be becoming the norm now, probably due to Excel, but that isn't the best structure in most cases.  I don't know your data or processes so can't really say, but from what you posted you have some id variables, and two results variables.  So from this brief information:
id1    id2    seasonal    trend
...    ...    xyz         xyz

Would seem to be the logical storage method.  And also the simplest then to code from:

data want;
  set have;
  want=seasonal-trend;
run;

You only need to know that there is one variable for seasonal and one for trend: no macro code looking through metadata for occurrences of text, and no putting metadata into variable names (additional metadata would be in other variables in the data row).  I mean, that's just one storage option off the top of my head, and for sparse data (i.e. data which may not always be there) this kind of structure could also save you a lot of storage room. Consider that:
var1   var2   var3   var4
.      .      1      .
2      .      .      .

Normalised with the same info is:

var_no   result
3        1
1        2

That is 4*2 = 8 cells versus 2*2 = 4 cells.
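
For anyone who wants to check the arithmetic, here is a rough sketch of the same wide-to-normalised conversion in Python (in SAS this would typically be a transpose step instead). `None` stands in for a SAS missing value, and the assumption that `var_no` is the numeric suffix of the variable name is mine. In real data you would normally carry an id column along as well.

```python
# The wide, mostly-empty table from above; None stands in for a missing value.
wide = [
    {"var1": None, "var2": None, "var3": 1, "var4": None},
    {"var1": 2, "var2": None, "var3": None, "var4": None},
]

# Keep one row per non-missing cell: the variable number and its result.
normalised = [
    {"var_no": int(name[3:]), "result": value}
    for row in wide
    for name, value in row.items()
    if value is not None
]

wide_cells = sum(len(row) for row in wide)
normalised_cells = sum(len(row) for row in normalised)

print(normalised)
print(wide_cells, "cells versus", normalised_cells)
```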

 

Anyways, irrespective, SAS has marked it as under consideration, so it will come in at some point and be the default, so us programmers will just have to live with thousands of pages of keep/drop lists for the x number of variables * 128 characters.