BookmarkSubscribeRSS Feed
There is a bug in SAS that aggressively limits the length of variable names and cripples SAS's ability to interface with any well maintained literally name database.
54 Comments
Tom
Super User
Super User

What is really needed is an enhancement to the functionality of SAS/Access that will allow users to deal with databases with extra long variable and/or dataset names.  They shouldn't have to have administrator rights on the foreign database to be able to read the data.

Some method to allow the user to pass in a mapping that SAS/Access can use to convert between the foreign (long) name and the local (SAS or short) name. 

 

ballardw
Super User

My $0.02

One of the first languages I learned to program in had a limit of 2 characters for variable names: A letter and optionally a numeral for the second postion. Almost no variable was descriptive.

 

Now we have variables that are long enough to be sentences. 

I think I can see having an ability to provide some additional rules to an extract to make the resulting names a bit more useable, especially when the source data base has multiple variables that have the same characters for the first 32 characters such as

CLIENT_WORKING_1_WORKED_SINCE_BIRTH

CLIENT_WORKING_1_WORKED_SINCE_BIRTH_MONTHS

 

Since this particular datas source has over 40 variables that start with CLIENT_ then that part of the variable name is practically meaningless. So if I could have a rule that I could set to eliminate redundant words or to shorten some, it might be helpful. As it is, I export the data to CSV, do a proc import to get a look at what I was given then modify generated code to assign shorter variables and meaningful labels. And pray the data source doesn't change the order the data is exported.

SuzanneDorinski
Lapis Lazuli | Level 10

I'd like to see the the variable name length increased.  For those of you who are happy with the 32 character limit, rock on! 

agoldma
Pyrite | Level 9

This idea was already suggested a year earlier under the idea Metadata: More space please, and it has 18 votes (including mine because I have to work with a very complicated public database)

Even though "Increase variable name length from 32 to 128 characters" is a much better title than "Metadata: More space please", the older idea has more votes. Please vote there.

mastropi
Fluorite | Level 6

@KurtBremser@RW9

 

 

 

Problem: Working with the output dataset becomes more difficult as one needs to know how SAS has modified the name in order to refer to it later in the code. And the way SAS modifies the name depends on the name itself and on the name of the other analyzed variables (e.g. it has to choose a name that doesn't collide with the modified name for another variable).

 

 

 

any good intention of automating a process.

 

 

 

 

 

 

RW9
Diamond | Level 26
Diamond | Level 26

@mastropi, the question you should be asking yourself is why you are having so many variables with a length of nearly 32 characters in the first place.  Variable names are purely for a programming perspective and should be short, concise, and easy to work with.  With that in mind, having suffixes applied is not a problem.  What is a problem however is if one is presented with data which has 32 character length variables, and these need to be typed out each time or worked with, which increases the typing many fold, without adding any benefit, as Labels can hold much more data and are there to be used to identify the variable and display it in a human readable form.  Now I wont labor the point, and SAS is of course free to implement which ever solution they want, but from my side incrementing the limit from 32 to 128 will just cause more issues than it is worth, especially in situations where certain standards have to be preserved - for instance the use of XPT's which have an 8 character limit.  Its a similar problem to the default which SAS rolled out with version 9.  Now the default option in SAS is:

validvarname=any

What this means is, a user can call a variable anything they want - at all.  Causes all kinds of problems when programming, exporting etc.  I can see why they brought it in, with more users going with VA and such like and just plugging data in, they firstly don't need to know variables and underlying code, but also don't want to have to be constricted by underlying mechanics.  However those of us who have to work with the back-end of it, find programming is increased and portability and backwards compatibility is gone.

With regards to generic code, be it macro or otherwise, its worth noting that whatever flexibility you try to include will never be enough or cover all eventualities.  Accept 128 characters, and second time your code runs you will have a user who wants 256 characters.  It always happens.  Far better to address the data modelling head on.  Within your functional design specification for the process, have a section particularly devoted to data modelling and define it well.  Documentation on a process is far more important than the code at the end of day.

mastropi
Fluorite | Level 6

@RW9 Thanks for the prompt reply with comments and suggestions.

 

My comments below on the different parts of your message:

 

Validvarname

I agree with your point about the problems generated by validvarname=any, but in my opinion those problems are of a different nature, having more to do with encodings, a much more complicated issue than variable name lengths.

 

Long names are cumbersome to type

I see your concern about having to type too many characters when referring to a variable in an analysis scenario. Certainly that might be a waste of time. However, I see two ways of circumventing this, the second of which takes advantage of modern editors:

1) You can define macro variables with short names to quickly refer to the most commonly used variables in the analysis.

 

2) Enterprise Guide has an auto-completion feature that suggests matching names as you type.

 

Limits always are finite

I understand that a larger limit will only move the barrier further away... however at some point a limit could be considered "infinite" in practical applications. I think 128 characters (or even 64 chars!) is one such limit.

In fact, I have never come across another software application that has this problem with variable name lengths, be it a database vendors such as Teradata, Oracle, Microsoft SQL Server, ..., be it statistical analysis software such as SPSS, Python, R, Matlab, ...

 

This means that their limits (if any exist) are far more suitable to the requirements of the common daily work in the subject area.

 

Generic code

Regarding your suggestion:

Far better to address the data modelling head on.  Within your functional design specification for the process, have a section particularly devoted to data modelling and define it well. 

I agree that specifications and documentation are very important in code.

 

However, in this scenario, having to specify those limitations (instead of circumventing them by a more complex process that avoids the problem) may not be beneficial. For instance, assuming the process computes statistical aggregates in the way PROC MEANS does by adding suffixes to the variable names in order to identify those aggregates... one could perhaps specify: "Depending on the statistical aggregates requested, the input variable name may have a maximum of 25 characters... This is the worst scenario and only happens when the requested statistic are <such and such>. In the best scenario, the maximum number of characters is 30 (e.g. when the statistic requested is N)".

 

I find this:

- Complicated to specify; and the specification may become obsolete in the future, for instance if SAS adds a new statistical aggregate whose name is longer than 6 characters (currently I believe the longer statistics name is 6-char long).

- May induce a user who normally uses SAS (and therefore who creates variables whose names are as long as 32 characters) to decide not to use the process at all because that would require him to change many variable names in order to use that piece of code.

And nowadays systems and data processes do work with many variables... in the hundreds or thousands.

 

My point of view

With increasingly complex projects where many different pieces of information are dealt with possibly coming from different sources, I find it very important to name objects as clearly as possible (for instance part of the variable name may include precisely the source where the piece of information is coming from), even by compromising short variable names.

Again I would use solutions (1) and (2) above to work around longer names.

 

 

ChrisHemedinger
Community Manager

Hi All - I'm glad to see the robust debate about the merits and pitfalls of long variable names.  From what I've seen, the top argument in support (and it's a strong one) is for compatibility and exchange with other systems that already support very long names.

 

We talk about SAS variables in two ways:

 - as an alias for column name, each field in an observation has a name.  This is the statistical or data management sense of the word.  We don't always have control over these names, especially when they come from other sources.

 - as a storage location for a value in your programming logic.  This is something that a SAS programmer has more control over, and selecting meaningful but concise names is considered a good practice.

 

Here at SAS, I don't think there is any question that the 32-character limit is a pain point and something that many of us would like to see addressed.  It's a limit that's buried deep in the fabric of SAS though, so such a change has high impact.  I'll see if we can come up with some information that helps, including a roadmap of how this might be handled in the future.

Quentin
Super User

Thanks as always @ChrisHemedinger.  I don't have a strong argument as to whether or not to expand the length of variable names.  But it if is decided to expand it, I definitely need a system option to control it, ValidVarLength=32|64|128|whatever.  

 

Because we all have way too much code that expects variable names to have only 32 characters.  I would need to keep that limitation enforced by default, even if SAS decides to support ReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyReallyLongVariableNames.

 

Tom
Super User
Super User

To me the most pressing need is a method to access tables/variables from external data sources that have long names. Currently the only way to make a live connection to such data is to have the DBA on the remote system create a view that renames objects to names that are 32 characters or shorter. If SAS could just release options/utilities to deal with those issues for users that do not have write access to the remote system it would be a big win. 

 

One option might be some type of dataset or library engine option that would allow SAS user to map the names.  DSNOption should be workable for variable names.

set oralib.tablename(dbsasname=(loooooooooongname=shortname));

Another option might be some type of utility for generating views via passthru queries.  So perhaps you could pass in the name of a remote database schema and a SAS libref and the utility would create views in the SAS libref that allow you to access the tables/views in the remote database.  Allowing control of variable type conversion/casting via the same utility would also be helpful.