Solved: Array declaration problems when using both dimensions and var names

al_king · Posted 05-19-2022 10:02 PM

Hi there,

First time poster - I always find results from this forum useful when troubleshooting SAS issues, so thought it was worth recording this issue that I encountered.

I'm using SAS 9.4. Most of the time, when declaring an array in a DATA step with named variables, I'll use the asterisk, to tell SAS to work out how long to make the array, e.g.:

ARRAY parsed_vars(*) arr_var1-arr_var10 _character_;

However, sometimes you want to have named variables, and also an explicit range: this is particularly useful when you want your array indices to start at an unusual place, because it's easier to understand.
So I tried:

ARRAY parsed_vars(0:10) arr_var0-arr_var10 _character_;

but it wouldn't work, with an unusual error:

ERROR: Too many variables defined for the dimension(s) specified for the array parsed_vars.

If I tried e.g. changing the upper-bound 10 to a higher number, (0:20) say, I might get:

ERROR: Too few variables defined for the dimension(s) specified for the array parsed_vars.

But it wouldn't accept any specific range: even a conventional arr(1:10) var1-var10 would yield a variation on the same error, 'too many' or 'too few', for any upper and lower index I chose.

What was the trick? Declare it slightly differently:

ARRAY parsed_vars(0:10) $50 arr_var0-arr_var10;

The _character_ suffix normally looks for existing variables to use as part of your array, but when the range is unspecified like (*), it doesn't complain: for me it automatically allocated 900-wide character variables, i.e. it interpreted it as an implicit declaration of new variables. For whatever reason, once I introduce an explicit range, it stops using this implicit declaration behaviour: I have to specify my field width for it to work (in this case, $50 for a 50-wide character, for example).

It's worth mentioning - the explicit range can be really useful for code readability: in my case, the data it's holding corresponds with field N of some standard, and so it's useful for the var name and array index to use the same N as the corresponding documentation. So parsed_vars(0) = arr_var0, and the contents of arr_var0 corresponds with the 0th field in the spec, etc.
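A minimal sketch of that index-to-name correspondence, assuming the array is declared with $50 as in the working statement above. The loop variable i and the 'field N' values are illustrative only and not part of the original program:

```sas
data _null_;
  /* $50 makes the new variables character with length 50; */
  /* the bounds 0:10 give 11 elements, indexed 0 through 10. */
  array parsed_vars(0:10) $50 arr_var0-arr_var10;

  /* LBOUND and HBOUND return the declared bounds, so the loop */
  /* does not hard-code 0 and 10. */
  do i = lbound(parsed_vars) to hbound(parsed_vars);
    parsed_vars(i) = catx(' ', 'field', i);
  end;

  /* parsed_vars(0) is arr_var0 and parsed_vars(10) is arr_var10. */
  put arr_var0= arr_var10=;
run;
```

The PUT statement writes both variables to the log, which should show arr_var0 holding the value assigned at index 0 and arr_var10 the value assigned at index 10.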

There were some other lessons learned. When I tried to make a 'minimal reproducible example' for myself, with a small DATA step and no other complications, omitting the SET statement (e.g. SET WORK.dataset1;) made things "look OK". As far as I could tell, SAS didn't have to instantiate the array when there were no rows of data and so didn't encounter the problem, i.e. it doesn't "statically check" this declaration (unlike other kinds of syntax errors); it only sees a problem when it tries to run it.

Tom · Posted 05-20-2022 08:39 AM

You don't seem to understand what _CHARACTER_ means in that context. It is a variable list: all of the character variables (at least the ones that the data step compiler knows about when it is compiling that statement).

If you want to define an array of 11 character variables, each of length 900, you can use a statement like:

array charvar[11] $900;

Which is the same as:

array charvar $900 charvar1-charvar11;

Or

length charvar1-charvar11 $900;
array charvar[11] ;

You can specify both a dimension and a list of variables. In that case the dimensions must match the number of variables listed. If you use a variable list specification like _CHARACTER_ or _NUMERIC_ or _ALL_ or even firstvar--lastvar without knowing how many variables that will evaluate to then you are better off letting SAS count the number of variables for you.

array chars _character_;
array nums _numeric_;
array middle address1 -- zipcode10 ;

PS There is no need to waste time typing (*) or [*] when you want SAS to count the dimension for you. SAS does not need it.
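As a rough sketch of letting SAS do the counting, the step below declares arrays from variable lists without a dimension and then asks for the size with DIM. The data set have and the variables name, city, and age are hypothetical, made up for illustration:

```sas
/* Hypothetical input: two character variables and one numeric variable. */
data have;
  length name $10 city $20;
  name = 'John';
  city = 'Cary';
  age  = 42;
run;

data _null_;
  set have;

  /* No dimension given: SAS counts the variables in each list. */
  /* Each list is resolved when its ARRAY statement is compiled, */
  /* so n_chars and n_nums (created below) are not part of nums. */
  array chars _character_;
  array nums  _numeric_;

  n_chars = dim(chars);
  n_nums  = dim(nums);
  put n_chars= n_nums=;
run;
```

Looping with do i = 1 to dim(chars) then keeps working if the input data set gains or loses variables.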

mkeintz · Posted 05-20-2022 12:10 AM

The term _character_ is interpreted by the SAS compiler as the list of all known character variables at that point in the program. If there are no known character variables prior to the point you use

ARRAY parsed_vars(*) arr_var1-arr_var10 _character_;

you will have declared a 10-element NUMERIC array, because numeric is the default variable type for the array statement. In this case there would be no error message, but you would not have succeeded in generating an array (of any size) of character variables.

And had you used

ARRAY parsed_vars(0:10) arr_var0-arr_var10 _character_;

with no prior character variables, you would have generated an 11-element numeric array - again with no error message. As you have discovered, if there was an error message for this statement, it means that a character variable had been referred to in some preceding code.

Those known variables would be provided by a preceding SET (or MERGE) statement, by explicit declaration (e.g. the LENGTH statement), or by assignment statements (e.g. name='John').

There are also the analogous terms _numeric_ and _all_.
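A small sketch of the "no prior character variables" case described above. The variables n and t are added here only to inspect the result, and the behaviour noted in the comments follows the description in this post rather than a separately verified run:

```sas
data _null_;
  /* No character variables are known at this point, so _character_ */
  /* resolves to an empty list and contributes nothing to the array. */
  array parsed_vars(0:10) arr_var0-arr_var10 _character_;

  n = dim(parsed_vars);        /* number of elements in the array */
  t = vtype(parsed_vars(0));   /* 'N' for numeric, 'C' for character */
  put n= t=;
run;
```

If a character variable is made known before the ARRAY statement (for example through a SET or LENGTH statement), the _character_ list is no longer empty, and the number of listed variables no longer matches the 11 slots implied by (0:10).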

al_king · Posted 05-21-2022 12:59 AM

Yep OK, that makes sense. I was working with someone's code that used a trailing _character_ to effectively make the newly declared variables that type, which I can see in a sense worked "by coincidence" - SAS was correctly inferring the type for my newly-declared variables because the trailing _character_ list happened to be non-empty.

Kurt_Bremser · Posted 05-20-2022 12:43 AM

To define the array as character, use the $ sign.

array parsed_vars(*) $ arr_var1-arr_var10;

Set a length immediately after the $ sign if you want something other than 8.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

al_king · Posted 05-21-2022 01:30 AM

In my case, I wanted my 11 variables to be indexed from zero, which I think is only possible with the (0:10) style dimensions, but otherwise that's quite clear. (Since inference had been working fine, I probably wouldn't have recognised the misunderstanding here until I needed to get specific with dimensions!)

That does seem to be the culprit: the previous developers had used _character_ as a tacit array type declaration (with no intention of accessing anything beyond the newly-declared variables in the array). The array documentation was just slightly too vague to disabuse me of that notion -

These SAS variable lists enable you to reference variables...

sounded like it was a flag to permit references to all predefined variables, rather than something actually constituting the reference. The "type declaration"-type usage happened to work, by coincidence, because once a known variable appeared in the back end of the list, SAS knew to enforce a consistent type across the array. As mkeintz has observed, if there were no predefined character variables the _character_ list would have been empty, and so my new variables would have ended up as NUMERIC.

Kurt_Bremser · Posted 05-21-2022 02:52 AM

It's not really that vague:

array-elements

specifies the names of the elements that make up the array. Array-elements must be either all numeric or all character, and they can be listed in any order. The elements can be named variables or temporary data elements.

variables

lists variable names.

Range: The names must be either variables that you define in the ARRAY statement or variables that SAS creates by concatenating the array name and a number. For example, when the subscript is a number (not the asterisk), you do not need to name each variable in the array. Instead, SAS creates variable names by concatenating the array name and the numbers 1, 2, 3, …n.
Restriction: If you use _ALL_, all the previously defined variables must be of the same type.
Tips: These SAS variable lists enable you to reference variables that have been previously defined in the same DATA step:
	_NUMERIC_ specifies all numeric variables.
	_CHARACTER_ specifies all character variables.
	_ALL_ specifies all variables.

The function of the _CHARACTER_ keyword is clearly explained here.

al_king · Posted 05-21-2022 03:16 AM

Haha, I'll give you that it's understandable when you already know what it's trying to say, but I can tell you I was referring directly to this very documentation and it didn't clarify, so it was sufficiently vague.

To me, "specifying all character variables" doesn't invariably read as "this token is equivalent to all character variable names" - it just means that you're specifying something, and the thing being specified is all character variables. Specify in what sense? OK, so let's refer to the description of the category: they "enable you to reference variables that have been previously defined". My interpretation at the time was, _character_ is a flag that permits your array to refer to preexisting variables ('enabling' as in a literal functional toggle, rather than a language construct that makes it possible i.e. constitutes the reference) - specifically, all character ones. Right? I can see now the intended interpretation but the phrasing is not very concrete.

(I'm not really trying to rail too hard against the documentation here, but just making clear what I mean by vague. Anyway, I've certainly come to the right place for this kind of clarification, functional examples etc.)

Kurt_Bremser · Posted 05-21-2022 04:47 AM

I think it is important to know what "predefined" means in the context of compiling the data step code. Here, it does not mean "defined previously in the ARRAY statement", but "defined previously before the statement was encountered".

It takes familiarity with how the data step compiler builds the PDV (program data vector) not to be misled by this particular sentence. This familiarity grows out of experience, but there is also Maxim 41: Most experience arises from mistakes.
