BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cheng14
Fluorite | Level 6

I have a series of variables with the following format.

 

x_YYUU

 

where YY = two digit year code, with range from 00-13 (padded leading 0 for single digits)

UU = two digit week code, with range 01-52 (padded leading 0 for single digits)

 

I have been able to write this with a manual looping through the YY's 00-13 and then just using an array to go through the UU's (weeks) of each year. Then I thought I could rewrite this as a double array to simplify all the code because there was all these manual loops going through each year 00-13.

 

However, I can't seem to get the double array to work:

 

currentyear is an observation's current year

currentweek is an observation's current week

 

there are values present for all observations of these variables above 

 

DATA outputset;

set inputset;

array x{14,52} x_0001-x_0052 x_0101-x_0152 x_0201-x_0252 x_0301-x_0352 x_0401-x_0452 x_0501-x_0552 x_0601-x_0652 x_0701-x_0752 x_0801-x_0852 x_0901-x_0952 x_1001-x_1052 x_1101-x_1152 x_1201-x_1252 x_1301-x_1352;

 

do i=currentyear to 13;

do j=currentweek to 52;

if missing(x{i,j}) then do;

theyear=i;

theweek=j;

leave;

end;

 

end;

end;

RUN;

 

I get an error on the x{i,j} part stating that the array is out of range.

 

Ive tried to test different parts of the array within this x_YYUU format and I can't figure it out. For example, I thought the 00's were maybe causing a problem, so I used the double array for years starting at 01 instead of 00, still did not work.

 

Is the naming of the variables in such a way that array processing is not possible? It seems wrong that I can do the week as array but not a double array with year.

 

Any guidance would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You also mention that you have 00-13 for year.  Are those the values for CURRENTYEAR?  That would cause the error you are seeing.  The easiest fix would be to redefine the dimensions of the array:

 

array x {0:13, 52} ...

 

 

View solution in original post

15 REPLIES 15
Loko
Barite | Level 11

Hello,

 

you defined an array with 14 rows and 52 columns which translates to 14*52 variables.

 

nevertheless you have specified way fewer variables within the array - that's the reason for the error you get.

cheng14
Fluorite | Level 6

Thank you for the fast response.

 

Can you show me how I can specify the variable range properly then? I specified each year and the range of 01 to 52 for week, is that not equal to the 14*52 variables?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, why?  If you look at the many posts daily we get here, we always advise to use normalised data.  It avoids all this unnecessary coding problems.  Let me explain again.  A dataset is comprised of two parts.  The first part is a table structure - this should be fixed and easy to program with, so short variable names for instance.  The second part is data, this is the contents of the dataset and is held in observations - this containing the data items to work with.  What you have done is started by creating a structure of data elements, which means the structure will change, and will be difficult to work with.  Don't fight the system, work with it.  A simple change to it will result in something like:

...     YEAR     WEEK   VALUE

...     2013       1           xyz

...     2013       2           yzr

...

 

Then if your data suddenly has years 1900-2013 added, your code doesn't fall over, it just has more observations to process.  You will find all processing further on is far easier as well, for instance if you want a proc means on year 2014, how do you do it with your structure - well you would have to create a list of variables, then include them in the means probably using a macro variable, whereas with the above structure you can simply add a where clause.

cheng14
Fluorite | Level 6
Thanks for the reply RW9.

I did not code the variable names, this is just the standard that was given which I must work with.

Given that this variable formatting is what I receive it as, is there a limitations with the array or is my formatting of the range just wrong?
cheng14
Fluorite | Level 6
I appreciate the sentiment Kurt and I agree, but this is not an option. For what it's worth, there are no 53 weeks for this format, it consistently has just 00-13 for year and 01-52 for week.

I'm more interested in getting the array formatting to understand this bizarre "standard" so it can perform the loops. Is this simply not possible or?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then there are two things.  Firstly, who provided you that type of dataset, and why was an import agreement not setup where you - as the consumer of the data - decide what you needed?  This is something I would always do with third party data, I mean if they change the structure next time they send it to you then you are back at step 1 and have to reprogram.

 

Secondly, when programming with data, unless there is some regulation (i.e. for a regulation data needs to end up in a certain format), then there is no rule that states you cannot change the structure of your data to make it easy for you to program with.  Yes, there are no real limitations with using arrays, and given some time we could probably fix the code, however what you will find is anything you come up with will not be robust, documentable, or easy to maintain.  

cheng14
Fluorite | Level 6

Hi RW,

 

I'm just a low level data analyst. The numbers and formating are that from the national statistics office, and as multiple people are collaborating, I can't change the format.

 

Is this task possible without renaming the variables from x_YYUU format?

 

I am more doing this for intellectual curiosity, as the "task" is already completed when I treated the years in manual loop and used arrays for the weeks (worked fine).

 

I understand your points and Kurt's point about the format of the var being wrong and the consequences, but I am quite eager to satisfy the curiosity of how the array could work within these constraints.

 

I appreciate your time and expertise in addressing these issues.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, your array needs to have the number of elements given by the numbers in brackets.  Look at this example:

data have;
  x_0001=1; x_0002=5; x_0003=12; x_0004=.; x_0005=1; 
  x_0101=1; x_0102=5; x_0103=12; x_0104=.; x_0105=1; 
run;

data want;
  set have;
  array x_{2,5} x_:;
  do year=1 to 2;
    do week=1 to 5;
      if missing(x_{year,week}) then leave;
    end;
    if missing(x_{year,week}) then leave;
  end;
run;

So I create one observation with 10 columns, the first 5 year 1, the second year 2.  I can then reference them by a 2 * 5 array, so {1,x} refers to 0001-0005, and {2,x} refers to 0101-0105.  In your first example you dont haveenough variables on the array statement to fill the elements you are referencing in your code.

cheng14
Fluorite | Level 6
@RW

Thanks, I thought by referencing them with dash between them that it could simply fill in the values for me.

The problem is that the range of the x_YYUU variables is beyond just x_0001 to x_1352

Thus when I try to set up the array it says there are now TOO MANY variables specified.

I don't suppose there's a way to narrow the range definition on the array, in a similar fashion which I tried in the OP, but which actually works?

One way around this would be if I just dropped all the vars outside of x_0001 to x_1352.

Or perhaps I can define the array in much broader terms and then just loop through the ranges of interest?
Astounding
PROC Star

A couple of comments ...

 

First, I would assume that you actually have all the variables to work with, even if the array doesn't define them all.  It would be easy enough to add 14 dashes, to define all the elements that you want to work with:

 

array x{13,52} x_0001-x_0052 x_0101-x_0152 x_0201-x_0252 x_0301-x_0352 x_0401-x_0452 x_0501-x_0552 x_0601-x_0652 

x_0701-x_0752 x_0801-x_0852 x_0901-x_0952 x_1001-x_1052 x_1101-x_1152 x_1201-x_1252 x_1301-x_1352;

 

Second, why the "leave" statement?  Wouldn't you want to continue to check the remaining elements and replace missing values for additional elements?

cheng14
Fluorite | Level 6
hi Astounding.

This is actually exactly what I did early on, and it did not work. In OP i transcribed it incorreectly, meant to put dashes between the ranges (it's edited now to reflect this).

The "leave" statement exists because I only need to find the first missing value and then move on to the next observation
Astounding
PROC Star

You also mention that you have 00-13 for year.  Are those the values for CURRENTYEAR?  That would cause the error you are seeing.  The easiest fix would be to redefine the dimensions of the array:

 

array x {0:13, 52} ...

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 1425 views
  • 7 likes
  • 5 in conversation