I have a series of variables with the following format.
x_YYUU
where YY = two digit year code, with range from 00-13 (padded leading 0 for single digits)
UU = two digit week code, with range 01-52 (padded leading 0 for single digits)
I have been able to write this with a manual looping through the YY's 00-13 and then just using an array to go through the UU's (weeks) of each year. Then I thought I could rewrite this as a double array to simplify all the code because there was all these manual loops going through each year 00-13.
However, I can't seem to get the double array to work:
currentyear is an observation's current year
currentweek is an observation's current week
there are values present for all observations of these variables above
DATA outputset;
set inputset;
array x{14,52} x_0001-x_0052 x_0101-x_0152 x_0201-x_0252 x_0301-x_0352 x_0401-x_0452 x_0501-x_0552 x_0601-x_0652 x_0701-x_0752 x_0801-x_0852 x_0901-x_0952 x_1001-x_1052 x_1101-x_1152 x_1201-x_1252 x_1301-x_1352;
do i=currentyear to 13;
do j=currentweek to 52;
if missing(x{i,j}) then do;
theyear=i;
theweek=j;
leave;
end;
end;
end;
RUN;
I get an error on the x{i,j} part stating that the array is out of range.
Ive tried to test different parts of the array within this x_YYUU format and I can't figure it out. For example, I thought the 00's were maybe causing a problem, so I used the double array for years starting at 01 instead of 00, still did not work.
Is the naming of the variables in such a way that array processing is not possible? It seems wrong that I can do the week as array but not a double array with year.
Any guidance would be appreciated.
You also mention that you have 00-13 for year. Are those the values for CURRENTYEAR? That would cause the error you are seeing. The easiest fix would be to redefine the dimensions of the array:
array x {0:13, 52} ...
Hello,
you defined an array with 14 rows and 52 columns which translates to 14*52 variables.
nevertheless you have specified way fewer variables within the array - that's the reason for the error you get.
Thank you for the fast response.
Can you show me how I can specify the variable range properly then? I specified each year and the range of 01 to 52 for week, is that not equal to the 14*52 variables?
You will never achieve correct results, as some years have 53 weeks. Use a long format, as @RW9 has already suggested.
Sorry, why? If you look at the many posts daily we get here, we always advise to use normalised data. It avoids all this unnecessary coding problems. Let me explain again. A dataset is comprised of two parts. The first part is a table structure - this should be fixed and easy to program with, so short variable names for instance. The second part is data, this is the contents of the dataset and is held in observations - this containing the data items to work with. What you have done is started by creating a structure of data elements, which means the structure will change, and will be difficult to work with. Don't fight the system, work with it. A simple change to it will result in something like:
... YEAR WEEK VALUE
... 2013 1 xyz
... 2013 2 yzr
...
Then if your data suddenly has years 1900-2013 added, your code doesn't fall over, it just has more observations to process. You will find all processing further on is far easier as well, for instance if you want a proc means on year 2014, how do you do it with your structure - well you would have to create a list of variables, then include them in the means probably using a macro variable, whereas with the above structure you can simply add a where clause.
@cheng14 wrote:
... this is just the standard that was given which I must work with.
Then the "standard" is stupid and has to be changed. Period.
Then there are two things. Firstly, who provided you that type of dataset, and why was an import agreement not setup where you - as the consumer of the data - decide what you needed? This is something I would always do with third party data, I mean if they change the structure next time they send it to you then you are back at step 1 and have to reprogram.
Secondly, when programming with data, unless there is some regulation (i.e. for a regulation data needs to end up in a certain format), then there is no rule that states you cannot change the structure of your data to make it easy for you to program with. Yes, there are no real limitations with using arrays, and given some time we could probably fix the code, however what you will find is anything you come up with will not be robust, documentable, or easy to maintain.
Hi RW,
I'm just a low level data analyst. The numbers and formating are that from the national statistics office, and as multiple people are collaborating, I can't change the format.
Is this task possible without renaming the variables from x_YYUU format?
I am more doing this for intellectual curiosity, as the "task" is already completed when I treated the years in manual loop and used arrays for the weeks (worked fine).
I understand your points and Kurt's point about the format of the var being wrong and the consequences, but I am quite eager to satisfy the curiosity of how the array could work within these constraints.
I appreciate your time and expertise in addressing these issues.
Well, your array needs to have the number of elements given by the numbers in brackets. Look at this example:
data have; x_0001=1; x_0002=5; x_0003=12; x_0004=.; x_0005=1; x_0101=1; x_0102=5; x_0103=12; x_0104=.; x_0105=1; run; data want; set have; array x_{2,5} x_:; do year=1 to 2; do week=1 to 5; if missing(x_{year,week}) then leave; end; if missing(x_{year,week}) then leave; end; run;
So I create one observation with 10 columns, the first 5 year 1, the second year 2. I can then reference them by a 2 * 5 array, so {1,x} refers to 0001-0005, and {2,x} refers to 0101-0105. In your first example you dont haveenough variables on the array statement to fill the elements you are referencing in your code.
A couple of comments ...
First, I would assume that you actually have all the variables to work with, even if the array doesn't define them all. It would be easy enough to add 14 dashes, to define all the elements that you want to work with:
array x{13,52} x_0001-x_0052 x_0101-x_0152 x_0201-x_0252 x_0301-x_0352 x_0401-x_0452 x_0501-x_0552 x_0601-x_0652
x_0701-x_0752 x_0801-x_0852 x_0901-x_0952 x_1001-x_1052 x_1101-x_1152 x_1201-x_1252 x_1301-x_1352;
Second, why the "leave" statement? Wouldn't you want to continue to check the remaining elements and replace missing values for additional elements?
You also mention that you have 00-13 for year. Are those the values for CURRENTYEAR? That would cause the error you are seeing. The easiest fix would be to redefine the dimensions of the array:
array x {0:13, 52} ...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.