Hi everyone,
I keep getting errors when I run the following program. Dataset two has 5 variables (year and var1-var4) and 5 observations. The dataset one that I am trying to create should have 20 observations with the variables year, gender, color and team. I want to split var1-var4 across the two genders and two colors, and do the same for each of the five years (2016-2012). But every time I run this program I get an error that the array subscript is out of range. I feel like I am making a logic mistake somewhere but am not able to figure out where.
data one;
set two;
array tm (*) $ var1 var2 var3 var4;
do year=2016 to 2012 by -1;
do gender="Male","Female";
do color="Red","Green";
team=tm(year);
output;
end;
end;
end;
run;
What I want:
dataset one:
year gender color team
2016 male red var1 value
2016 male green var2 value
2016 female red var3 value
2016 female green var4 value
and so on.
Any input is much appreciated. Thanks!
This:
array tm (*) $ var1 var2 var3 var4;
do year=2016 to 2012 by -1;
The loop runs five times (2016, 2015, 2014, 2013, 2012), but the array has only four elements, var1-var4, and the index you pass is the year value itself rather than a position from 1 to 4. Hence the subscript out of range error.
If you can post some sample test data in the form of a datastep, then I can supply some appropriate code to process it. As a guess, I would say that a counter which goes from 1 to 4, used as the array index instead of year, would work.
Thanks RW9. So my dataset two looks like this:
data two;
input year var1 $ var2 $ var3 $ var4 $;
cards;
2016 john jack sarah susan
2015 ben bill britney anne
2014 chris peter monica robin
2013 jeff matt christie christine
2012 mike david dia diane
;
run;
dataset one should look like this:
year gender color team
2016 male red John
2016 male green Jack
2016 female red Sarah
2016 female green Susan
Please let me know if I should provide more information. Thanks again
Here you go:
data two;
  input year var1 $ var2 $ var3 $ var4 $;
cards;
2016 john jack sarah susan
2015 ben bill britney anne
2014 chris peter monica robin
2013 jeff matt christie christine
2012 mike david dia diane
;
run;

data want (keep=year gender color team);
  set two;
  length gender color $20;
  array var{4};
  i=1;
  do gender="male","female";
    do color="red","green";
      team=var{i};
      output;
      i=i+1;
    end;
  end;
run;
Thank you so much RW9. But if I could bother you just a little bit more, I don't understand how the same year is repeated 4 times. This is exactly what I want but I am not clear how that is happening. Also, shouldn't I have to specify that the array is character array? I apologize if I am asking too many questions or if my questions are too basic.
No probs, for your two questions:
The array statement is a reference to variables in the dataset; if they do not exist then SAS needs to create them. In this instance, however, the variables var1-var4 already exist and have their properties, so we only need the reference to them. If they did not exist, you would need to supply their properties (for example $ and a length for character variables).
The year is not populated 4 times. What actually happens is that the observation is written out 4 times, once for each output statement executed inside the two do loops. As we do not change the year value, it is the same at each output call; year only changes when the data step starts its next iteration and the set statement reads the next row.
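If you want to see this in the log, a small sketch along these lines may help. It assumes dataset two exactly as posted above; the array name v and the variables i and team are only illustrative names.

```sas
/* Sketch: trace the implicit data step loop. Assumes dataset TWO from this thread. */
data _null_;
  set two;                  /* reads one row per data step iteration */
  array v{*} var1-var4;     /* no $ needed: var1-var4 already exist as character */
  do i = 1 to dim(v);
    team = v{i};            /* year is untouched inside the loop, only team changes */
    put _n_= year= i= team=;
  end;
run;
```

Each row of two should give four log lines that share the same _n_ and year, which is the same thing that happens with the output statement in the real program.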
Thank you so much for taking the time to explain. I really appreciate all your help.
Why not just read it in that way?
data want ;
input year @ ;
length gender color name $20;
do gender="male","female";
do color="red","green";
input name @ ;
output;
end;
end;
cards;
2016 john jack sarah susan
2015 ben bill britney anne
2014 chris peter monica robin
2013 jeff matt christie christine
2012 mike david dia diane
;
Besides the number of elements in the array, you have to consider how you will refer to them. You are using this reference:
tm(year)
When YEAR is 2016, that refers to the 2016th element of the array ... a far cry from your intention.
The easiest syntax for the array definition ... one that would let you refer to tm(year) ... would be:
array tm {2012:2016} $ var1-var5;
Now SAS will expect numbers in the range of 2012 to 2016, to refer to the five elements of the array.
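As a rough sketch of that idea: the example below assumes a layout with five variables var1-var5 on one row, one per year, which is not how dataset two in this thread is laid out. The dataset names wide and long and the sample values are made up for illustration.

```sas
/* Hypothetical data: one row, five character variables, one per year 2012-2016 */
data wide;
  input var1 $ var2 $ var3 $ var4 $ var5 $;
cards;
mike jeff chris ben john
;
run;

data long (keep=year team);
  set wide;
  /* bounds 2012:2016 mean tm{2012} is var1, ..., tm{2016} is var5 */
  array tm {2012:2016} $ var1-var5;
  do year = 2016 to 2012 by -1;
    team = tm{year};        /* the year value is now a valid subscript */
    output;
  end;
run;
```

With the data as actually posted (one row per year, four names per row), a position counter from 1 to 4 is still the better fit.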
Thanks Astounding. I can see that I need to be more careful when writing the programs.
The reason why you get the message is obvious: the array TM is indexed 1 to 4, not 2012 to 2016. So an array entry like tm(2016) does not exist, and you get the error message.
Apart from that, you have another problem: your array has 4 elements, but you are trying to loop through 5 elements (indexes 2012, 2013, 2014, 2015 and 2016).
So you need to remove a year from your loop, or come up with another variable for the last (or first?) year. Apart from that, the simple (but not much used) solution is to use another index for your array, e.g.:
data one;
set two;
array tm(2012:2016) var1-var5;
Now you can fetch an array element like tm(2013).