Hello,
In my last post users suggested using arrays. I have looked into arrays and have come across a problem which I think would best be solved using arrays.
I had a data set that was in long format. i.e., multiple lines per ID.
looked like this:
data x;
input ID c0 c1 c2 c3 c4 c5 c6 c7 ;
cards;
1 1 5 7 6 3 3 0 2
1 3 2 5 4 3 1 4 3
2 2 1 4 5 7 2 4 8
2 3 1 3 4 5 2 6 7
2 4 1 2 7 3 6 4 4
;
run;
I transposed it to wide format. So now it's like:
call this data set 'wide'
ID c0_1 c0_2 c0_3 c1_1 c1_2 c1_3 ....... c7_1 c7_2 c7_3
1 1 3 . 5 2 . 2 3 .
2 2 3 4 1 1 1 8 7 4
I want to take a sum for each line of record in the long format above (i.e., sum(c0,c1,c2,c3,c4,c5,c6,c7)) for the first line for ID1, then do it again for the second line for ID1. Then find that sum for the first , second and third lines of ID2.
Right now though, I have the data in wide format. How would I find this sum? I guess I could do it while in long format before I transpose it to wide format, but I want to know how to do this using an array.
So I want to find the sum of c0_1, c1_1, c2_1, .... , c7_1.
Then find the sum of c0_2, c1_2, c2_2, ......., c7_2
Here I only have 3 times I'd have to do it, but suppose I had 15 C variables? I can't manually do this by hand it'd be by an array..
This is what I started to write, but I find that there is error in my code. Any suggestions how to fix this? Simplest code possible?
Thanks!
data array1;
set wide;
c1sum=0;
array c(15)_1 c1_1-c15_1;
do i = 1 to 15;
if c(i)_1 then c1sum=sum(c1sum, c(i)_1);
end;
drop i ;
run;
SAS doesn't like the "c(i)_1" syntax.. it doesn't like the i not at the end of the variable name, I think.
I don't think you need arrays to do what you want. Wouldn't the following do it?:
data want;
set wide;
c0sum=sum(of c0:);
c1sum=sum(of c1:);
run;
Art,
I think the OP wants c1sum=sum(of c0_1,c1_1, ...,c15_1)
Linlin is probably correct. You will have to carefully check the following, but I think that it does what you want and, yes, it uses arrays:
data wide;
input id c0_1-c0_3 c1_1-c1_3
c2_1-c2_3 c3_1-c3_3
c4_1-c4_3 c5_1-c5_3
c6_1-c6_3 c7_1-c7_3;
cards;
1 1 3 . 5 2 . 4 4 4 5 5 5 6 5 6 7 7 7 4 2 4 3 2 3
2 2 3 4 1 1 1 3 2 3 6 5 6 7 8 7 2 2 2 4 2 1 1 2 3
;
%let numvars=3;
%let maxgroup=7;
data want (drop=_: i j counter);
set wide;
array everything(*) c0_1--c&maxgroup._&numvars.;
array _mean_stuff(%eval(&maxgroup.+1));
array means(&numvars.);
do i=1 to &numvars.;
counter=0;
do j=1 to dim(everything);
if input(substr(vname(everything(j)),index(vname(everything(j)),"_")+1),8.)
eq i then do;
counter+1;
_mean_stuff(counter)=everything(j);
end;
end;
means(i)=mean(of _mean_stuff(*));
end;
run;
Thank you! So I'm new to arrays and have only done simple ones. I don't quite understand what's going on in your code starting with "array everything(*) c0_1--c&maxgroup._&numvars.;" Any chance you could explain what's going on? The tutorials I've found online deal with simpler arrays.
For example,
in
everything(*) c0_1--c&maxgroup._&numvars.;
Would you have been able to just put in 12 and 15 instead of c&maxgroup._&numvars. ?
What does " array _mean_stuff(%eval(&maxgroup.+1));" mean?
I did run the code, but because I have M and N in a few entries, SAS will not go through with the array since I have mix of character and numeric variables *sigh*
Thanks,
Gina
First, I agree with my colleagues that transposing the data, in this case, probably complicated matters.
With your indicating that some of the values contain the letters M or N, it will help if you post a small sample dataset so that we can all see what you are having to deal with.
But, to answer your questions, declaring macro variables (with %let statements at the beginning of the code .. before the datastep), allows those variables to be used simply as text substitution so that only the two %let statements have to be changed when the values change.
Given:
%let numvars=3;
%let maxgroup=7;
data want (drop=_: i j counter);
set wide;
array everything(*) c0_1--c&maxgroup._&numvars.;
the array statement really reads:
array everything(*) c0_1--c&7_3.;
That way, only the %let statements have to be changed if your parameters change (like to 12 or 15).
From your comments it sounds like your variables are character variables. Again, not a problem, as arrays can be declared as being character. e.g.:
array everything(*) $ c0_1--c&maxgroup._&numvars.;
and, possibly, only an input function would have to be added to one line, namely:
_mean_stuff(counter)=input(everything(j), ?? 12.);
Thus, you may be able to use something like:
data wide;
input id (c0_1-c0_3 c1_1-c1_3
c2_1-c2_3 c3_1-c3_3
c4_1-c4_3 c5_1-c5_3
c6_1-c6_3 c7_1-c7_3) (: $1.);
cards;
1 1 3 M 5 2 M 4 4 4 5 5 5 6 5 6 7 7 7 4 2 4 3 2 3
2 2 3 4 1 1 1 3 2 3 6 5 6 7 N 7 2 2 2 4 2 1 1 2 3
;
%let numvars=3;
%let maxgroup=7;
data want (drop=_: i j counter);
set wide;
array everything(*) $ c0_1--c&maxgroup._&numvars.;
array _mean_stuff(%eval(&maxgroup.+1));
array means(&numvars.);
do i=1 to &numvars.;
counter=0;
do j=1 to dim(everything);
if input(substr(vname(everything(j)),index(vname(everything(j)),"_")+1),8.)
eq i then do;
counter+1;
_mean_stuff(counter)=input(everything(j),?? 12.);
end;
end;
means(i)=mean(of _mean_stuff(*));
end;
run;
But, if not, it would help to see your pre and post transposed data.
I'm still not sure if I understand what your data look like, but I've been experimenting with using functions with multi-dimensional arrays based on another thread from a fews days ago.
Hopefully, the following approximates your data and is a little easier to understand:
data wide;
input id (c0_1-c0_3 c1_1-c1_3
c2_1-c2_3 c3_1-c3_3
c4_1-c4_3 c5_1-c5_3
c6_1-c6_3 c7_1-c7_3
c8_1-c8_3 c9_1-c9_3
c10_1-c10_3 c11_1-c11_3
c12_1-c12_3 c13_1-c13_3
c14_1-c14_3 c15_1-c15_3) ($1.);
cards;
1 123123123123123123123123123123123123123123123123
2 234234234234234234234234234234234234234234234234
;
%let numvars=3;
%let numgroups=15;
data want;
set wide;
array allchars (%eval(&numgroups.+1),&numvars.) c0_1--c&numgroups._&numvars.;
array sums(&numvars.);
call missing(of sums(*));
do i=1 to %eval(&numgroups.+1);
do j=1 to &numvars.;
sums(j)+input(allchars(i,j),?? 12.);
end;
end;
run;
You don't need to traspose it . Using a simple statement is enough. What is your final result ?
data x;
input ID c0 c1 c2 c3 c4 c5 c6 c7 ;
sum=sum(of c: );
cards;
1 1 5 7 6 3 3 0 2
1 3 2 5 4 3 1 4 3
2 2 1 4 5 7 2 4 8
2 3 1 3 4 5 2 6 7
2 4 1 2 7 3 6 4 4
;
run;
Ksharp
Why not just sum the values for each observation first.
data want;
set X;
sum=sum(of c0-c7);
run;
Why did you roll the multiple observations up into a "WIDE" format? What does this format do for you?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.