BookmarkSubscribeRSS Feed
ginak
Quartz | Level 8

Hello,

In my last post users suggested using arrays. I have looked into arrays and have come across a problem which I think would best be solved using arrays.

I had a data set that was in long format. i.e., multiple lines per ID.

looked like this:

data x;

input ID  c0  c1  c2  c3  c4  c5  c6  c7 ;

cards;

1    1          5         7        6           3         3       0         2 

1    3          2         5        4           3         1       4         3 

2    2          1         4        5           7         2       4         8 

2    3          1         3        4           5         2       6         7 

2    4          1         2        7           3         6       4         4 

;

run;

I transposed it to wide format. So now it's like:

call this data set 'wide'

ID    c0_1 c0_2  c0_3 c1_1 c1_2 c1_3 ....... c7_1 c7_2 c7_3

1      1        3         .       5    2        .             2       3        .       

2       2       3         4       1   1        1             8       7       4

I want to take a sum for each line of record in the long format above (i.e., sum(c0,c1,c2,c3,c4,c5,c6,c7)) for the first line for ID1, then do it again for the second line for ID1. Then find that sum for the first , second and third lines of ID2.

Right now though, I have the data in wide format. How would I find this sum? I guess I could do it while in long format before I transpose it to wide format, but I want to know how to do this using an array.

So I want to find the sum of c0_1, c1_1, c2_1,  .... , c7_1.

Then find the sum of c0_2, c1_2, c2_2, ......., c7_2

Here I only have 3 times I'd have to do it, but suppose I had 15 C variables? I can't manually do this by hand it'd be by an array..

This is what I started to write, but I find that there is error in my code. Any suggestions how to fix this? Simplest code possible? Smiley Happy

Thanks!

data array1;

       set wide;

       c1sum=0;

       array c(15)_1 c1_1-c15_1;

       do i = 1 to 15;

            if c(i)_1 then c1sum=sum(c1sum, c(i)_1);

            end;

            drop i ;

run;

SAS doesn't like the "c(i)_1" syntax.. it doesn't like the i not at the end of the variable name, I think. Smiley Sad

8 REPLIES 8
art297
Opal | Level 21

I don't think you need arrays to do what you want.  Wouldn't the following do it?:

data want;

  set wide;

  c0sum=sum(of c0:);

  c1sum=sum(of c1:);

run;

Linlin
Lapis Lazuli | Level 10

Art,

I think the OP wants c1sum=sum(of c0_1,c1_1, ...,c15_1)

art297
Opal | Level 21

Linlin is probably correct.  You will have to carefully check the following, but I think that it does what you want and, yes, it uses arrays:

data wide;

  input id c0_1-c0_3 c1_1-c1_3

           c2_1-c2_3 c3_1-c3_3

           c4_1-c4_3 c5_1-c5_3

           c6_1-c6_3 c7_1-c7_3;

  cards;

1 1 3 . 5 2 . 4 4 4 5 5 5 6 5 6 7 7 7 4 2 4 3 2 3

2 2 3 4 1 1 1 3 2 3 6 5 6 7 8 7 2 2 2 4 2 1 1 2 3

;

%let numvars=3;

%let maxgroup=7;

data want (drop=_: i j counter);

  set wide;

  array everything(*) c0_1--c&maxgroup._&numvars.;

  array _mean_stuff(%eval(&maxgroup.+1));

  array means(&numvars.);

  do i=1 to &numvars.;

    counter=0;

    do j=1 to dim(everything);

      if input(substr(vname(everything(j)),index(vname(everything(j)),"_")+1),8.)

       eq i then do;

        counter+1;

        _mean_stuff(counter)=everything(j);

      end;

    end;

    means(i)=mean(of _mean_stuff(*));

  end;

run;

ginak
Quartz | Level 8

Thank you! So I'm new to arrays and have only done simple ones. I don't quite understand what's going on in your code starting with "array everything(*) c0_1--c&maxgroup._&numvars.;" Smiley Sad Any chance you could explain what's going on? The tutorials I've found online deal with simpler arrays.

For example,

in

everything(*) c0_1--c&maxgroup._&numvars.;

Would you have been able to just put in 12 and 15 instead of c&maxgroup._&numvars. ?

What does " array _mean_stuff(%eval(&maxgroup.+1));" mean?

I did run the code, but because I have M and N in a few entries, SAS will not go through with the array since I have mix of character and numeric variables *sigh*


Thanks,
Gina

art297
Opal | Level 21

First, I agree with my colleagues that transposing the data, in this case, probably complicated matters.

With your indicating that some of the values contain the letters M or N, it will help if you post a small sample dataset so that we can all see what you are having to deal with.

But, to answer your questions, declaring macro variables (with %let statements at the beginning of the code .. before the datastep), allows those variables to be used simply as text substitution so that only the two %let statements have to be changed when the values change.

Given:

%let numvars=3;

%let maxgroup=7;

data want (drop=_: i j counter);

  set wide;

  array everything(*) c0_1--c&maxgroup._&numvars.;

the array statement really reads:

  array everything(*) c0_1--c&7_3.;

That way, only the %let statements have to be changed if your parameters change (like to 12 or 15).

From your comments it sounds like your variables are character variables.  Again, not a problem, as arrays can be declared as being character.  e.g.:

  array everything(*) $ c0_1--c&maxgroup._&numvars.;

and, possibly, only an input function would have to be added to one line, namely:

        _mean_stuff(counter)=input(everything(j), ?? 12.);

Thus, you may be able to use something like:

data wide;

  input id (c0_1-c0_3 c1_1-c1_3

           c2_1-c2_3 c3_1-c3_3

           c4_1-c4_3 c5_1-c5_3

           c6_1-c6_3 c7_1-c7_3) (: $1.);

  cards;

1 1 3 M 5 2 M 4 4 4 5 5 5 6 5 6 7 7 7 4 2 4 3 2 3

2 2 3 4 1 1 1 3 2 3 6 5 6 7 N 7 2 2 2 4 2 1 1 2 3

;

%let numvars=3;

%let maxgroup=7;

data want (drop=_: i j counter);

  set wide;

  array everything(*) $ c0_1--c&maxgroup._&numvars.;

  array _mean_stuff(%eval(&maxgroup.+1));

  array means(&numvars.);

  do i=1 to &numvars.;

    counter=0;

    do j=1 to dim(everything);

      if input(substr(vname(everything(j)),index(vname(everything(j)),"_")+1),8.)

       eq i then do;

        counter+1;

        _mean_stuff(counter)=input(everything(j),?? 12.);

      end;

    end;

    means(i)=mean(of _mean_stuff(*));

  end;

run;

But, if not, it would help to see your pre and post transposed data.

art297
Opal | Level 21

I'm still not sure if I understand what your data look like, but I've been experimenting with using functions with multi-dimensional arrays based on another thread from a fews days ago.

Hopefully, the following approximates your data and is a little easier to understand:

data wide;

  input id (c0_1-c0_3 c1_1-c1_3

           c2_1-c2_3 c3_1-c3_3

           c4_1-c4_3 c5_1-c5_3

           c6_1-c6_3 c7_1-c7_3

           c8_1-c8_3 c9_1-c9_3

           c10_1-c10_3 c11_1-c11_3

           c12_1-c12_3 c13_1-c13_3

           c14_1-c14_3 c15_1-c15_3) ($1.);

  cards;

1 123123123123123123123123123123123123123123123123

2 234234234234234234234234234234234234234234234234

;

%let numvars=3;

%let numgroups=15;

data want;

  set wide;

  array allchars (%eval(&numgroups.+1),&numvars.) c0_1--c&numgroups._&numvars.;

  array sums(&numvars.);

  call missing(of sums(*));

  do i=1 to %eval(&numgroups.+1);

    do j=1 to &numvars.;

      sums(j)+input(allchars(i,j),?? 12.);

    end;

  end;

run;

Ksharp
Super User

You don't need to traspose it . Using a simple statement is enough. What is your final result ?

data x;

input ID  c0  c1  c2  c3  c4  c5  c6  c7 ;

sum=sum(of c: );

cards;

1    1          5         7        6           3         3       0         2 

1    3          2         5        4           3         1       4         3 

2    2          1         4        5           7         2       4         8 

2    3          1         3        4           5         2       6         7 

2    4          1         2        7           3         6       4         4 

;

run;

Ksharp

Tom
Super User Tom
Super User

Why not just sum the values for each observation first.

data want;

  set X;

  sum=sum(of c0-c7);

run;


Why did you roll the multiple observations up into a "WIDE" format? What does this format do for you?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 897 views
  • 1 like
  • 5 in conversation