SAS Programming

arrays for all rows

dustychair · Posted 11-09-2018 11:24 PM

Hi,

This is the first time I'm writing array code by myself. Good news: it worked (YAY!). Bad news: it calculated only the first row, although I have 20 rows. Could you help me find what I'm missing? Also, I have only three theta variables here, so it is easy to write out ex1_1 ex1_2 ex1_3 ex1_4 ex2_1 ex2_2 ex2_3 ex2_4 ex3_1 ex3_2 ex3_3 ex3_4, but when I have 500 thetas, is there an easy way to create the ex variables? The input files I used are attached and the code I used is below.

Many thanks

data par;
  infile 'C:\cluster_new\mlg1.txt';
  input a1 a2 a3 b1 b2 b3 ;
run;

data score;
  infile 'C:\cluster_new\mlgs.txt';
  input theta1 theta2 theta3;
run;

data all_pars;
  merge par score;
run;

data all_pars;
  set all_pars;
  s1=-(a1+a2+a3)/4;
  s2=s1+a1;
  s3=s1+a2;
  s4=s1+a3;
  in1=-(b1+b2+b3)/4;
  in2=in1+b1;
  in3=in1+b2;
  in4=in1+b3;
run;

data all_pars1;
  set all_pars;
  array t {*} theta1-theta3;
  array ex {3,4} ex1_1 ex1_2 ex1_3 ex1_4 ex2_1 ex2_2 ex2_3 ex2_4 ex3_1 ex3_2 ex3_3 ex3_4;
  array s {*} s1-s4;
  array in {*} in1-in4;
  do i=1 to 3;
    do j=1 to 4;
      ex(i,j)=exp(t(i)*s(j)+in(j));
    end;
  end;
run;

FreelanceReinh · Posted 11-10-2018 05:21 PM

Hi @dustychair,

Your mistake is in the one-to-one MERGE step (a MERGE without a BY statement): the one-observation dataset SCORE supplies values only for the first observation, and its variables are set to missing in observations no. 2, 3, etc.

I would correct it to:

data all_pars;
if _n_=1 then set score;
set par;
run;

This reads the single observation from dataset SCORE only in the first iteration of the DATA step ("if _n_=1") and doesn't touch these variables afterwards. Since all variables from a SET statement are automatically RETAINed, the theta values are copied to all subsequent observations, as desired.
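To see the difference on a small scale, here is a minimal sketch with made-up inline data standing in for the two text files. The values and the dataset names ALL_BAD and ALL_GOOD are hypothetical and not taken from the attachments. ALL_BAD uses the one-to-one MERGE, ALL_GOOD uses the conditional SET.

```sas
/* Hypothetical stand-in for mlg1.txt: three rows of parameters */
data par;
  input a1 a2 a3 b1 b2 b3;
  datalines;
1.0 0.5 0.2 0.1 0.3 0.4
0.8 0.6 0.3 0.2 0.1 0.5
1.2 0.4 0.1 0.3 0.2 0.6
;
run;

/* Hypothetical stand-in for mlgs.txt: a single row of thetas */
data score;
  input theta1 theta2 theta3;
  datalines;
-1 0 1
;
run;

/* One-to-one merge: theta1-theta3 are missing from row 2 onward */
data all_bad;
  merge par score;
run;

/* Conditional SET: thetas are read once and retained on every row */
data all_good;
  if _n_=1 then set score;
  set par;
run;

proc print data=all_bad;  run;
proc print data=all_good; run;
```

Comparing the two printouts side by side should show the theta columns filled only in the first row of ALL_BAD, but in every row of ALL_GOOD.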

Regarding the (hypothetical) variable list ex1_1 ex1_2 ... ex500_4 (consisting of 2000 items):

You can define an array without specifying the individual variable names. For example, your definition
```
array s {*} s1-s4;
```
is equivalent to
```
array s{4};
```
because s1, s2, s3, s4 are the default variable names for this array.

In the case of two- or higher-dimensional arrays the default names use sequential numbers (as for one-dimensional arrays) in row-major order (see documentation). So, if you really need the dimension-specific indices (i, j, ...) in the variable names rather than only in the array references (such as ex{i,j}), you still need to specify the list of names.
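A quick way to confirm the default naming is to ask SAS for the variable name behind each array element. The following is a minimal sketch (the variable NAME is only there for illustration) that applies the VNAME function to an array declared without a name list:

```sas
data _null_;
  array ex{3,4};              /* no names given: SAS creates ex1-ex12 */
  length name $32;
  do i=1 to 3;
    do j=1 to 4;
      name=vname(ex{i,j});    /* name of the variable behind ex{i,j} */
      put i= j= name=;        /* written to the log */
    end;
  end;
run;
```

The log should list ex1 for element {1,1} through ex12 for element {3,4}, so with four columns element {i,j} corresponds to the variable numbered (i-1)*4+j.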

It's not difficult to create the long list mentioned above programmatically:

data _null_;
length c $16000; /* 500*4*(up to 8) characters: " ex123_4" */
do i=1 to 500;
  do j=1 to 4;
    c=catx(' ',c,cats('ex',i,'_',j));
  end;
end;
call symputx('vlist',c);
run;

The list is now available in macro variable VLIST and could be referenced in an ARRAY statement:

array ex{500,4} &vlist;

However, depending on the purpose, a dataset with 2000+ variables might be unwieldy and it could make more sense to aim at a vertical (long) dataset structure.
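As a rough illustration of the long structure, the sketch below writes one row per combination of parameter row, theta, and category instead of creating one variable per combination. The inline data, the dataset name EX_LONG, the variable ITEM, and the array name ICPT (used here in place of the original IN array) are all assumptions made for this example.

```sas
/* Hypothetical inline data in place of the attached text files */
data par;
  input a1 a2 a3 b1 b2 b3;
  datalines;
1.0 0.5 0.2 0.1 0.3 0.4
0.8 0.6 0.3 0.2 0.1 0.5
;
run;

data score;
  input theta1 theta2 theta3;
  datalines;
-1 0 1
;
run;

data ex_long(keep=item i j ex);
  if _n_=1 then set score;       /* thetas read once, then retained */
  set par;
  item=_n_;                      /* row number of PAR, kept as an identifier */
  array t{*} theta1-theta3;
  array s{4};                    /* default names s1-s4 */
  array icpt{4};                 /* default names icpt1-icpt4 */
  s1=-(a1+a2+a3)/4;    s2=s1+a1;       s3=s1+a2;       s4=s1+a3;
  icpt1=-(b1+b2+b3)/4; icpt2=icpt1+b1; icpt3=icpt1+b2; icpt4=icpt1+b3;
  do i=1 to dim(t);
    do j=1 to 4;
      ex=exp(t{i}*s{j}+icpt{j});
      output;                    /* one row per (item, i, j) */
    end;
  end;
run;
```

With 500 thetas only the range theta1-theta3 in the ARRAY statement would need to change, and the number of variables in the output dataset stays at four.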


Reeza · Posted 11-10-2018 12:03 AM

Arrays run on all rows by default. Check your source data.
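One possible way to run such a check, sketched here with hypothetical inline data in place of the attached files, is to count the missing values in the merged dataset with PROC MEANS:

```sas
/* Hypothetical stand-ins for the two input files */
data par;
  input a1 a2 a3 b1 b2 b3;
  datalines;
1.0 0.5 0.2 0.1 0.3 0.4
0.8 0.6 0.3 0.2 0.1 0.5
1.2 0.4 0.1 0.3 0.2 0.6
;
run;

data score;
  input theta1 theta2 theta3;
  datalines;
-1 0 1
;
run;

data all_pars;
  merge par score;               /* same merge as in the original post */
run;

/* N counts non-missing values, NMISS counts missing ones */
proc means data=all_pars n nmiss;
  var a1 theta1-theta3;
run;
```

If NMISS is greater than zero for the theta variables while a1 is complete, the array step has nothing to compute with on those rows.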

dustychair · Posted 11-11-2018 12:38 AM

@FreelanceReinhard, you are awesome! Thank you for being patient with my simple questions and thank you for teaching me. I appreciate you!
Best,
