Hi,
This is the first time I'm writing array code by myself. Good news: it worked (YAY!). Bad news: it calculated only the first row, but I have 20 rows. Could you help me find what I'm missing? Also, I have only three variables named theta here, so it is easy to write out ex1_1 ex1_2 ex1_3 ex1_4 ex2_1 ex2_2 ex2_3 ex2_4 ex3_1 ex3_2 ex3_3 ex3_4, but when I have 500 thetas, is there an easy way to create the ex variables? The input files I used are attached and the code I used is below.
Many thanks
data par;
infile 'C:\cluster_new\mlg1.txt';
input a1 a2 a3 b1 b2 b3 ;
run;
data score;
infile 'C:\cluster_new\mlgs.txt';
input theta1 theta2 theta3;
run;
data all_pars;
merge par score;
run;
data all_pars;
set all_pars;
s1=-(a1+a2+a3)/4;
s2=s1+a1;
s3=s1+a2;
s4=s1+a3;
in1=-(b1+b2+b3)/4;
in2=in1+b1;
in3=in1+b2;
in4=in1+b3;
run;
data all_pars1;
set all_pars;
array t {*} theta1-theta3;
array ex {3,4} ex1_1 ex1_2 ex1_3 ex1_4 ex2_1 ex2_2 ex2_3 ex2_4 ex3_1 ex3_2 ex3_3 ex3_4;
array s {*} s1-s4;
array in {*} in1-in4;
do i=1 to 3;
do j=1 to 4;
ex(i,j)=exp(t(i)*s(j)+in(j));
end;
end;
run;
Accepted Solutions
Hi @dustychair,
Your mistake is in the (one-to-one) MERGE step: The one-observation dataset SCORE contributes only missing values to observations no. 2, 3, etc. in this type of merge.
I would correct it to:
data all_pars;
if _n_=1 then set score;
set par;
run;
This reads the single observation from dataset SCORE only in the first iteration of the DATA step ("if _n_=1") and doesn't touch these variables afterwards. Since all variables from a SET statement are automatically RETAINed, the theta values are copied to all subsequent observations, as desired.
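To see the difference on a small scale, here is a minimal sketch with made-up data. The dataset names par_demo, score_demo, merged_demo and combined_demo and all values are hypothetical, not taken from the attached files:

```sas
/* Hypothetical toy data: three parameter rows and one row of thetas. */
data par_demo;
  input a1 b1;
  datalines;
0.5 1.0
0.7 1.2
0.9 1.4
;
run;

data score_demo;
  input theta1;
  datalines;
0.25
;
run;

/* One-to-one MERGE: score_demo runs out after row 1, */
/* so theta1 is missing in rows 2 and 3.              */
data merged_demo;
  merge par_demo score_demo;
run;

/* Conditional SET: theta1 is read once and retained for every row. */
data combined_demo;
  if _n_ = 1 then set score_demo;
  set par_demo;
run;

proc print data=merged_demo; run;
proc print data=combined_demo; run;
```

Comparing the two printouts should show theta1 filled in only on the first row of merged_demo, but on all three rows of combined_demo.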
Regarding the (hypothetical) variable list ex1_1 ex1_2 ... ex500_4 (consisting of 2000 items):
- You can define an array without specifying the individual variable names. For example, your definition
array s {*} s1-s4;
is equivalent to
array s{4};
because s1, s2, s3, s4 are the default variable names for this array.
In the case of two- or higher-dimensional arrays the default names use sequential numbers (as for one-dimensional arrays) in row-major order (see documentation). So, if you really need the dimension-specific indices (i, j, ...) in the variable names rather than only in the array references (such as ex{i,j}), you still need to specify the list of names.
- It's not difficult to create the long list mentioned above programmatically:
data _null_;
length c $16000; /* 500*4*(up to 8) characters: " ex123_4" */
do i=1 to 500;
do j=1 to 4;
c=catx(' ',c,cats('ex',i,'_',j));
end;
end;
call symputx('vlist',c);
run;
The list is now available in macro variable VLIST and could be referenced in an ARRAY statement:
array ex{500,4} &vlist;
- However, depending on the purpose, a dataset with 2000+ variables might be unwieldy and it could make more sense to aim at a vertical (long) dataset structure.
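As a rough illustration of the long structure, the sketch below writes one observation per combination of input row, theta index and category instead of filling ex1_1 ... ex3_4. It assumes ALL_PARS already contains theta1-theta3, s1-s4 and in1-in4 as in the original post. The names ex_long, item, ex and the array name icpt are hypothetical:

```sas
/* Sketch only: one output row per (input row, theta index i, category j). */
data ex_long (keep=item i j ex);
  set all_pars;                  /* assumes theta1-theta3, s1-s4, in1-in4 exist */
  array t    {*} theta1-theta3;
  array s    {*} s1-s4;
  array icpt {*} in1-in4;        /* hypothetical array name for the in1-in4 values */
  item = _n_;                    /* hypothetical identifier: input row number */
  do i = 1 to dim(t);
    do j = 1 to dim(s);
      ex = exp(t{i}*s{j} + icpt{j});
      output;                    /* writes 3*4 = 12 rows per input row */
    end;
  end;
run;
```

With 500 thetas only the range in the first ARRAY statement would change (theta1-theta500), and the dataset would grow in rows rather than in columns.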
Best,