Alternative to the BY statement in PROC IML

Rasheed · Posted 07-14-2015 12:58 AM

What is the alternative to the BY statement in PROC IML?

Regards

Rasheed · Posted 07-14-2015 01:26 AM

I have the following output in IML:

boot t1CV t2CV

1	0.0036813	0.0030405
1	0.0030405	0.0032935
2	0.0015928	0.0017098
2	0.0017098	0.0020805

These are actually two square matrices, one for boot 1 and a second for boot 2.

Now I want to use each square matrix separately in my further analysis. How can I do this? Please help.

IanWakeling · Posted 07-14-2015 04:36 AM

There is no equivalent of the BY statement; instead, you need to use a loop and matrix subsetting to extract the data required. For example, if all 12 numbers above are in a 4x3 matrix called x, then the following would extract the 2x2 matrices that you want to work with further.

r = 1:2; /* row numbers to extract from x */
do boot = 1 to 2;
   y = x[r, 2:3];
   /* do stuff here with the 2x2 matrix y */
   r = r + 2; /* increment row numbers for next boot */
end;

Rasheed · Posted 07-14-2015 08:34 AM

Thanks Ian. I used your code as follows, but I got an error in the log window.

Proc IML;
x={1 14 15
1 17 18
2 11 17
2 17 15
};
r=1:2; /* row numbers to extract from x */;
do boot=1 to 2;
y = x[r,2:3];
Print y;
r=r+2; /* increment row numbers for next boot */;
end;
Quit;

Log window error

52 x={1 14 15

53 1 17 18

54 2 11 17

55 2 17 15

56 };

57 r=1:2;

57 ! /* row numbers to extract from x */;

58 do boot=1 to 2;

59 y = x[r,2:3];

60 Print y;

61 r=r+2;

61 ! /* increment row numbers for next boot */;

62 end;

ERROR: (execution) Invalid subscript or subscript out of range.

IanWakeling · Posted 07-14-2015 08:45 AM

Try printing out the matrix x. It has only 1 row and 12 columns because you have not used commas to determine where one row of the matrix ends and the next begins.
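A minimal sketch of the corrected program, assuming the intent was a matrix with 4 rows and 3 columns. The only substantive change is the comma at the end of each row of the matrix literal; the nr and nc variables are added here purely to check the dimensions before subsetting.

```sas
proc iml;
/* commas mark the end of each row, so x has 4 rows and 3 columns */
x = {1 14 15,
     1 17 18,
     2 11 17,
     2 17 15};
nr = nrow(x);   /* number of rows of x    */
nc = ncol(x);   /* number of columns of x */
print nr nc;

r = 1:2;                 /* row numbers to extract from x */
do boot = 1 to 2;
   y = x[r, 2:3];        /* 2x2 block for the current boot */
   print boot y;
   r = r + 2;            /* increment row numbers for next boot */
end;
quit;
```

With the commas in place, the rows 3:4 requested in the second iteration exist, so the subscript error should no longer occur.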

Rasheed · Posted 07-15-2015 02:39 AM

Thanks Ian,

OK.

Every time, the 2x2 square matrix is saved in the same variable y.

Is there any option by which I can save each matrix in a different variable, like y1, y2, etc.?

Regards

Rick_SAS · Posted 07-14-2015 06:09 AM

Ian's method is perfect for this problem, which has the same number of observations in each BY group. For more complicated situations in which the number of observations varies across BY groups, you can use

The UNIQUE-LOC technique

or

An efficient alternative to the UNIQUE-LOC technique (the UNIQUEBY technique)

For example, here is a solution to your problem by using the general UNIQUEBY technique:

proc iml;

m = {1 0.0036813 0.0030405 ,
     1 0.0030405 0.0032935 ,
     2 0.0015928 0.0017098 ,
     2 0.0017098 0.0020805
};

/* 1. The data must be sorted by category; these already are. */
C = m[,1]; /* extract sorted categories */
x = m[,2:3]; /* and data */

/* 2. Obtain row numbers for the first observation in each level. */
b = uniqueby(C, 1); /* b[i] = beginning of i_th category */
u = C[b];           /* unique category values, used as row labels below */

s = j(nrow(b),1);     /* 3. Allocate vector to hold results */
b = b // (nrow(C)+1); /* trick: append (n+1) to end of b */
do i = 1 to nrow(b)-1;    /* 4. For each level... */
   idx = b[i]:(b[i+1]-1); /* 5. Find observations in level */
   s[i] = det(x[idx,]);   /* 6. Compute statistic on those values */
end;
lbl = putn(u, "Best4."); /* convert numeric values to character */
print s[rowname=lbl];
quit;

Ksharp · Posted 07-14-2015 09:42 AM

As Rick pointed out, you can use the UNIQUE-LOC technique:

Code: Program


proc iml;
m = {1 0.0036813 0.0030405 ,
1 0.0030405 0.0032935 ,
2 0.0015928 0.0017098 ,
2 0.0017098 0.0020805 
};
unique=unique(m[,1]);
print unique;

do i=1 to ncol(unique);
 temp= m[loc(m[,1]=unique[i]),][,2:ncol(m)];
 
/*do something you need here*/
print temp;
end;
quit;

Log: Program

Notes (6)

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

53

54 proc iml;

NOTE: IML Ready

NOTE: Exiting IML.

NOTE: PROCEDURE IML used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

55 proc iml;

NOTE: IML Ready

56 m = {1 0.0036813 0.0030405 ,

57 1 0.0030405 0.0032935 ,

58 2 0.0015928 0.0017098 ,

59 2 0.0017098 0.0020805

60 };

61 unique=unique(m[,1]);

62 print unique;

63

64 do i=1 to ncol(unique);

65 temp= m[loc(m[,1]=unique[i]),][,2:ncol(m)];

66

67 /*do something you need here*/

68 print temp;

69 end;

70 quit;

NOTE: Exiting IML.

NOTE: PROCEDURE IML used (Total process time):

real time 0.04 seconds

cpu time 0.04 seconds

71

72 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

82

Results: Program

1	2

0.0036813	0.0030405
0.0030405	0.0032935

0.0015928	0.0017098
0.0017098	0.0020805

Xia Keshan

Rasheed · Posted 07-15-2015 03:08 AM

Thanks Keshan.

Every time, the matrix is saved in the temp variable.

Can we save each split matrix in a different variable, like the first matrix in temp1, the second matrix in temp2, and so on?

Rick_SAS · Posted 07-15-2015 06:41 AM

You can, but why would you want to? At the end of the day you are going to do N computations, one for each BY group. The standard way to do that is to use the same name for the matrix, but reassign it during each iteration. For most scenarios I can imagine, there is no need to keep all the matrices around, since, for example, you are probably not going to multiply matrices from different BY groups together.

How about letting us know what you are trying to accomplish and we can make some suggestions?

Rasheed · Posted 07-15-2015 07:43 AM

Thanks Rick. Actually, I am doing bootstrapping from a multivariate normal distribution, in which I need to calculate a few matrices for each bootstrap sample. That is why I want to store each split matrix in a different variable for further calculations.

Please guide me on how I can do this.

Regards

Ksharp · Posted 07-15-2015 08:43 AM

Yes, you can, but you need to use the macro language to generate the statements.

Code: Program

data have;
input boot t1 t2;
cards;
1 0.0036813 0.0030405 
1 0.0030405 0.0032935 
2 0.0015928 0.0017098 
2 0.0017098 0.0020805
;
run;


%let table=have;
%let by_var=boot;
%let vars=t1 t2;



proc sql;
 select distinct cats("read all var {&vars} where (&by_var=",&by_var,") into temp",&by_var) into : list separated by ';'
  from have;
quit;
proc iml;
use &table;
&list;
close &table;

show names;
/*do something you need here*/
print temp1,temp2;
quit;

Log: Program

Notes (6)

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

53

54 data have;

55 input boot t1 t2;

56 cards;

NOTE: The data set WORK.HAVE has 4 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.00 seconds

cpu time 0.01 seconds

61 ;

62 run;

63

64

65 %let table=have;

66 %let by_var=boot;

67 %let vars=t1 t2;

68

69

70

71 proc sql;

72 select distinct cats("read all var {&vars} where (&by_var=",&by_var,") into temp",&by_var) into : list separated by ';'

73 from have;

74 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.04 seconds

cpu time 0.02 seconds

75 proc iml;

NOTE: IML Ready

76 use &table;

77 &list;

78 close &table;

79

80 show names;

81 /*do something you need here*/

82 print temp1,temp2;

83 quit;

NOTE: Exiting IML.

NOTE: PROCEDURE IML used (Total process time):

real time 0.14 seconds

cpu time 0.06 seconds

84

85 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;

95

Results: Program

read all var {t1 t2} where (boot=1) into temp1

read all var {t1 t2} where (boot=2) into temp2

 SYMBOL   ROWS   COLS TYPE   SIZE
 ------ ------ ------ ---- ------
 temp1       2      2 num       8
 temp2       2      2 num       8
  Number of symbols = 5  (includes those without values)

0.0036813	0.0030405
0.0030405	0.0032935

0.0015928	0.0017098
0.0017098	0.0020805

Rick_SAS · Posted 07-15-2015 09:21 AM

I guess I need more details because I've done simulation from MVN data and I still don't see why you need multiple matrices. Are you dealing with time-varying models?

Here are some tips:

Do you need the MVN data, or just the sample covariance matrix? If covariance, think about using the Wishart distribution.
You can pack matrices into an array. I'd recommend that approach before you try to create a bunch of matrices named x1, x2, ..., xk.
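As a rough sketch of the packing idea, assuming every boot has a block of the same size and the rows are sorted by boot, each 2x2 matrix can be flattened into one row of a larger matrix and reshaped when it is needed again. The names packed, p, nBoot and y2 are made up for this illustration.

```sas
proc iml;
m = {1 0.0036813 0.0030405,
     1 0.0030405 0.0032935,
     2 0.0015928 0.0017098,
     2 0.0017098 0.0020805};

p = 2;                          /* each block is p x p (assumption) */
nBoot = nrow(m) / p;            /* assumes equal-sized blocks sorted by boot */
packed = j(nBoot, p*p, .);      /* one row per bootstrap sample */

r = 1:p;                        /* rows of the first block */
do boot = 1 to nBoot;
   y = m[r, 2:(p+1)];           /* p x p matrix for this boot */
   packed[boot, ] = rowvec(y);  /* flatten row by row into a 1 x p*p vector */
   r = r + p;                   /* move to the next block */
end;

y2 = shape(packed[2, ], p, p);  /* unpack the matrix for boot 2 */
print packed, y2;
quit;
```

Because ROWVEC flattens in row-major order and SHAPE fills in row-major order, the unpacked matrix matches the original block, and the row number of packed plays the role of the suffix in names such as y1, y2.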

Rasheed · Posted 07-15-2015 12:11 PM

Dear Rick,

The simulation I am doing is not simple and cannot be done directly from any statistical distribution.

Details of my work: my field is multivariate bioequivalence with higher-order crossover designs.

A higher-order design means there are more sequences and periods than treatments to be compared.

In the first stage I have to draw 500 multivariate samples under a higher-order crossover design from the normal distribution. This was quite a difficult and complicated task that needs many parameters, but I finally did it successfully.

Now, in the second stage, for each simulated sample I need to obtain 2000 bootstrap samples and then calculate the bioequivalence criteria. Those criteria need many matrix calculations, and then I have to assess them.

Therefore I need to store those matrices in separate variables to differentiate the boot numbers, so that the criteria can be obtained boot-wise.

I hope you now have some understanding of my work.

Kindly tell me how I can do this. I am using the following program:

r = 1:2; /* row numbers to extract from x */
do boot = 1 to 2;
   y = x[r, 2:3];
   /* do stuff here with the 2x2 matrix y */
   r = r + 2; /* increment row numbers for next boot */
end;


Rick_SAS · Posted 07-15-2015 12:50 PM

You might want to use the SAMPLE function to draw the bootstrap samples. You can search my blog for "sample function" to see many examples. You can also click on the "Bootstrap" word in the wordcloud sidebar to see various bootstrap examples.

My preference would be to save the information about each bootstrap sample as a row in a big 2000-row "results" matrix, as shown in the second link ("pack matrices into an array"). Then you can aggregate statistics or multiply across bootstrap samples as needed.
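A hedged sketch of how these two suggestions might fit together, assuming a release of SAS/IML that provides the SAMPLE and COV functions. The data in x are randomly generated placeholders, and the sample covariance matrix stands in for whatever matrices the bioequivalence criteria actually require; the names results, xb and avgCov are hypothetical.

```sas
proc iml;
call randseed(12345);            /* arbitrary seed, for reproducibility */

/* placeholder data: 20 observations on 2 variables (assumption) */
x = j(20, 2);
call randgen(x, "Normal");

n = nrow(x);
nBoot = 2000;
results = j(nBoot, 4, .);        /* one row per bootstrap sample */

do boot = 1 to nBoot;
   idx = sample(1:n, n);         /* n row numbers drawn with replacement */
   xb = x[idx, ];                /* the bootstrap sample */
   results[boot, ] = rowvec(cov(xb));  /* pack the 2x2 covariance into a row */
end;

/* aggregate across bootstrap samples: column means, reshaped to 2x2 */
avgCov = shape(results[:, ], 2, 2);
print avgCov;
quit;
```

The boot number is simply the row index of results, so boot-wise criteria can be computed by looping over the rows or by applying column-wise operations, without creating 2000 separately named matrices.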
