Solved: How can I merge data sets in SAS/IML

Tiny_Kane · Posted 06-21-2018 04:27 PM

Hi there,

I have two datasets (matrix) and want to merge them in the proc iml; For instance, I have data1 and data2, how can I merge them by name to get data3? My ultimate goal is to do permutation shuffle of the group assignment, for every time shuffle, I need to merge to the data2 to get a data set like data3, and calculate the mean value by group. Thank you.

Data1

name	group
A	1
B	2
C	2
D	1

Data2

name	value
A	10
A	20
A	15
B	30
B	12
C	55

Data3

name	group	value
A	1	10
A	1	20
A	1	15
B	2	30
B	2	12
C	1	55

Kane

Rick_SAS · Posted 06-23-2018 10:42 PM

It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?

Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..

Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
D  15
D  25
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;

call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
   p = sample(group, nrow(group), "WOR"); /* permute patients */
   g1 = name[ loc(p=1) ];                 /* patient IDs for this group */
   idx = loc( element(_name, g1) );       /* which values are for this group? */
   means[i, 1] = mean( value[idx] );      /* mean of values for this group */

   g2 = name[ loc(p=2) ];
   idx = loc( element(_name, g2) );
   means[i, 2] = mean( value[idx] );
end;

print means;

View solution in original post

PaigeMiller · Posted 06-21-2018 04:29 PM

There's really no such thing as a data set in IML, they're all matrices.

The easiest thing to do is perform the merge is a SAS data step, then have IML read in the results.

--
Paige Miller

Tiny_Kane · Posted 06-21-2018 04:32 PM

oh...that's sad...I need to do permutation shuffle of the group assignment 1000 times. For every time shuffle, I need to merge the random assignment back to the data 2 and calculate the mean value by group....Do you have an better idea to combine the permutation result and doing the calculation effectively? Thank you very much.

Rick_SAS · Posted 06-21-2018 06:18 PM

SAS/IML is ideal for permutation tests and other resampling methods. For some examples, see

"Resampling and permutation tests in SAS" and the references at the end.

You can also read Vickery's 2015 paper "Permit Me to Permute: A Basic Introduction to Permutation Tests with SAS/IML"

Tiny_Kane · Posted 06-22-2018 12:27 PM

Hi Rick, thanks for the links. But I did not see any permutation test example which is similar to my case. My case is sort of tricky because the unit of permutation is patient level, while the test calculation unit is the lower level, it is monthly cost. Each patient has multiple monthly cost, and the number of monthly cost varies...Therefore, I need to permute the data1 first, then merge the permutation result to the data2 to calculate the test statistic...Have you seen any paper presenting similar example? Thank you.

Ksharp · Posted 06-22-2018 09:26 AM

loc-index skill:

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;
_group=j(nobs,1,.);
level=unique(_name);
do i=1 to ncol(level);
  temp=level[i];
  idx=loc(_name=temp);
  _group[idx]=group[loc(name=temp)];
end;

create want var{_name value _group};
append;
close;
quit;

Rick_SAS · Posted 06-23-2018 10:42 PM

It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?

Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..

Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
D  15
D  25
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;

call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
   p = sample(group, nrow(group), "WOR"); /* permute patients */
   g1 = name[ loc(p=1) ];                 /* patient IDs for this group */
   idx = loc( element(_name, g1) );       /* which values are for this group? */
   means[i, 1] = mean( value[idx] );      /* mean of values for this group */

   g2 = name[ loc(p=2) ];
   idx = loc( element(_name, g2) );
   means[i, 2] = mean( value[idx] );
end;

print means;

Tiny_Kane · Posted 06-26-2018 02:01 PM

Hi Rick,

Thank you very much for your code. It works well! I got the wanted p-values.

Thanks!

How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML

Re: How can I merge data sets in SAS/IML