Hi there,
I have two datasets (matrix) and want to merge them in the proc iml; For instance, I have data1 and data2, how can I merge them by name to get data3? My ultimate goal is to do permutation shuffle of the group assignment, for every time shuffle, I need to merge to the data2 to get a data set like data3, and calculate the mean value by group. Thank you.
Data1
name | group |
A | 1 |
B | 2 |
C | 2 |
D | 1 |
Data2
name | value |
A | 10 |
A | 20 |
A | 15 |
B | 30 |
B | 12 |
C | 55 |
Data3
name | group | value |
A | 1 | 10 |
A | 1 | 20 |
A | 1 | 15 |
B | 2 | 30 |
B | 2 | 12 |
C | 1 | 55 |
Kane
It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?
Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..
Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:
data Data1;
input name $ group;
cards;
A 1
B 2
C 2
D 1
;
data Data2;
input _name $ value;
cards;
A 10
A 20
A 15
B 30
B 12
C 55
D 15
D 25
;
proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;
call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
p = sample(group, nrow(group), "WOR"); /* permute patients */
g1 = name[ loc(p=1) ]; /* patient IDs for this group */
idx = loc( element(_name, g1) ); /* which values are for this group? */
means[i, 1] = mean( value[idx] ); /* mean of values for this group */
g2 = name[ loc(p=2) ];
idx = loc( element(_name, g2) );
means[i, 2] = mean( value[idx] );
end;
print means;
There's really no such thing as a data set in IML, they're all matrices.
The easiest thing to do is perform the merge is a SAS data step, then have IML read in the results.
oh...that's sad...I need to do permutation shuffle of the group assignment 1000 times. For every time shuffle, I need to merge the random assignment back to the data 2 and calculate the mean value by group....Do you have an better idea to combine the permutation result and doing the calculation effectively? Thank you very much.
SAS/IML is ideal for permutation tests and other resampling methods. For some examples, see
"Resampling and permutation tests in SAS" and the references at the end.
You can also read Vickery's 2015 paper "Permit Me to Permute: A Basic Introduction to Permutation Tests with SAS/IML"
Hi Rick, thanks for the links. But I did not see any permutation test example which is similar to my case. My case is sort of tricky because the unit of permutation is patient level, while the test calculation unit is the lower level, it is monthly cost. Each patient has multiple monthly cost, and the number of monthly cost varies...Therefore, I need to permute the data1 first, then merge the permutation result to the data2 to calculate the test statistic...Have you seen any paper presenting similar example? Thank you.
loc-index skill:
data Data1;
input name $ group;
cards;
A 1
B 2
C 2
D 1
;
data Data2;
input _name $ value;
cards;
A 10
A 20
A 15
B 30
B 12
C 55
;
proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;
_group=j(nobs,1,.);
level=unique(_name);
do i=1 to ncol(level);
temp=level[i];
idx=loc(_name=temp);
_group[idx]=group[loc(name=temp)];
end;
create want var{_name value _group};
append;
close;
quit;
It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?
Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..
Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:
data Data1;
input name $ group;
cards;
A 1
B 2
C 2
D 1
;
data Data2;
input _name $ value;
cards;
A 10
A 20
A 15
B 30
B 12
C 55
D 15
D 25
;
proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;
call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
p = sample(group, nrow(group), "WOR"); /* permute patients */
g1 = name[ loc(p=1) ]; /* patient IDs for this group */
idx = loc( element(_name, g1) ); /* which values are for this group? */
means[i, 1] = mean( value[idx] ); /* mean of values for this group */
g2 = name[ loc(p=2) ];
idx = loc( element(_name, g2) );
means[i, 2] = mean( value[idx] );
end;
print means;
Hi Rick,
Thank you very much for your code. It works well! I got the wanted p-values.
Thanks!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.