BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tiny_Kane
Obsidian | Level 7

Hi there,

 

I have two datasets (matrix) and want to merge them in the proc iml; For instance, I have data1 and data2, how can I merge them by name to get data3? My ultimate goal is to do permutation shuffle of the group assignment, for every time shuffle, I need to merge to the data2 to get a data set like data3, and calculate the mean value by group. Thank you. 

 

Data1

name  group
A 1
B 2
C 2
D 1

Data2

name  value
A 10
A 20
A 15
B 30
B 12
C 55

Data3

name  group value
A 1 10
A 1 20
A 1 15
B 2 30
B 2 12
C 1 55

 

 

Kane

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?

 

Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..

Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:

 

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
D  15
D  25
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;

call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
   p = sample(group, nrow(group), "WOR"); /* permute patients */
   g1 = name[ loc(p=1) ];                 /* patient IDs for this group */
   idx = loc( element(_name, g1) );       /* which values are for this group? */
   means[i, 1] = mean( value[idx] );      /* mean of values for this group */

   g2 = name[ loc(p=2) ];
   idx = loc( element(_name, g2) );
   means[i, 2] = mean( value[idx] );
end;

print means;

 

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

There's really no such thing as a data set in IML, they're all matrices.

 

The easiest thing to do is perform the merge is a SAS data step, then have IML read in the results.

--
Paige Miller
Tiny_Kane
Obsidian | Level 7

oh...that's sad...I need to do permutation shuffle of the group assignment 1000 times. For every time shuffle, I need to merge the random assignment back to the data 2 and calculate the mean value by group....Do you have an better idea to combine the permutation result and doing the calculation effectively? Thank you very much. 

Rick_SAS
SAS Super FREQ

SAS/IML is ideal for permutation tests and other resampling methods. For some examples, see

"Resampling and permutation tests in SAS" and the references at the end.

You can also read Vickery's 2015 paper "Permit Me to Permute: A Basic Introduction to Permutation Tests with SAS/IML"

Tiny_Kane
Obsidian | Level 7

Hi Rick, thanks for the links. But I did not see any permutation test example which is similar to my case. My case is sort of tricky because the unit of permutation is patient level, while the test calculation unit is the lower level, it is monthly cost. Each patient has multiple monthly cost, and the number of monthly cost varies...Therefore, I need to permute the data1 first, then merge the permutation result to the data2 to calculate the test statistic...Have you seen any paper presenting similar example? Thank you.

Ksharp
Super User

loc-index skill:

 

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;
_group=j(nobs,1,.);
level=unique(_name);
do i=1 to ncol(level);
  temp=level[i];
  idx=loc(_name=temp);
  _group[idx]=group[loc(name=temp)];
end;

create want var{_name value _group};
append;
close;
quit;
Rick_SAS
SAS Super FREQ

It sounds similar to the permutation tests that I referenced. In any permutation test, you permute the subjects and then (re)compute the statistic on the permuted data. Do you know how to compute the means for the original data in SAS/IML?

 

Anyway, you can use either the RANPERM function in SAS/IML or use the SAMPLE function to sample without replacement..

Then use the LOC function to find the patient IDs. Then use the LOC-ELEMENT technique to compute the mean of the values in the permuted data. I think the following is close to what you want:

 

data Data1;
input name  $ group;
cards;
A	1
B	2
C	2
D	1
;

data Data2;
input _name $ value;
cards;
A	10
A	20
A	15
B	30
B	12
C	55
D  15
D  25
;

proc iml;
use data1;
read all var{name group};
close;
use data2 nobs nobs;
read all var{_name value};
close;

call randseed(1234);
numPerm = 10;
means = j(numPerm, 2);
do i = 1 to numPerm;
   p = sample(group, nrow(group), "WOR"); /* permute patients */
   g1 = name[ loc(p=1) ];                 /* patient IDs for this group */
   idx = loc( element(_name, g1) );       /* which values are for this group? */
   means[i, 1] = mean( value[idx] );      /* mean of values for this group */

   g2 = name[ loc(p=2) ];
   idx = loc( element(_name, g2) );
   means[i, 2] = mean( value[idx] );
end;

print means;

 

Tiny_Kane
Obsidian | Level 7

Hi Rick,

 

Thank you very much for your code. It works well! I got the wanted p-values.

 

Thanks!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 7 replies
  • 1923 views
  • 0 likes
  • 4 in conversation