BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ivs
Calcite | Level 5 ivs
Calcite | Level 5

in IML to remove observations from data set i can use

EDIT (data);

DELETE (observation);

PURGE;

CLOSE (data);

how to remove a VARIABLE?

do i need to transpose data so my variables become observations? is there a more efficient way?

many thanks in advance for your help!

ivs

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

So the end result is about N/2 pairs of variables? For a 6x6 matrix you might get

(X3, X4) [largest corr]

(X2, X6) [largest that doesn't include X3 or X4]

(X1, X5) [largest that doesn't include X3, X4, X2, or X6]

One possible approach is to not delete the rows or columns, but just set the elements of those rows and columns to missing. That way all the computations are done on the same matrix and it is easy to index into the variable names.

Because a correlation matrix is symmetric, I think you should delete two rows and two columns at each step.

I would feel bad if I took away the fun of solving this problem, so I'll just post some code that seems relevant and let you enjoy deciphering it and adapting it to your needs.

/* example corr matrix */
proc corr data=sashelp.cars noprint out=corr(where=(_type_="CORR")) ;
run;
proc iml;

use corr;
read all var _NUM_ into X[c=varNames];
close corr;
diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1);
x[diagIdx] = .;


start ind2sub( p, ind );
   idx = colvec(ind);
   n = nrow(idx);
   col = 1 + mod(idx-1, p);
   row = 1 + (idx-col) / p;
   return ( row || col );
finish;

start PairCorr(x);
   y = x; /* copy. We'll overwrite this with missing values */
   k = floor(ncol(x)/2);
   subscripts = j(k,2);
   do i = 1 to k;
      s = ind2sub(ncol(y), y[<:>]); *call function get (row,col) directly without looping;
      y[ s, ] = .;
      y[ , s] = .;
      print i s y;
      subscripts[i,] = s;
   end;
   return( subscripts );
finish;

pc = PairCorr(x);
results = varNames[ pc[,1] ] || varNames[ pc[,2] ];
print x results;

View solution in original post

11 REPLIES 11
Rick_SAS
SAS Super FREQ

Although it is possible to use SAS/IML to perform some basic data set maintenance, or extensive maintenance taks most people use the DATA step, like so:

data A;

set sashelp.Class;

drop Age;

run;

But you don't need to physically remove a variable in order to exclude it from a PROC IML computation. Instead I use the DROP or KEEP statements to exclude it from being read when I read data into a matrix.  For example, if I want all numerical variables EXCEPT the AGE variable read into a matrix, I could use any of the following statements:


use sashelp.class(drop=AGE);
read all var _NUM_ into X[c=VarNames];
close sashelp.class;


use sashelp.class(keep=Height Weight);
read all var _NUM_ into Y[c=VarNames];
close sashelp.class;

use sashelp.class;

read all var {Height Weight} into Z[c=VarNames];

close sashelp.class;

ivs
Calcite | Level 5 ivs
Calcite | Level 5

Thank you Rick! I know how to use DROP in data step or how to omit reading a variable into matrix. Unfortunately I can't see how to use these techniques in my project. I have a huge correlation matrix (actually just lower half of it), and I need to extract max item with corresponding row and column, then i need to drop this pair of variables out of the matrix and re-run the extraction. I am basically after a list of max pairwise correlations.

everything in my code is running smoothly (thanks to your input!) except for extraction of used up variables from the data. because the code is in IML i am looking for a way to drop a variable from data set in similar way as deleting an observation from a data set.

if you can think of a solution, i'd greatly appreciate it.

ivs

Tom
Super User Tom
Super User

If your data is in a matrix then perhaps instead you mean how to delete a column from a matrix?  Or if it is a correlation matrix how to delete a column and the corresponding row?

ivs
Calcite | Level 5 ivs
Calcite | Level 5

hi Tom, yes that's exactly what I am trying to do. Find max correlation in the matrix, save it and the pair of corresponding variables, then remove this pair of variables from the matrix and re-run the search for the mew max correlation. Thanks in advance if you know an efficient way to do that.

i found a way to delete rows and columns based on their intersection in the matrix:

start delrow(x,i);

  return(x[setdif(1:nrow(x),i),]);

finish;

start delcol(x,i);

  return(x[,setdif(1:ncol(x),i)]);

finish;

X=delcol(X,ColM); *here ColM is the col number of the intersection;

X=delrow(X,RowM); *here RowM is the row number of the intersection;

the problem with this approach is that i loose VarNames and row names.

ivs
Calcite | Level 5 ivs
Calcite | Level 5

i think i solved it through a macro variable. its not pretty, but it is working.

Rick_SAS
SAS Super FREQ

So the end result is about N/2 pairs of variables? For a 6x6 matrix you might get

(X3, X4) [largest corr]

(X2, X6) [largest that doesn't include X3 or X4]

(X1, X5) [largest that doesn't include X3, X4, X2, or X6]

One possible approach is to not delete the rows or columns, but just set the elements of those rows and columns to missing. That way all the computations are done on the same matrix and it is easy to index into the variable names.

Because a correlation matrix is symmetric, I think you should delete two rows and two columns at each step.

I would feel bad if I took away the fun of solving this problem, so I'll just post some code that seems relevant and let you enjoy deciphering it and adapting it to your needs.

/* example corr matrix */
proc corr data=sashelp.cars noprint out=corr(where=(_type_="CORR")) ;
run;
proc iml;

use corr;
read all var _NUM_ into X[c=varNames];
close corr;
diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1);
x[diagIdx] = .;


start ind2sub( p, ind );
   idx = colvec(ind);
   n = nrow(idx);
   col = 1 + mod(idx-1, p);
   row = 1 + (idx-col) / p;
   return ( row || col );
finish;

start PairCorr(x);
   y = x; /* copy. We'll overwrite this with missing values */
   k = floor(ncol(x)/2);
   subscripts = j(k,2);
   do i = 1 to k;
      s = ind2sub(ncol(y), y[<:>]); *call function get (row,col) directly without looping;
      y[ s, ] = .;
      y[ , s] = .;
      print i s y;
      subscripts[i,] = s;
   end;
   return( subscripts );
finish;

pc = PairCorr(x);
results = varNames[ pc[,1] ] || varNames[ pc[,2] ];
print x results;

ivs
Calcite | Level 5 ivs
Calcite | Level 5

hi Rick. This is very cool code. After trying for a few hours to add the corresponding correlation number to the output i am ready to give up. Please advise what's the best way to go about this last component. Many thanks in advance for your help!

ivs

ivs
Calcite | Level 5 ivs
Calcite | Level 5

hi Rick,

i solved it! not very pretty but it works. Here is your code with comments on how i understand it. Please let me know if its OK to concatenate M to s when reading do loop result into subscripts. Many thanks in advance!

proc iml;

use corr;

read all var _NUM_ into X[c=varNames];

close corr;

diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1); *change all diagonal 1s;

x[diagIdx] = .; *to missing;

start ind2sub( p, ind ); *declare function to determine row and column for max corr;

   idx = colvec(ind);

   n = nrow(idx);

   col = 1 + mod(idx-1, p);

   row = 1 + (idx-col) / p;

   return ( row || col );

finish;

start PairCorr(x); *declare function to pick pairs of vars with max corrs;

   y = x; /*copy of the corr matrix. We'll overwrite this with missing values */

   k = floor(ncol(x)/2); *calculate number of pairs;

   subscripts = j(k,3); *create matrix to store results of do loop;

   do i = 1 to k;

     M=max(y);

   s = ind2sub(ncol(y), y[<:>]); *call function get (row col) directly without looping;

      y[ s, ] = .; *set row with used up corr to missing;

      y[ , s] = .; *set column with used up corr to missing;

      subscripts[i,] = s || M; *here i am not sure if i am doing it correctly;

   end;

   return( subscripts );

finish;

pc = PairCorr(x);

results = varNames[ pc[,1] ] || varNames[ pc[,2] ] || char(pc[,3]); *here is also a question mark about if it is ok to convert value to character and use it;

print results;

create pairs from results;

append from results; *saved data looks good and I am happy with the result.

quit;

Rick_SAS
SAS Super FREQ

start ind2sub( p, ind ); *function to convert an index into a row and column subscript;

...

The other comments look fine. You probably want to keep the correlation as a numeric variable, so instead of converting pc[,3] to a character value, write it as a numeric value:

var1 = varNames[ pc[,1] ];

var2 = varNames[ pc[,2] ];

maxcorr = pc[,3];

create pairs var {var1 var2 maxcorr};

append;

close pairs;

ivs
Calcite | Level 5 ivs
Calcite | Level 5

great! thanks a million for all your help!

Rick_SAS
SAS Super FREQ

It was an interesting problem!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 11 replies
  • 4848 views
  • 6 likes
  • 3 in conversation