Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 04-19-2013 03:34 PM
(4584 views)

in IML to remove observations from data set i can use

EDIT (data);

DELETE (observation);

PURGE;

CLOSE (data);

how to remove a VARIABLE?

do i need to transpose data so my variables become observations? is there a more efficient way?

many thanks in advance for your help!

ivs

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So the end result is about N/2 pairs of variables? For a 6x6 matrix you might get

(X3, X4) [largest corr]

(X2, X6) [largest that doesn't include X3 or X4]

(X1, X5) [largest that doesn't include X3, X4, X2, or X6]

One possible approach is to not delete the rows or columns, but just set the elements of those rows and columns to missing. That way all the computations are done on the same matrix and it is easy to index into the variable names.

Because a correlation matrix is symmetric, I think you should delete two rows and two columns at each step.

I would feel bad if I took away the fun of solving this problem, so I'll just post some code that seems relevant and let you enjoy deciphering it and adapting it to your needs.

/* example corr matrix */

proc corr data=sashelp.cars noprint out=corr(where=(_type_="CORR")) ;

run;

proc iml;

use corr;

read all var _NUM_ into X[c=varNames];

close corr;

diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1);

x[diagIdx] = .;

start ind2sub( p, ind );

idx = colvec(ind);

n = nrow(idx);

col = 1 + mod(idx-1, p);

row = 1 + (idx-col) / p;

return ( row || col );

finish;

start PairCorr(x);

y = x; /* copy. We'll overwrite this with missing values */

k = floor(ncol(x)/2);

subscripts = j(k,2);

do i = 1 to k;

s = ind2sub(ncol(y), y[<:>]); *call function get (row,col) directly without looping;

y[ s, ] = .;

y[ , s] = .;

print i s y;

subscripts[i,] = s;

end;

return( subscripts );

finish;

pc = PairCorr(x);

results = varNames[ pc[,1] ] || varNames[ pc[,2] ];

print x results;

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Although it is possible to use SAS/IML to perform some basic data set maintenance, or extensive maintenance taks most people use the DATA step, like so:

data A;

set sashelp.Class;

drop Age;

run;

But you don't need to physically remove a variable in order to exclude it from a PROC IML computation. Instead I use the DROP or KEEP statements to exclude it from being read when I read data into a matrix. For example, if I want all numerical variables EXCEPT the AGE variable read into a matrix, I could use any of the following statements:

use sashelp.class(drop=AGE);

read all var _NUM_ into X[c=VarNames];

close sashelp.class;

use sashelp.class(keep=Height Weight);

read all var _NUM_ into Y[c=VarNames];

close sashelp.class;

use sashelp.class;

read all var {Height Weight} into Z[c=VarNames];

close sashelp.class;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Rick! I know how to use DROP in data step or how to omit reading a variable into matrix. Unfortunately I can't see how to use these techniques in my project. I have a huge correlation matrix (actually just lower half of it), and I need to extract max item with corresponding row and column, then i need to drop this pair of variables out of the matrix and re-run the extraction. I am basically after a list of max pairwise correlations.

everything in my code is running smoothly (thanks to your input!) except for extraction of used up variables from the data. because the code is in IML i am looking for a way to drop a variable from data set in similar way as deleting an observation from a data set.

if you can think of a solution, i'd greatly appreciate it.

ivs

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

hi Tom, yes that's exactly what I am trying to do. Find max correlation in the matrix, save it and the pair of corresponding variables, then remove this pair of variables from the matrix and re-run the search for the mew max correlation. Thanks in advance if you know an efficient way to do that.

i found a way to delete rows and columns based on their intersection in the matrix:

start delrow(x,i);

return(x[setdif(1:nrow(x),i),]);

finish;

start delcol(x,i);

return(x[,setdif(1:ncol(x),i)]);

finish;

X=delcol(X,ColM); *here ColM is the col number of the intersection;

X=delrow(X,RowM); *here RowM is the row number of the intersection;

the problem with this approach is that i loose VarNames and row names.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

i think i solved it through a macro variable. its not pretty, but it is working.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

So the end result is about N/2 pairs of variables? For a 6x6 matrix you might get

(X3, X4) [largest corr]

(X2, X6) [largest that doesn't include X3 or X4]

(X1, X5) [largest that doesn't include X3, X4, X2, or X6]

One possible approach is to not delete the rows or columns, but just set the elements of those rows and columns to missing. That way all the computations are done on the same matrix and it is easy to index into the variable names.

Because a correlation matrix is symmetric, I think you should delete two rows and two columns at each step.

I would feel bad if I took away the fun of solving this problem, so I'll just post some code that seems relevant and let you enjoy deciphering it and adapting it to your needs.

/* example corr matrix */

proc corr data=sashelp.cars noprint out=corr(where=(_type_="CORR")) ;

run;

proc iml;

use corr;

read all var _NUM_ into X[c=varNames];

close corr;

diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1);

x[diagIdx] = .;

start ind2sub( p, ind );

idx = colvec(ind);

n = nrow(idx);

col = 1 + mod(idx-1, p);

row = 1 + (idx-col) / p;

return ( row || col );

finish;

start PairCorr(x);

y = x; /* copy. We'll overwrite this with missing values */

k = floor(ncol(x)/2);

subscripts = j(k,2);

do i = 1 to k;

s = ind2sub(ncol(y), y[<:>]); *call function get (row,col) directly without looping;

y[ s, ] = .;

y[ , s] = .;

print i s y;

subscripts[i,] = s;

end;

return( subscripts );

finish;

pc = PairCorr(x);

results = varNames[ pc[,1] ] || varNames[ pc[,2] ];

print x results;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

hi Rick. This is very cool code. After trying for a few hours to add the corresponding correlation number to the output i am ready to give up. Please advise what's the best way to go about this last component. Many thanks in advance for your help!

ivs

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

hi Rick,

i solved it! not very pretty but it works. Here is your code with comments on how i understand it. Please let me know if its OK to concatenate M to s when reading do loop result into subscripts. Many thanks in advance!

proc iml;

use corr;

read all var _NUM_ into X[c=varNames];

close corr;

diagIdx = do(1, nrow(X)*ncol(X), ncol(x)+1); *change all diagonal 1s;

x[diagIdx] = .; *to missing;

start ind2sub( p, ind ); *declare function to determine row and column for max corr;

idx = colvec(ind);

n = nrow(idx);

col = 1 + mod(idx-1, p);

row = 1 + (idx-col) / p;

return ( row || col );

finish;

start PairCorr(x); *declare function to pick pairs of vars with max corrs;

y = x; /*copy of the corr matrix. We'll overwrite this with missing values */

k = floor(ncol(x)/2); *calculate number of pairs;

subscripts = j(k,3); *create matrix to store results of do loop;

do i = 1 to k;

M=max(y);

s = ind2sub(ncol(y), y[<:>]); *call function get (row col) directly without looping;

y[ s, ] = .; *set row with used up corr to missing;

y[ , s] = .; *set column with used up corr to missing;

subscripts[i,] = s || M; *here i am not sure if i am doing it correctly;

end;

return( subscripts );

finish;

pc = PairCorr(x);

results = varNames[ pc[,1] ] || varNames[ pc[,2] ] || char(pc[,3]); *here is also a question mark about if it is ok to convert value to character and use it;

print results;

create pairs from results;

append from results; *saved data looks good and I am happy with the result.

quit;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

start ind2sub( p, ind ); *function to convert an index into a row and column subscript;

...

The other comments look fine. You probably want to keep the correlation as a numeric variable, so instead of converting pc[,3] to a character value, write it as a numeric value:

var1 = varNames[ pc[,1] ];

var2 = varNames[ pc[,2] ];

maxcorr = pc[,3];

create pairs var {var1 var2 maxcorr};

append;

close pairs;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

great! thanks a million for all your help!

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

It was an interesting problem!

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

**If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. **

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.