Solved: Re: Subset a Matrix

JimK · Posted 01-07-2013 06:15 PM

I create a matrix that is 45x45 fairly using my a data step.

However, I need to parse it down to 30x30 and the output from the procedure is nonsensical for me (it does not matter how the input data file is sorted).

Thus, I need to delete the same rows and columns from the matrix:

to_delete=(1,4,20,23,26,26,30,31,32,33,34,35,36,37,38)

So the final matrix would only contain row 2,3,5,6...etc of the original matrix and be square(nxn)

I have been stumped thus far.

I have been trying variations of this

data output;set input;

keep _COL_ =: to_delete;run;quit;

art297 · Posted 01-08-2013 06:03 PM

Will this do what you want?

data to_delete;

set large_matrix;

keep _COL_;

if missing (Female);

run;

proc sql noprint;

create table keep_small_matrix

as Select name

from dictionary.columns

where libname="WORK" and

memname="LARGE_MATRIX" and

name ne "_COL_"

;

DELETE from keep_small_matrix

WHERE name in (select * from work.to_delete)

;

select name

into : keeps

separated by ","

from keep_small_matrix

;

create table small_matrix as Select _COL_,&keeps.

from large_matrix

;

DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

;

QUIT;

View solution in original post

art297 · Posted 01-07-2013 06:36 PM

Not sure what you mean. Can you provide a small example (as a datastep) that represents, say, a 5x5 matrix and a second datastep showing the result you want to achieve?

data_null__ · Posted 01-07-2013 07:44 PM

The "problem" is that rows can be easily referenced by number or other ID and can be DELETED with an executable statement. The columns however have to be referenced by name and cannot be DROPPED with and executable statement. One way to approach the problem would be transpose the data so that you can reference the rows and columns in the same way.

data mat;
   input row $ a b c d e f;
   cards; 
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run; 
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run; 

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run; 
proc sort; 
   by row col;
   run; 
proc transpose data=remove out=mat3;
   by row;
   id col;
   run;

JimK · Posted 01-08-2013 11:07 AM

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run;

I need to delete something like ~1000 rows and columns from a 10,000 x 10,000 matrix, do I need to put the columns into an array then somehow reference the array in the data step?

data_null__ · Posted 01-08-2013 12:09 PM

Put the row/column ID into a SAS data set and transpose that to an ARRAY(see below). Then use the array in the IN operator.

You may find that you can generate the list with a DO loop if you use numbered row/column ID.

do delete=4 to 10, 32, 56, 100 to 200......

data mat;
   input row $ a b c d e f;
   cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run;
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run;
data delete;
   input del :$1. @@;
   cards;
a c
;;;;
   run;
proc transpose data=delete out=delete prefix=del;
   var del;
   run;
data remove;
   set mat2;
   if _n_ eq 1 then set delete;
   array del

del:;

   if row in del or col in del then delete;
   run;
proc sort;
   by row col;
   run;
proc transpose data=remove out=mat3;
   by row;
   id col;
   run;

art297 · Posted 01-07-2013 08:17 PM

If DN correctly interpreted what you want to do, here is a way to do it in a data step:

data have;

input a b c d e f g h i j;

cards;

1 1 1 1 1 1 1 1 1 1

2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 3 3

4 4 4 4 4 4 4 4 4 4

5 5 5 5 5 5 5 5 5 5

6 6 6 6 6 6 6 6 6 6

7 7 7 7 7 7 7 7 7 7

8 8 8 8 8 8 8 8 8 8

9 9 9 9 9 9 9 9 9 9

0 0 0 0 0 0 0 0 0 0

;

%let to_delete=(1,4,6,8);

proc sql noprint;

select name into :deletes

separated by " "

from dictionary.columns

where libname="WORK" and

memname="HAVE" and

varnum in &to_delete

;

quit;

data want (drop=&deletes.);

set have;

if _n_ not in &to_delete.;

run;

JimK · Posted 01-08-2013 05:07 PM

I feel as though I can do this in PROC SQL but havent been successful today.

I'm very close. I can delete the rows, but am stuck on deleting the columns

Again, my "large matrix" has both rows and columns with the same variable names (COL1, COL2....COLi) so i merely create a dataset that has a list of these based off of some critera:

data to_delete; set large_matrix;

keep _COL_;

if missing (Female);run;quit;

to_delete contains random columns from which I want to delete both the row and column

This successfully deletes the proper rows (not difficult). Deleting the proper columns is stumping though.

proc sql;

create table small_matrix as Select * from large_matrix;

DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete);

/* alter table work.kjk drop column in (work.tod) Does not work to remove columns */

;

QUIT;

Thank you to all for your time.

art297 · Posted 01-08-2013 05:15 PM

Can you show a SMALL example (say for 3 columns/rows) of what large_matrix looks like?

JimK · Posted 01-08-2013 05:24 PM

Female	Male	_COL_	COL1	COL2	COL3	COL4	COL5	COL6	COL7	COL8	COL9
A	B	COL1	0.716692	0.820369	0.319965	0.537561	0.009048	0.623179	0.329801	0.501619	0.461064
C	D	COL2	0.47814	0.568633	0.377889	0.322076	0.848357	0.585695	0.190736	0.666417	0.753265
		COL3	0.33489	0.464657	0.19093	0.712932	0.309871	0.298905	0.408627	0.931734	0.623899
E	F	COL4	0.182391	0.123417	0.184646	0.260479	0.347146	0.428899	0.070591	0.21143	0.833228
		COL5	0.048757	0.780064	0.238505	0.923092	0.527054	0.210349	0.51993	0.463315	0.781508
G	H	COL6	0.564873	0.769461	0.520583	0.604096	0.174694	0.395051	0.149006	0.923724	0.669723
		COL7	0.077545	0.799448	0.857925	0.023725	0.2919	0.145091	0.180948	0.425452	0.333822
I	J	COL8	0.40907	0.508158	0.975613	0.828987	0.009638	0.137749	0.039158	0.358044	0.816533
		COL9	0.280801	0.596491	0.265516	0.568593	0.066118	0.680901	0.915639	0.379575	0.277307

In this case, I want to delete row/col 3,5,7,9 (which are in the to_delete dataset at this point)

art297 · Posted 01-08-2013 06:03 PM

Will this do what you want?

data to_delete;

set large_matrix;

keep _COL_;

if missing (Female);

run;

proc sql noprint;

create table keep_small_matrix

as Select name

from dictionary.columns

where libname="WORK" and

memname="LARGE_MATRIX" and

name ne "_COL_"

;

DELETE from keep_small_matrix

WHERE name in (select * from work.to_delete)

;

select name

into : keeps

separated by ","

from keep_small_matrix

;

create table small_matrix as Select _COL_,&keeps.

from large_matrix

;

DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

;

QUIT;

JimK · Posted 01-08-2013 06:45 PM

That does work Art.

Thank you

Rick_SAS · Posted 01-08-2013 09:49 AM

If you have SAS/IML, do it this way:

proc iml;

/* read original data */
use have; read all var _NUM_ into x[c=Names]; close have;
DeleteIdx = {1 4 6 8};

/* subset/extract */
KeepIdx = setdif( 1:ncol(x), DeleteIdx);
y = x[ , KeepIdx];
NewNames = Names[KeepIdx];

/* write new data */

create want from y[c=NewNames]; append from y; close want;

quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away