I create a matrix that is 45x45 fairly using my a data step.
However, I need to parse it down to 30x30 and the output from the procedure is nonsensical for me (it does not matter how the input data file is sorted).
Thus, I need to delete the same rows and columns from the matrix:
to_delete=(1,4,20,23,26,26,30,31,32,33,34,35,36,37,38)
So the final matrix would only contain row 2,3,5,6...etc of the original matrix and be square(nxn)
I have been stumped thus far.
I have been trying variations of this
data output;set input;
keep _COL_ =: to_delete;run;quit;
Will this do what you want?
data to_delete;
set large_matrix;
keep _COL_;
if missing (Female);
run;
proc sql noprint;
create table keep_small_matrix
as Select name
from dictionary.columns
where libname="WORK" and
memname="LARGE_MATRIX" and
name ne "_COL_"
;
DELETE from keep_small_matrix
WHERE name in (select * from work.to_delete)
;
select name
into : keeps
separated by ","
from keep_small_matrix
;
create table small_matrix as Select _COL_,&keeps.
from large_matrix
;
DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)
;
QUIT;
Not sure what you mean. Can you provide a small example (as a datastep) that represents, say, a 5x5 matrix and a second datastep showing the result you want to achieve?
The "problem" is that rows can be easily referenced by number or other ID and can be DELETED with an executable statement. The columns however have to be referenced by name and cannot be DROPPED with and executable statement. One way to approach the problem would be transpose the data so that you can reference the rows and columns in the same way.
data remove;
set mat2;
if row in('a','c') then delete;
if col in('a','c') then delete;
run;
I need to delete something like ~1000 rows and columns from a 10,000 x 10,000 matrix, do I need to put the columns into an array then somehow reference the array in the data step?
Put the row/column ID into a SAS data set and transpose that to an ARRAY(see below). Then use the array in the IN operator.
You may find that you can generate the list with a DO loop if you use numbered row/column ID.
do delete=4 to 10, 32, 56, 100 to 200......
data mat;
input row $ a b c d e f;
cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
run;
proc transpose name=col data=mat out=mat2 prefix=m;
by row;
run;
data delete;
input del :$1. @@;
cards;
a c
;;;;
run;
proc transpose data=delete out=delete prefix=del;
var del;
run;
data remove;
set mat2;
if _n_ eq 1 then set delete;
array del
run;
proc sort;
by row col;
run;
proc transpose data=remove out=mat3;
by row;
id col;
run;
If DN correctly interpreted what you want to do, here is a way to do it in a data step:
data have;
input a b c d e f g h i j;
cards;
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
0 0 0 0 0 0 0 0 0 0
;
%let to_delete=(1,4,6,8);
proc sql noprint;
select name into :deletes
separated by " "
from dictionary.columns
where libname="WORK" and
memname="HAVE" and
varnum in &to_delete
;
quit;
data want (drop=&deletes.);
set have;
if _n_ not in &to_delete.;
run;
I feel as though I can do this in PROC SQL but havent been successful today.
I'm very close. I can delete the rows, but am stuck on deleting the columns
Again, my "large matrix" has both rows and columns with the same variable names (COL1, COL2....COLi) so i merely create a dataset that has a list of these based off of some critera:
data to_delete; set large_matrix;
keep _COL_;
if missing (Female);run;quit;
to_delete contains random columns from which I want to delete both the row and column
This successfully deletes the proper rows (not difficult). Deleting the proper columns is stumping though.
proc sql;
create table small_matrix as Select * from large_matrix;
DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete);
/* alter table work.kjk drop column in (work.tod) Does not work to remove columns */
;
QUIT;
Thank you to all for your time.
Can you show a SMALL example (say for 3 columns/rows) of what large_matrix looks like?
Female | Male | _COL_ | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 | COL8 | COL9 |
A | B | COL1 | 0.716692 | 0.820369 | 0.319965 | 0.537561 | 0.009048 | 0.623179 | 0.329801 | 0.501619 | 0.461064 |
C | D | COL2 | 0.47814 | 0.568633 | 0.377889 | 0.322076 | 0.848357 | 0.585695 | 0.190736 | 0.666417 | 0.753265 |
COL3 | 0.33489 | 0.464657 | 0.19093 | 0.712932 | 0.309871 | 0.298905 | 0.408627 | 0.931734 | 0.623899 | ||
E | F | COL4 | 0.182391 | 0.123417 | 0.184646 | 0.260479 | 0.347146 | 0.428899 | 0.070591 | 0.21143 | 0.833228 |
COL5 | 0.048757 | 0.780064 | 0.238505 | 0.923092 | 0.527054 | 0.210349 | 0.51993 | 0.463315 | 0.781508 | ||
G | H | COL6 | 0.564873 | 0.769461 | 0.520583 | 0.604096 | 0.174694 | 0.395051 | 0.149006 | 0.923724 | 0.669723 |
COL7 | 0.077545 | 0.799448 | 0.857925 | 0.023725 | 0.2919 | 0.145091 | 0.180948 | 0.425452 | 0.333822 | ||
I | J | COL8 | 0.40907 | 0.508158 | 0.975613 | 0.828987 | 0.009638 | 0.137749 | 0.039158 | 0.358044 | 0.816533 |
COL9 | 0.280801 | 0.596491 | 0.265516 | 0.568593 | 0.066118 | 0.680901 | 0.915639 | 0.379575 | 0.277307 |
In this case, I want to delete row/col 3,5,7,9 (which are in the to_delete dataset at this point)
Will this do what you want?
data to_delete;
set large_matrix;
keep _COL_;
if missing (Female);
run;
proc sql noprint;
create table keep_small_matrix
as Select name
from dictionary.columns
where libname="WORK" and
memname="LARGE_MATRIX" and
name ne "_COL_"
;
DELETE from keep_small_matrix
WHERE name in (select * from work.to_delete)
;
select name
into : keeps
separated by ","
from keep_small_matrix
;
create table small_matrix as Select _COL_,&keeps.
from large_matrix
;
DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)
;
QUIT;
That does work Art.
Thank you
If you have SAS/IML, do it this way:
proc iml;
/* read original data */
use have; read all var _NUM_ into x[c=Names]; close have;
DeleteIdx = {1 4 6 8};
/* subset/extract */
KeepIdx = setdif( 1:ncol(x), DeleteIdx);
y = x[ , KeepIdx];
NewNames = Names[KeepIdx];
/* write new data */
create want from y[c=NewNames]; append from y; close want;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.