DATA Step, Macro, Functions and more

Subset a Matrix

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Subset a Matrix

I create a matrix that is 45x45 fairly using my a data step.

However, I need to parse it down to 30x30 and the output from the procedure is nonsensical for me (it does not matter how the input data file is sorted).

Thus, I need to delete the same rows and columns from the matrix:

to_delete=(1,4,20,23,26,26,30,31,32,33,34,35,36,37,38)

So the final matrix would only contain row 2,3,5,6...etc of the original matrix and be square(nxn)

I have been stumped thus far.

I have been trying variations of this

data output;set input;

keep _COL_ =: to_delete;run;quit;


Accepted Solutions
Solution
‎01-08-2013 06:03 PM
PROC Star
Posts: 7,363

Re: Subset a Matrix

Will this do what you want?

data to_delete;

  set large_matrix;

  keep _COL_;

  if missing (Female);

run;

proc sql noprint;

  create table keep_small_matrix

    as Select name

      from dictionary.columns

        where libname="WORK" and

              memname="LARGE_MATRIX" and

              name ne "_COL_"

  ;

  DELETE from keep_small_matrix

     WHERE name in (select * from work.to_delete)

  ;

   select name

     into : keeps

       separated by ","

         from keep_small_matrix

  ;

  create table small_matrix as Select _COL_,&keeps.

    from large_matrix

  ;

   DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

  ;

QUIT;

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Subset a Matrix

Not sure what you mean.  Can you provide a small example (as a datastep) that represents, say, a 5x5 matrix and a second datastep showing the result you want to achieve?

Respected Advisor
Posts: 3,777

Re: Subset a Matrix

The "problem" is that rows can be easily referenced by number or other ID and can be DELETED with an executable statement.  The columns however have to be referenced by name and cannot be DROPPED with and executable statement.  One way to approach the problem would be transpose the data so that you can reference the rows and columns in the same way.

data mat;
   input row $ a b c d e f;
   cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run;
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run;

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run;
proc sort;
  
by row col;
   run;
proc transpose data=remove out=mat3;
   by row;
   id col;
   run;
Occasional Contributor
Posts: 7

Re: Subset a Matrix

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run;

I need to delete something like ~1000 rows and columns from a 10,000 x 10,000 matrix, do I need to put the columns into an array then somehow reference the array in the data step?

Respected Advisor
Posts: 3,777

Re: Subset a Matrix

Put the row/column ID into a SAS data set and transpose that to an ARRAY(see below).  Then use the array in the IN operator.

You may find that you can generate the list with a DO loop if you use numbered row/column ID.

do delete=4 to 10, 32, 56, 100 to 200......

data mat;
   input row $ a b c d e f;
   cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run;
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run;
data delete;
   input del :$1. @@;
   cards;
a c
;;;;
   run;
proc transpose data=delete out=delete prefix=del;
   var del;
   run;
data remove;
   set mat2;
   if _n_ eq 1 then set delete;
   array del

  • del:;
  •    if row in del or col in del then delete;
       run;
    proc sort;
      
    by row col;
       run;
    proc transpose data=remove out=mat3;
       by row;
       id col;
       run;

    PROC Star
    Posts: 7,363

    Re: Subset a Matrix

    If DN correctly interpreted what you want to do, here is a way to do it in a data step:

    data have;

      input a b c d e f g h i j;

      cards;

    1 1 1 1 1 1 1 1 1 1

    2 2 2 2 2 2 2 2 2 2

    3 3 3 3 3 3 3 3 3 3

    4 4 4 4 4 4 4 4 4 4

    5 5 5 5 5 5 5 5 5 5

    6 6 6 6 6 6 6 6 6 6

    7 7 7 7 7 7 7 7 7 7

    8 8 8 8 8 8 8 8 8 8

    9 9 9 9 9 9 9 9 9 9

    0 0 0 0 0 0 0 0 0 0

    ;

    %let to_delete=(1,4,6,8);

    proc sql noprint;

      select name into :deletes

        separated by " "

          from dictionary.columns

            where libname="WORK" and

                  memname="HAVE" and

                  varnum in &to_delete

      ;

    quit;

    data want (drop=&deletes.);

      set have;

      if _n_ not in &to_delete.;

    run;

    Occasional Contributor
    Posts: 7

    Re: Subset a Matrix

    I feel as though I can do this in PROC SQL but havent been successful today.

    I'm very close. I can delete the rows, but am stuck on deleting the columns

    Again, my "large matrix" has both rows and columns with the same variable names (COL1, COL2....COLi) so i merely create a dataset that has a list of these based off of some critera:

    data to_delete; set large_matrix;

    keep _COL_;

    if missing (Female);run;quit;

    to_delete contains random columns from which I want to delete both the row and column

    This successfully deletes the proper rows (not difficult). Deleting the proper columns is stumping though.

    proc sql;

    create table small_matrix as Select * from large_matrix;

    DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete);

    /* alter table work.kjk drop column  in (work.tod) Does not work to remove columns */

    ;

    QUIT;


    Thank you to all for your time.

    PROC Star
    Posts: 7,363

    Re: Subset a Matrix

    Can you show a SMALL example (say for 3 columns/rows) of what large_matrix looks like?

    Occasional Contributor
    Posts: 7

    Re: Subset a Matrix

    FemaleMale_COL_COL1COL2COL3COL4COL5COL6COL7COL8COL9
    ABCOL10.7166920.8203690.3199650.5375610.0090480.6231790.3298010.5016190.461064
    CDCOL20.478140.5686330.3778890.3220760.8483570.5856950.1907360.6664170.753265
    COL30.334890.4646570.190930.7129320.3098710.2989050.4086270.9317340.623899
    EFCOL40.1823910.1234170.1846460.2604790.3471460.4288990.0705910.211430.833228
    COL50.0487570.7800640.2385050.9230920.5270540.2103490.519930.4633150.781508
    GHCOL60.5648730.7694610.5205830.6040960.1746940.3950510.1490060.9237240.669723
    COL70.0775450.7994480.8579250.0237250.29190.1450910.1809480.4254520.333822
    IJCOL80.409070.5081580.9756130.8289870.0096380.1377490.0391580.3580440.816533
    COL90.2808010.5964910.2655160.5685930.0661180.6809010.9156390.3795750.277307

    In this case, I want to delete row/col 3,5,7,9 (which are in the to_delete dataset at this point)

    Solution
    ‎01-08-2013 06:03 PM
    PROC Star
    Posts: 7,363

    Re: Subset a Matrix

    Will this do what you want?

    data to_delete;

      set large_matrix;

      keep _COL_;

      if missing (Female);

    run;

    proc sql noprint;

      create table keep_small_matrix

        as Select name

          from dictionary.columns

            where libname="WORK" and

                  memname="LARGE_MATRIX" and

                  name ne "_COL_"

      ;

      DELETE from keep_small_matrix

         WHERE name in (select * from work.to_delete)

      ;

       select name

         into : keeps

           separated by ","

             from keep_small_matrix

      ;

      create table small_matrix as Select _COL_,&keeps.

        from large_matrix

      ;

       DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

      ;

    QUIT;

    Occasional Contributor
    Posts: 7

    Re: Subset a Matrix

    That does work Art.

    Thank you

    SAS Super FREQ
    Posts: 3,482

    Re: Subset a Matrix

    If you have SAS/IML, do it this way:

    proc iml;

    /* read original data */
    use have; read all var _NUM_ into x[c=Names]; close have;
    DeleteIdx = {1 4 6 8};

    /* subset/extract */
    KeepIdx = setdif( 1:ncol(x), DeleteIdx);
    y = x[ , KeepIdx];
    NewNames = Names[KeepIdx];

    /* write new data */

    create want from y[c=NewNames]; append from y; close want;

    quit;

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 11 replies
    • 526 views
    • 0 likes
    • 4 in conversation