BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JimK
Calcite | Level 5

I create a matrix that is 45x45 fairly using my a data step.

However, I need to parse it down to 30x30 and the output from the procedure is nonsensical for me (it does not matter how the input data file is sorted).

Thus, I need to delete the same rows and columns from the matrix:

to_delete=(1,4,20,23,26,26,30,31,32,33,34,35,36,37,38)

So the final matrix would only contain row 2,3,5,6...etc of the original matrix and be square(nxn)

I have been stumped thus far.

I have been trying variations of this

data output;set input;

keep _COL_ =: to_delete;run;quit;

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Will this do what you want?

data to_delete;

  set large_matrix;

  keep _COL_;

  if missing (Female);

run;

proc sql noprint;

  create table keep_small_matrix

    as Select name

      from dictionary.columns

        where libname="WORK" and

              memname="LARGE_MATRIX" and

              name ne "_COL_"

  ;

  DELETE from keep_small_matrix

     WHERE name in (select * from work.to_delete)

  ;

   select name

     into : keeps

       separated by ","

         from keep_small_matrix

  ;

  create table small_matrix as Select _COL_,&keeps.

    from large_matrix

  ;

   DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

  ;

QUIT;

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

Not sure what you mean.  Can you provide a small example (as a datastep) that represents, say, a 5x5 matrix and a second datastep showing the result you want to achieve?

data_null__
Jade | Level 19

The "problem" is that rows can be easily referenced by number or other ID and can be DELETED with an executable statement.  The columns however have to be referenced by name and cannot be DROPPED with and executable statement.  One way to approach the problem would be transpose the data so that you can reference the rows and columns in the same way.

data mat;
   input row $ a b c d e f;
   cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run;
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run;

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run;
proc sort;
  
by row col;
   run;
proc transpose data=remove out=mat3;
   by row;
   id col;
   run;
JimK
Calcite | Level 5

data remove;
   set mat2;
   if row in('a','c') then delete;
   if col in('a','c') then delete;
   run;

I need to delete something like ~1000 rows and columns from a 10,000 x 10,000 matrix, do I need to put the columns into an array then somehow reference the array in the data step?

data_null__
Jade | Level 19

Put the row/column ID into a SAS data set and transpose that to an ARRAY(see below).  Then use the array in the IN operator.

You may find that you can generate the list with a DO loop if you use numbered row/column ID.

do delete=4 to 10, 32, 56, 100 to 200......

data mat;
   input row $ a b c d e f;
   cards;
a 1 2 3 4 5 6
b 1 2 3 4 5 6
c 1 2 3 4 5 6
d 1 2 3 4 5 6
e 1 2 3 4 5 6
f 1 2 3 4 5 6
;;;;
   run;
proc transpose name=col data=mat out=mat2 prefix=m;
   by row;
   run;
data delete;
   input del :$1. @@;
   cards;
a c
;;;;
   run;
proc transpose data=delete out=delete prefix=del;
   var del;
   run;
data remove;
   set mat2;
   if _n_ eq 1 then set delete;
   array del

  • del:;
  •    if row in del or col in del then delete;
       run;
    proc sort;
      
    by row col;
       run;
    proc transpose data=remove out=mat3;
       by row;
       id col;
       run;

    art297
    Opal | Level 21

    If DN correctly interpreted what you want to do, here is a way to do it in a data step:

    data have;

      input a b c d e f g h i j;

      cards;

    1 1 1 1 1 1 1 1 1 1

    2 2 2 2 2 2 2 2 2 2

    3 3 3 3 3 3 3 3 3 3

    4 4 4 4 4 4 4 4 4 4

    5 5 5 5 5 5 5 5 5 5

    6 6 6 6 6 6 6 6 6 6

    7 7 7 7 7 7 7 7 7 7

    8 8 8 8 8 8 8 8 8 8

    9 9 9 9 9 9 9 9 9 9

    0 0 0 0 0 0 0 0 0 0

    ;

    %let to_delete=(1,4,6,8);

    proc sql noprint;

      select name into :deletes

        separated by " "

          from dictionary.columns

            where libname="WORK" and

                  memname="HAVE" and

                  varnum in &to_delete

      ;

    quit;

    data want (drop=&deletes.);

      set have;

      if _n_ not in &to_delete.;

    run;

    JimK
    Calcite | Level 5

    I feel as though I can do this in PROC SQL but havent been successful today.

    I'm very close. I can delete the rows, but am stuck on deleting the columns

    Again, my "large matrix" has both rows and columns with the same variable names (COL1, COL2....COLi) so i merely create a dataset that has a list of these based off of some critera:

    data to_delete; set large_matrix;

    keep _COL_;

    if missing (Female);run;quit;

    to_delete contains random columns from which I want to delete both the row and column

    This successfully deletes the proper rows (not difficult). Deleting the proper columns is stumping though.

    proc sql;

    create table small_matrix as Select * from large_matrix;

    DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete);

    /* alter table work.kjk drop column  in (work.tod) Does not work to remove columns */

    ;

    QUIT;


    Thank you to all for your time.

    art297
    Opal | Level 21

    Can you show a SMALL example (say for 3 columns/rows) of what large_matrix looks like?

    JimK
    Calcite | Level 5
    FemaleMale_COL_COL1COL2COL3COL4COL5COL6COL7COL8COL9
    ABCOL10.7166920.8203690.3199650.5375610.0090480.6231790.3298010.5016190.461064
    CDCOL20.478140.5686330.3778890.3220760.8483570.5856950.1907360.6664170.753265
    COL30.334890.4646570.190930.7129320.3098710.2989050.4086270.9317340.623899
    EFCOL40.1823910.1234170.1846460.2604790.3471460.4288990.0705910.211430.833228
    COL50.0487570.7800640.2385050.9230920.5270540.2103490.519930.4633150.781508
    GHCOL60.5648730.7694610.5205830.6040960.1746940.3950510.1490060.9237240.669723
    COL70.0775450.7994480.8579250.0237250.29190.1450910.1809480.4254520.333822
    IJCOL80.409070.5081580.9756130.8289870.0096380.1377490.0391580.3580440.816533
    COL90.2808010.5964910.2655160.5685930.0661180.6809010.9156390.3795750.277307

    In this case, I want to delete row/col 3,5,7,9 (which are in the to_delete dataset at this point)

    art297
    Opal | Level 21

    Will this do what you want?

    data to_delete;

      set large_matrix;

      keep _COL_;

      if missing (Female);

    run;

    proc sql noprint;

      create table keep_small_matrix

        as Select name

          from dictionary.columns

            where libname="WORK" and

                  memname="LARGE_MATRIX" and

                  name ne "_COL_"

      ;

      DELETE from keep_small_matrix

         WHERE name in (select * from work.to_delete)

      ;

       select name

         into : keeps

           separated by ","

             from keep_small_matrix

      ;

      create table small_matrix as Select _COL_,&keeps.

        from large_matrix

      ;

       DELETE from small_matrix WHERE _COL_ in (select * from work.to_delete)

      ;

    QUIT;

    JimK
    Calcite | Level 5

    That does work Art.

    Thank you

    Rick_SAS
    SAS Super FREQ

    If you have SAS/IML, do it this way:

    proc iml;

    /* read original data */
    use have; read all var _NUM_ into x[c=Names]; close have;
    DeleteIdx = {1 4 6 8};

    /* subset/extract */
    KeepIdx = setdif( 1:ncol(x), DeleteIdx);
    y = x[ , KeepIdx];
    NewNames = Names[KeepIdx];

    /* write new data */

    create want from y[c=NewNames]; append from y; close want;

    quit;

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 11 replies
    • 2005 views
    • 0 likes
    • 4 in conversation